DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-4, 6, 8-14, 16-21 are rejected under 35 U.S.C. 103 as being unpatentable over Archer et al. (US 2019/0045003) (hereafter Archer) in view of Mclaren et al. (US 2018/0240039) (hereafter Mclaren).
	Regarding claim 1, Archer discloses a computer (figure 5, system 200) comprising a plurality of interconnected processing nodes arranged in a configuration of multiple stacked layers of processing nodes forming a multi-face prism (figure 5; paragraph [0055], "System 200 can include nodes 102e-102l arranged in a hypercube network topology."; a hypercube topology comprises stacked layers of nodes forming a multi-face prism); wherein each face of the prism comprises multiple stacked pairs of processing nodes (figure 5; each face comprises two stacked pairs of nodes), wherein the processing nodes of each pair are connected by at least two intralayer links, and the processing node of each pair is connected to a corresponding processing node in an adjacent pair by at least one interlayer link (paragraph [0017], "nodes 102a-102d can communicate on a two way or bi-directional chain network (i.e., an edge disjointed ring)"), wherein the corresponding processing nodes are connected by respective interlayer links to form respective edges of the configuration (figure 5; each node is connected through links forming respective edges); and wherein each pair of processing nodes forms part of one of the layers of the configuration (figure 5, nodes 102f, g and 102j, k form part of one of the layers), each layer comprising multiple processing nodes (figure 5, nodes 102f, g, j, k), each processing node connected to their neighboring processing nodes in the layer by at least one of the intralayer links to form a ring (figure 5, ring between nodes 102f, g, j, k); wherein the multiple stacked layers comprise first and second endmost layers (figure 5; hypercube comprises first and second endmost layers). Archer does not explicitly disclose that the computer comprises at least one intermediate layer. However, in the same field of endeavor, Mclaren teaches parallel processing of reduction and broadcast operations on large datasets of non-scalar data (paragraph [0049], "FIG. 4A is a flow diagram that illustrates the process performed by each intermediate processing unit. Each intermediate processing unit combines its gradient vector or a portion of its gradient vector with the input gradient vector it received upon receipt of portion of an input gradient vector from a previous processing unit (410). The combining operation performed by each processing unit can be a simple sum or some other computation that combines the gradient vectors. The intermediate processing unit then transmits the combined gradient vector to the next processing unit on the direction (425)").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mclaren with Archer, as a whole, so as to incorporate the intermediate processing nodes to perform the processing between the previous and end nodes.

	Regarding claim 2, the combined teachings do not explicitly disclose the computer wherein the multi-face prism has three processing nodes in each layer, providing three respective faces for the first portion of respective one-dimensional paths. However, using three processing nodes instead of four is considered a mere implementation choice that a person skilled in the art would apply according to circumstances without exercising any inventive step. It is also noted that, according to figures 5 and 6 of the application, a multi-face prism comprising three processing nodes in each layer is equivalent to a 4x3 torus topology, which is also suggested by Archer (paragraph [0029], "Multiple bi-directional chain networks or edge disjointed rings can be formed in many network topologies including n-dimensional torus") (KSR: some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention).
	Regarding claim 3, the combined teachings further disclose the computer, wherein in the at least one intermediate layer each processing node is connected to its neighboring processing node by two interlayer links (Mclaren teaches parallel processing of reduction and broadcast operations on large datasets of non-scalar data; see paragraph [0049], "FIG. 4A is a flow diagram that illustrates the process performed by each intermediate processing unit. Each intermediate processing unit combines its gradient vector or a portion of its gradient vector with the input gradient vector it received upon receipt of portion of an input gradient vector from a previous processing unit (410). The combining operation performed by each processing unit can be a simple sum or some other computation that combines the gradient vectors. The intermediate processing unit then transmits the combined gradient vector to the next processing unit on the direction (425)") (KSR: some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention).

	Regarding claim 3, the combined teachings further disclose the computer, wherein in the at least one intermediate layer each processing node is connected to its neighboring processing node by two interlayer links (connecting each processing node in the intermediate layer to its neighboring processing node by two interlayer links, and connecting each processing node in the first and second endmost layers to its neighboring processing node by three interlayer links to enable the simultaneous transmission of data on three one-dimensional paths in the configuration, is a mere extension of the 2x2x2 mesh topology disclosed in Archer (figure 5) to a torus topology, which is already disclosed in Archer; such a configuration is equivalent to a 4x3 torus topology, which is also hinted at by Archer (paragraph [0029], "Multiple bi-directional chain networks or edge disjointed rings can be formed in many network topologies including n-dimensional torus, dragonfly (for example, with multiple network cards per node), etc. and network-contentions are not present in multiple bi-direction chain networks or edge disjointed edges")) (KSR: some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention).
	Regarding claim 4, Archer further discloses the computer, wherein in the first and second endmost layers each processing node is connected to its neighboring processing node by three interlayer links to enable the simultaneous transmission of data on three one-dimensional paths in the configuration (a mere extension of the 2x2x2 mesh topology disclosed in Archer (figure 5) to a torus topology, which is already disclosed in Archer; such a configuration is equivalent to a 4x3 torus topology, which is also hinted at by Archer (paragraph [0029], "Multiple bi-directional chain networks or edge disjointed rings can be formed in many network topologies including n-dimensional torus")) (KSR: some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention).

	Regarding claim 6, Archer further discloses the computer, wherein each of the processing nodes is programmed to identify one of their interlayer and intralayer links to transmit data in order to determine the one-dimensional path for that data (paragraph [0026], "system 100 can be configured to form two rings of all the nodes to be used in the collective communication operation and concurrently perform pipelined parallel prefix operations in opposite directions along each of the two rings").
	Regarding claims 8 and 16, Archer further discloses the computer, wherein each processing node is programmed to divide a respective partial vector of that node into fragments and to transmit the data in the form of successive fragments around each one-dimensional path (partial results in figures 4A-4D; paragraph [0056], "more than one bi-directional chain network or edge disjointed rings can be formed and the input data for the collective communication operation can be divided equally amount each pair of rings and the all reduce process can be executed independently on the divided data").
 	Regarding claims 9 and 21, Archer further discloses the computer according to claim 8, which is programmed to operate each path as a set of logical rings, wherein the successive fragments are transmitted around each logical ring in simultaneous transmission steps. (paragraph [0056], “more than one bi-directional chain network or edge disjointed rings can be formed and the input data for the collective communication operation can be divided equally amount each pair of rings and the all reduce process can be executed independently on the divided data").
	Regarding claims 10 and 17, Archer further discloses the computer according to claim 8, wherein each processing node is configured to output a respective fragment on each of two links simultaneously (paragraph [0048], "communicated data in first direction 116a-1, communicated data in second direction 116a-2").
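
For illustration only, the division of input data among rings quoted above from paragraph [0056] of Archer, followed by fragmentation of each share, might be sketched in Python as below. The function names, the contiguous near-equal split, and the fragment count per ring are assumptions made for this sketch; they are not taken from Archer or from the claims.

```python
from typing import List, Sequence


def split_into_fragments(partial: Sequence[float], num_fragments: int) -> List[List[float]]:
    """Split a partial vector into contiguous fragments of near-equal size."""
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    base, extra = divmod(len(partial), num_fragments)
    fragments: List[List[float]] = []
    start = 0
    for i in range(num_fragments):
        # The first `extra` fragments each take one additional element.
        size = base + (1 if i < extra else 0)
        fragments.append(list(partial[start:start + size]))
        start += size
    return fragments


def assign_to_rings(partial: Sequence[float], num_rings: int,
                    fragments_per_ring: int) -> List[List[List[float]]]:
    """Divide the vector among the rings, then fragment each ring's share.

    Hypothetical helper: each ring could then process its own share
    independently, one fragment per transmission step.
    """
    shares = split_into_fragments(partial, num_rings)
    return [split_into_fragments(share, fragments_per_ring) for share in shares]
```

Under these assumptions, a 12-element vector divided among two rings with three fragments per ring yields six two-element fragments, three per ring.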

	Regarding claim 11, Archer further discloses the computer according to claim 8, wherein each processing node is configured to reduce two incoming fragments with two respective corresponding locally stored fragments (paragraph [0018], "At each node, the corresponding results from two prefix reductions (a prefix reduction from one direction and a second prefix reduction from the other direction) is reduced to give an expected all reduce result on each node"; figures 4A-4D; paragraph [0023], "All reduce collective operations are typically used in many machines learning and high-performance computing (HPC) applications. Current solutions used to perform an all reduce operation are algorithms such as a tree based reduce followed by a broadcast, recursive exchange, Rabenseifner's algorithm (a reduce scatter followed by an all gather) and rings").
	Regarding claim 12, Archer further discloses the computer according to claim 11, wherein each processing node is configured to transmit fully reduced fragments on each of two links simultaneously in an All gather phase of an All reduce collective (paragraph [0018], "At each node, the corresponding results from two prefix reductions (a prefix reduction from one direction and a second prefix reduction from the other direction) is reduced to give an expected all reduce result on each node"; figures 4A-4D; paragraph [0023], "All reduce collective operations are typically used in many machines learning and high-performance computing (HPC) applications. Current solutions used to perform an all reduce operation are algorithms such as a tree based reduce followed by a broadcast, recursive exchange, Rabenseifner's algorithm (a reduce scatter followed by an all gather) and rings").
	Regarding claim 13, Archer further discloses the computer according to claim 1, wherein each link is bi-directional (paragraph [0018], "system 100 can be configured to perform pipelined parallel prefix operations in opposite directions along a bi-directional communication path").
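
For illustration only, the two opposite-direction prefix reductions quoted above from paragraph [0018] of Archer might be simulated sequentially in Python as below. This is a minimal sketch under stated assumptions: the function name is hypothetical, the nodes are modeled as positions in a list, the operator is assumed to be associative, and no actual message passing or pipelining is modeled.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def bidirectional_chain_allreduce(contributions: List[T],
                                  op: Callable[[T, T], T]) -> List[T]:
    """Return the all-reduce result held by each node of a chain.

    forward[i]  combines the contributions of nodes 0..i (first direction).
    backward[i] combines the contributions of nodes i..n-1 (second direction).
    Node i obtains the full result by combining forward[i] with the value
    it receives from node i+1 in the second direction, backward[i+1].
    """
    n = len(contributions)
    if n == 0:
        return []

    # First direction: each node combines what it received with its own data.
    forward: List[T] = []
    for i, own in enumerate(contributions):
        forward.append(own if i == 0 else op(forward[i - 1], own))

    # Second direction: the same reduction, starting from the other end.
    backward: List[T] = list(contributions)
    for i in range(n - 2, -1, -1):
        backward[i] = op(contributions[i], backward[i + 1])

    # Combine the two partial results at every node.
    results: List[T] = []
    for i in range(n):
        if i == n - 1:
            results.append(forward[i])  # the last node already holds everything
        else:
            results.append(op(forward[i], backward[i + 1]))
    return results
```

With addition as the operator and contributions 1, 2, 3 and 4, every node ends with the value 10.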

	Regarding claims 14 and 20, Archer discloses a method of generating a set of programs to be executed in parallel on a computer comprising a plurality of processing nodes connected in a configuration comprising a multi-face prism (see Fig. 5; paragraph [0027], parallel computing); wherein each face of the prism comprises multiple stacked pairs of processing nodes (figure 5; each face comprises two stacked pairs of nodes), wherein the processing nodes of each pair are connected to each other by at least two intralayer links, and the processing node of each pair is connected to a corresponding processing node in an adjacent pair by at least one interlayer link (paragraph [0017], "nodes 102a-102d can communicate on a two way or bi-directional chain network (i.e., an edge disjointed ring)"), wherein the corresponding processing nodes are connected by respective interlayer links to form respective edges of the configuration (figure 5; each node is connected through links forming respective edges); and wherein each pair of processing nodes forms part of one of the layers of the configuration (figure 5, nodes 102f, g and 102j, k form part of one of the layers), each layer comprising multiple processing nodes (figure 5, nodes 102f, g, j, k), each processing node connected to their neighboring processing nodes in the layer by at least one of the intralayer links to form a ring (figure 5, ring between nodes 102f, g, j, k); wherein the multiple stacked layers comprise first and second endmost layers (figure 5; hypercube comprises first and second endmost layers), and at least one intermediate layer; the method comprising: generating at least one data transmission instruction for each program to define a data transmission stage in which data is transmitted from the processing node executing that program, wherein the data transmission instruction comprises a link identifier which defines an outgoing link on which data is to be transmitted in that data transmission stage (paragraph [0060], "Turning to FIG. 8, FIG. 8 is an example flowchart illustrating possible operations of a flow 800 that may be associated with a collective communication operation, in accordance with an embodiment. In an embodiment, one or more operations of flow 800 may be performed by collective operations engine 104. At 802, a plurality of nodes that will be used in a collective communication operation are identified. At 804, a bi-directional chain network that includes the plurality of nodes is determined. At 806, a first direction and a second direction in the bi-directional chain network are identified. At 808, data for a reduction operation that is part of the collective communication operation is received at a node. At 810, the system determines if the data came from the first direction. If the data came from the first direction, then the received data for the reduction operation is stored as first direction data, as in 812. At 814, the reduction operation is performed using the node's data contribution to the reduction operation and the received first direction data. At 816, the results of the reduction operation are stored as first intermediate results data. At 818, the first intermediate results data is communicated to a first next destination and the system returns to 808 to receive data from the second direction"); and determining the link identifiers in order to transmit data around each of a plurality of one-dimensional paths formed by respective sets of processing nodes and links, each one-dimensional path having a first portion between the first and second endmost layers using all processing nodes in one of the faces of the configuration only once, and a second portion provided between the second and first endmost layers and comprising one of the edges of the configuration (paragraph [0056], "in an example, nodes 102e-102l can be organized as a bi-directional chain network where the bi-directional path is from node 102e, to node 102f, to node 102g, to node 102h, of node 102h, to node 102e, to node 102k, and to node 102l. In another example, nodes 102e-102i can be organized as a bi-directional chain network where the bi-directional path is from node 102k, to node 102f, to node 102g, to node 102h, to node 102e, to node 102l, to node 102i, and to node 102j. It should be appreciated that other bi-directional chain networks can be organized. In some examples, more than one bi-directional chain network or edge disjointed rings can be formed and the input data for the collective communication operation can be divided equally amount each pair of rings and the all reduce process can be executed independently on the divided data").
	Archer does not explicitly disclose that the computer comprises at least one intermediate layer; however, this is obvious as described below. In the same field of endeavor, Mclaren teaches parallel processing of reduction and broadcast operations on large datasets of non-scalar data (paragraph [0049], "FIG. 4A is a flow diagram that illustrates the process performed by each intermediate processing unit. Each intermediate processing unit combines its gradient vector or a portion of its gradient vector with the input gradient vector it received upon receipt of portion of an input gradient vector from a previous processing unit (410). The combining operation performed by each processing unit can be a simple sum or some other computation that combines the gradient vectors. The intermediate processing unit then transmits the combined gradient vector to the next processing unit on the direction (425)").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mclaren with Archer, as a whole, so as to incorporate the intermediate processing nodes to perform the processing between the previous and end nodes.
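
For illustration only, the behavior of the intermediate processing units quoted above from paragraph [0049] of Mclaren might be sketched in Python as below. The names `combine` and `pass_along_chain` are hypothetical, the combining operation is assumed to be the "simple sum" mentioned in the quoted passage, and the units are simulated in a single loop rather than as separate devices.

```python
from typing import List

Vector = List[float]


def combine(own: Vector, received: Vector) -> Vector:
    """Step 410 (as quoted): combine a unit's own gradient vector with the
    input gradient vector received from the previous unit, here by an
    element-wise sum."""
    if len(own) != len(received):
        raise ValueError("gradient vectors must have the same length")
    return [a + b for a, b in zip(own, received)]


def pass_along_chain(gradients: List[Vector]) -> Vector:
    """Simulate a gradient vector travelling in one direction along a chain.

    gradients[0] belongs to the first unit; each later unit combines the
    vector in flight with its own and transmits the result onward
    (step 425, as quoted). Returns the vector held by the last unit.
    """
    if not gradients:
        raise ValueError("at least one processing unit is required")
    in_flight = list(gradients[0])
    for own in gradients[1:]:
        in_flight = combine(own, in_flight)
    return in_flight
```

For three units holding [1.0, 2.0], [3.0, 4.0] and [5.0, 6.0], the last unit ends with [9.0, 12.0].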
	Regarding claim 18, Archer further discloses the method according to claim 16, wherein each program comprises one or more instructions to reduce two incoming fragments with two respective corresponding locally stored fragments (paragraph [0018], "At each node, the corresponding results from two prefix reductions (a prefix reduction from one direction and a second prefix reduction from the other direction) is reduced to give an expected all reduce result on each node"; figures 4A-4D; paragraph [0023], "All reduce collective operations are typically used in many machines learning and high-performance computing (HPC) applications. Current solutions used to perform an all reduce operation are algorithms such as a tree based reduce followed by a broadcast, recursive exchange, Rabenseifner's algorithm (a reduce scatter followed by an all gather) and rings").
	Regarding claim 19, Archer further discloses the method according to claim 18, wherein each program comprises one or more instructions to transmit fully reduced fragments on each of two links simultaneously in an All gather phase of an All reduce collective (paragraph [0018], "At each node, the corresponding results from two prefix reductions (a prefix reduction from one direction and a second prefix reduction from the other direction) is reduced to give an expected all reduce result on each node"; figures 4A-4D; paragraph [0023], "All reduce collective operations are typically used in many machines learning and high-performance computing (HPC) applications. Current solutions used to perform an all reduce operation are algorithms such as a tree based reduce followed by a broadcast, recursive exchange, Rabenseifner's algorithm (a reduce scatter followed by an all gather) and rings").

 7.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Archer et al. (US 2019/0045003) (hereafter Archer) in view of Mclaren et al. (US 2018/0240039) (hereafter Mclaren) and further in view of Mukhopadhyay et al. (US 2010/0158005) (hereafter Mukhopadhyay).
	Regarding claim 5, Archer further discloses the computer of claim 1, which has been configured from a multi-face prism comprising a set of stacked layers (see Fig. 5), the processing nodes of each stacked layer having an interlayer link to a corresponding processing node in an adjacent stacked layer and an intralayer link between neighboring processing nodes in the layer (see Fig. 5; paragraph [0019], "In a specific example, a node (e.g., node 102b) can be configured to receive data from a first node (e.g., node 102a) in a bi-directional chain of nodes. The data can be used to perform an operation (e.g., a reduction operation) that is part of a collective communication operation using the data from the first node and data on the node to create an intermediate result. The intermediate result can be stored in memory and communicated to a second node (e.g. node 102c). Second data can be received from the second node and the operation that is part of the collective communication operation using the second data from the second node and the data on the node can be performed to create a second intermediate result. The second intermediate result can be communicated to the first node. The operation that is part of the collective communication operation can be performed using the second data from the second node and the intermediate result to create a collective communication operation result. In an example, the collective communication operation is an all reduce operation. The chain of nodes can be an edge disjointed ring and the first node and the second node can be part of a multi-tiered topology network").
However, the combined teachings do not explicitly disclose disconnecting each interlayer link in a designated stacked layer and connecting it to a neighboring processing node in the designated stacked layer to provide an intralayer link whereby the designated stacked layer forms one of the first and second endmost layers. In the same field of endeavor, Mukhopadhyay teaches (Fig. 1; paragraph [0075]): "In the preferred embodiment, the hop bit pairs are stored in the NoC header word from right to left to represent a maximum sequence of 24 hops (2*24=48 bits of the 64-bit NoC header word), and are thus arranged in the NoC header as hop24, hop23, . . . , hop1, hop0. From the foregoing, it will be appreciated that a message originating at a node will be sent out on a switch 14 in one of four directions. Once the message has left the node where it originated, each following node in the list of routing hops will forward the message on by sending it straight, to the right, or to the left. For example, if the hop is coded 00 (straight) and arrives on the south link, it will be sent out on the north link. If the hop is coded 01 (right) and it arrives on the west link, it will be sent out on the south link. If the hop is coded 10 (left) and it arrives on the north link, it will be sent out on the east link. The last hop in the list will always be 11 which means that the message has arrived at the destination node. Before exiting a switch element at each hop, the hops in the header are right shifted so that the hop bit pair seen by the next node will be the correct next hop."
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Mukhopadhyay's teachings of switches for selectively connecting the processing elements via links in the same layer and in different layers, at least to allow the system to change the path of the data to be processed when it is determined that a path with the lowest offset or distance is available, thereby reducing the time to transmit the data to the intended destination and increasing throughput (see paragraph [0033]).
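
For illustration only, the hop-by-hop forwarding quoted above from paragraph [0075] of Mukhopadhyay might be sketched in Python as below. The two-bit codes and the right shift follow the quoted passage; the function names, the clockwise compass ordering, and the use of a plain integer for the routing portion of the header are assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

# Compass links in clockwise order, so +1 is a right turn and -1 a left turn.
COMPASS = ["north", "east", "south", "west"]

# Two-bit hop codes as given in the quoted passage.
STRAIGHT, RIGHT, LEFT, ARRIVED = 0b00, 0b01, 0b10, 0b11


def encode_hops(hops: List[int]) -> int:
    """Pack hop codes right to left: hops[0] occupies the lowest two bits."""
    header = 0
    for position, code in enumerate(hops):
        header |= (code & 0b11) << (2 * position)
    return header


def forward(header: int, arrival_link: str) -> Tuple[Optional[str], int]:
    """Return (outgoing link, updated header) for a message at one node.

    The outgoing link is None when the current hop code is 11, meaning the
    message has arrived; the header is then left unchanged.
    """
    hop = header & 0b11
    if hop == ARRIVED:
        return None, header
    # A message arriving on one link is travelling toward the opposite link.
    heading = (COMPASS.index(arrival_link) + 2) % 4
    turn = {STRAIGHT: 0, RIGHT: 1, LEFT: -1}[hop]
    out_link = COMPASS[(heading + turn) % 4]
    # Right shift so the next node sees its own hop code in the low bits.
    return out_link, header >> 2
```

The three examples in the quoted passage are reproduced by this sketch: a straight hop arriving on the south link leaves on the north link, a right hop arriving on the west link leaves on the south link, and a left hop arriving on the north link leaves on the east link.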

8.	Claim(s) 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Archer et al. (US 2019/0045003) (hereafter Archer) in view of Mclaren et al. (US 2018/0240039) (hereafter Mclaren) and further in view of Chen Juan et al. “Reducing static energy in supercomputer interconnection networks using topology aware partitioning” (hereafter Chen) (see IDS).
	Regarding claim 7, the combined teachings do not explicitly disclose the computer wherein each of the processing nodes is programmed to deactivate any of its interlayer and intralayer links which are unused in a data transmission step. However, in the same field of endeavor, Chen teaches that deactivating unused links in order to create self-contained partitions is a well-known design choice (figure 4a, fully independently closable router groups, "(a) Each router group in a fully independently closable router group can be switched off independently (30 percent off and 70 percent off, respectively)"; section 1, Introduction, "we address such a problem, i.e., one of partitioning the interconnection network to guide resource allocation decisions, so that as many unused routers as possible can be switched off, for as long as possible in order to reduce the static energy consumed"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chen with Archer and Mclaren, as a whole, so as to deactivate the interlayer and intralayer links to reduce the static energy consumed.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DHAVAL V PATEL whose telephone number is (571)270-1818. The examiner can normally be reached Monday to Friday (8:00am-4:30pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sam Ahn can be reached on 571-272-3044. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DHAVAL V PATEL/
Primary Examiner, Art Unit 2631