DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 2, 5, 7-8, and 18 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sequin (IEEE paper entitled Doubly Twisted Torus Networks for VLSI Processor Arrays).

As to claim 1, Sequin taught the invention as claimed, including: a computer (a multiprocessor; e.g., see page 471, left column, Introduction section) comprising a plurality of interconnected processing nodes (nodes 1-16 in fig. 3a, nodes 1-36 in fig. 4a) arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis [the rows of nodes provide the layers along an “x” axis], each layer comprising at least four processing nodes connected in a non-axial ring by at least one respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links (e.g., see figs. 3a, 4a) [note that the vertical path loops through all nodes and, in addition, the separate horizontal path also loops through all nodes, with no link shared between the vertical and horizontal paths].
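For illustration only, the property noted above (two loops that each visit every node while sharing no link) can be checked on a small model. The sketch below is not Sequin's wiring. It assumes a hypothetical 4x4 array of nodes (r, c), a horizontal loop that continues into the next row at the end of each row, and a vertical loop that continues at the top of the previous column at the bottom of each column. The names horizontal_cycle, vertical_cycle, and links are introduced for this example only.

```python
def horizontal_cycle(n):
    """Row-major loop: the end of each row continues into the next row (assumed twist)."""
    return [(r, c) for r in range(n) for c in range(n)]


def vertical_cycle(n):
    """Column loop: the bottom of column c continues at the top of column c - 1 (assumed twist)."""
    return [(r, c % n) for c in range(n, 0, -1) for r in range(n)]


def links(cycle):
    """Undirected links used by a closed loop, including the link that closes it."""
    return {frozenset((cycle[i], cycle[(i + 1) % len(cycle)])) for i in range(len(cycle))}


n = 4  # hypothetical array size chosen for the example
h, v = horizontal_cycle(n), vertical_cycle(n)
all_nodes = {(r, c) for r in range(n) for c in range(n)}

# Each loop must visit every node exactly once.
uses_all_nodes = len(h) == len(v) == n * n and set(h) == all_nodes and set(v) == all_nodes
# The two loops must not have any link in common.
shares_no_link = links(h).isdisjoint(links(v))
print(uses_all_nodes, shares_no_link)
```

Under these assumptions both checks pass for the 4x4 case. The check is a property test on an assumed model and does not by itself establish what any cited figure discloses.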

As to claim 2, Sequin taught the computer of claim 1, wherein the configuration is a toroid configuration in which respective connected corresponding nodes of the multiple layers form at least four axial rings (e.g., see figs. 1, 2, 3a, 4a, 6, 7, 8) [note that the three-dimensional view in fig. 6 shows that the doubly twisted torus forms at least four axial rings].

As to claim 5, Sequin taught the computer of claim 1, wherein each layer of the multiple layers has exactly four nodes (e.g., see figs. 3a, 7).


As to claim 7, Sequin taught the computer of claim 1 which comprises a number of layers arranged along the axis which is the same as the number of nodes in each layer (e.g., see figs. 2a, 3a, 4a, 7, 8).
As to claim 8, Sequin taught the computer of claim 1 wherein the intralayer and interlayer links comprise fixed connections between the processing nodes (e.g., see figs. 2a, 3a, 4a, 7) [note that the depiction of the links between the nodes does not detail any changing of the connections, and the links are therefore understood to be structured to operate as fixed connections].
As to claim 18, Sequin taught the computer of claim 1 which comprises four layers, each having four processing nodes connected in a ring (e.g., see fig. 7).



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3, 4, 6, and 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Mukhopadhyay (patent application publication No. 2010/0158005).
As to claim 3, Sequin taught the computer of claim 1 but did not expressly detail wherein at least one of the interlayer and intralayer links comprise switching circuitry (14) operable to connect one of the processing nodes selectively to one of multiple other processing nodes. Mukhopadhyay, however, taught this limitation (e.g., see fig. 1 and paragraphs 0031-0032).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Mukhopadhyay. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Mukhopadhyay teaching of switches for selectively connecting the processing elements via links, at least to allow the system to change the path of the data to be processed when it is determined that a path with the lowest offset or distance is available, thereby reducing the time to transmit the data to its intended destination and increasing throughput (e.g., see paragraph 0033 of Mukhopadhyay).
As to claim 4, Sequin taught the computer of claim 1, but did not expressly detail wherein each processing node is configured to output data on its respective intralayer and interlayer links with the same bandwidth utilisation on each of the intralayer and interlayer links of the processing node. Mukhopadhyay, however, taught switches and links that operate the same for horizontal and vertical transmission of data, as the switch sends data in one of four directions, in the intralayer (horizontal) or interlayer (vertical) direction. This provides the same bandwidth for transmission on the intralayer and interlayer links (e.g., see paragraphs 0074-0075).
As to claim 6, Sequin taught the computer of claim 1 but did not expressly detail which comprises a number of layers arranged along the axis which is greater than the number of processing nodes in each layer. Mukhopadhyay, however, taught this limitation (e.g., see fig. 1).

As to claim 24, Sequin taught the computer of claim 1, but did not expressly detail programmed to transmit the data in data transmission steps such that each link of a processing node is utilised with the same bandwidth as other links of that processing node in each data transmission step. Mukhopadhyay, however, taught this limitation (e.g., see fig. 1 and paragraphs 0031-0032). Note that Mukhopadhyay taught switches and links that operate the same for horizontal and vertical transmission of data, as the switch sends data in one of four directions, in the intralayer (horizontal) or interlayer (vertical) direction. This provides the same bandwidth for transmission on the intralayer and interlayer links (e.g., see paragraphs 0074-0075).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Mukhopadhyay. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Mukhopadhyay teaching of switches for selectively connecting the processing elements via links, at least to allow the system to change the path of the data to be processed when it is determined that a path with the lowest offset or distance is available, thereby reducing the time to transmit the data to its intended destination and increasing throughput (e.g., see paragraph 0033 of Mukhopadhyay).




Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Chen (patent No. 9,077,616).

As to claim 9, Sequin taught the computer of claim 1 as applied above. Chen taught the limitation wherein the multiple layers comprise first and second endmost layers and at least one intermediate layer between the first and second endmost layers, wherein each processing node in the first endmost layer is connected to a non-neighbouring node in the first endmost layer in addition to its neighbouring node, and each processing node in the second endmost layer is connected to a non-neighbouring node in the second endmost layer in addition to its neighbouring node (e.g., see figs. 1, 2).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Chen. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill in the art would have been motivated to incorporate the Chen teaching of connecting nodes to adjacent nodes in a layer as well as to non-adjacent nodes in the layer, at least to provide quick, direct access to data for parallel processing, so that a node need not wait for intermediate nodes to forward the data, thereby increasing throughput.

Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin and Chen as applied to claim 9 above, and further in view of Mukhopadhyay (patent application publication No. 2010/0158005).


As to claim 11, Sequin and Chen taught the computer of claim 9. Mukhopadhyay taught the limitation wherein at least one of the interlayer links of processing nodes in the first endmost layer comprise switching circuitry operable to disconnect the processing node from its neighbouring node in the first endmost layer and connect it to a corresponding node in the second endmost layer (e.g., see paragraph 0075 and fig. 1).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Mukhopadhyay. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Mukhopadhyay teaching of switches for selectively connecting the processing elements via links in the same layer and in different layers, at least to allow the system to change the path of the data to be processed when it is determined that a path with the lowest offset or distance is available, thereby reducing the time to transmit the data to its intended destination and increasing throughput (e.g., see paragraph 0033 of Mukhopadhyay).

Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Philip (patent No. 9,742,630) or Maresca (Springer Verlag paper entitled Parallel Computer Vision on Polymorphic Torus Architecture).

As to claim 12, Sequin taught the computer of claim 1. Philip taught the limitation wherein each embedded one-dimensional path comprises alternating sequences of one of the interlayer links and one of the intralayer links (e.g., see fig. 2(b) [route 202] and col. 3, lines 18-38 of Philip). Maresca also taught this limitation (e.g., see the portion of section 3.1 on page 219, fig. 4(A), section 4.1, and fig. 6 of Maresca).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Philip. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Philip teaching of a multi-turn route for data between processing elements, at least to adjust to the capacity of different links based on bandwidth, so as to optimize the time required to transmit data from source to destination and thereby optimize throughput.
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Maresca. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Maresca teaching of a multi-turn route for data between processing elements, at least to adjust to the capacity of different links based on bandwidth, so as to optimize the time required to transmit data from source to destination and thereby optimize throughput.

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Philip (patent No. 9,742,630).

As to claim 13, Sequin taught the computer of claim 1. Philip taught the limitation in which each one-dimensional embedded path comprises a sequence of processing nodes which are visited in a direction in each layer which is the same in all layers within each one-dimensional path (e.g., see fig. 2(b) [route 202] and col. 3, lines 18-38 of Philip).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Philip. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Philip teaching of a multi-turn route for data between processing elements, at least to adjust to the capacity of different links based on bandwidth, so as to optimize the time required to transmit data from source to destination and thereby optimize throughput.

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Maresca (Springer Verlag paper entitled Parallel Computer Vision on Polymorphic Torus Architecture).

As to claim 14, Sequin taught the computer of claim 1. Maresca taught the limitation in which each one-dimensional embedded path comprises a sequence of processing nodes which are visited in a direction in each layer which is different in successive layers within each one-dimensional path (e.g., see the portion of section 3.1 on page 219, fig. 4(A), section 4.1, and fig. 6 of Maresca) [the snake sweeping route in fig. 6 and the routing in fig. 4(A) provide this limitation].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Maresca. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Maresca teaching of a multi-turn route for data between processing elements, at least to adjust to the capacity of different links based on bandwidth, so as to optimize the time required to transmit data from source to destination and thereby optimize throughput.
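For illustration only, a "snake" traversal of the kind referred to for claim 14, in which the visiting direction reverses in each successive layer, can be sketched as follows. This is a generic model and is not taken from Maresca's figures; the function name snake_route and the 3-layer, 4-node dimensions are assumptions made for the example.

```python
def snake_route(layers, nodes_per_layer):
    """Visit every node layer by layer, reversing direction in each successive layer."""
    route = []
    for layer in range(layers):
        positions = range(nodes_per_layer)
        if layer % 2 == 1:
            # Odd layers are swept in the opposite direction to even layers.
            positions = reversed(positions)
        route.extend((layer, p) for p in positions)
    return route


# Hypothetical dimensions: 3 layers of 4 nodes each.
route = snake_route(3, 4)
print(route)
```

In this model, layer 0 is swept from position 0 to 3, layer 1 from 3 back to 0, and layer 2 from 0 to 3 again, so the direction differs between successive layers.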

Claim(s) 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Lee (patent application publication No. 2020/0065659) and Jain (Elsevier paper entitled Comparative Design and Analysis of Mesh, Torus and Ring NoC).
As to claim 15, Sequin taught the computer of claim 1. Lee taught an array of processors comprising six layers with at least one layer comprising four processing nodes (e.g., see fig. 2a and paragraphs 0046 and 0275), but did not expressly detail each layer having four processing nodes connected in a non-axial ring. Jain, however, taught an array with each layer comprising four processing nodes, in which the number of layers is N, where N is four or more (e.g., see section 2, entitled NoC Design Consideration). Therefore at least one embodiment of the combined system includes a system with each layer having four processing nodes. One of ordinary skill would have been motivated to use the embodiment having six layers, each with four nodes, as a situation-specific decision or design choice to implement a specific application.
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Lee. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Lee teaching of an array of processors with six layers, some of which have four processing nodes, at least to provide an optimized configuration for a specific application.

It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Jain. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Jain teaching of an array of processing nodes with each layer having four nodes, at least to provide an optimized configuration for a specific application.

Claim(s) 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Rustad (patent application publication No. 2020/0089612).
As to claim 16, Sequin taught the computer of claim 1 as applied above. Rustad taught the limitation which comprises eight layers, each having eight processing nodes connected in a non-axial ring (e.g., see fig. 9).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Rustad. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Rustad teaching of eight layers connected in a non-axial ring with redundant paths, at least to increase reliability and reduce congestion (e.g., see paragraphs 0065-0068).

Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin as applied to claim 1 above, and further in view of Camara (IEEE paper entitled Twisted Torus Topologies for Enhanced Interconnection Networks).
As to claim 17, Sequin taught the computer of claim 1. Camara taught the limitation which comprises eight layers each having four processing nodes connected in a ring (e.g., see figs. 1A, 1B) [note that each of the eight vertical layers includes four nodes].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Camara. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Camara teaching of eight layers with four processing elements each, at least to increase performance, as Camara detailed that an array with twice as many nodes in one direction (e.g., 4x2) provides increased performance (e.g., see page 1766, section 2, left column).





Claim(s) 25, 28, and 29 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sequin in view of Maresca (Springer Verlag paper entitled Parallel Computer Vision on Polymorphic Torus Architecture).

As to claim 25, Sequin taught a method of generating a set of programs to be executed in parallel on a computer comprising a plurality of processing nodes connected in a configuration with multiple layers arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring [the rows of nodes provide the layers along an “x” axis and the nodes in the y direction provide rings perpendicular to the rows of nodes] by a respective intralayer link between each pair of neighbouring processing nodes (nodes 1-16 in fig. 3a, nodes 1-36 in fig. 4a), wherein processing nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link, but did not expressly detail a program to define data transmission stage(s). Maresca, however, taught the method comprising: generating a first data transmission instruction for a first program to define a first data transmission stage in which data is transmitted from a first node executing the first program (convolution instruction on page 221), wherein the first data transmission instruction comprises a first link identifier [shift(X, west)] which defines a first outgoing link on which data is to be transmitted from the first processing node in the first data transmission stage; generating a second data transmission instruction [shift (X, East)] for a second program [this is provided in the combined system by using duplicate copies of the program of Maresca to drive the two paths in Sequin] to define a second data transmission stage in which data is transmitted from a second node executing the second program, wherein the second data transmission instruction comprises a second link identifier (East) which defines a second outgoing link on which data is to be transmitted from the second node in the second data transmission stage; and determining the first link identifier and the second link identifier in order to transmit data around each of two embedded one-dimensional paths provided by the configuration (e.g., see section 4.1).
As to each embedded one-dimensional path using all processing nodes of the computer in such a manner that the embedded one-dimensional logical paths operate simultaneously without sharing links, Sequin taught this limitation (e.g., see figs. 3a, 4a) [note that the vertical path loops through all nodes and, in addition, the separate horizontal path also loops through all nodes, with no link shared between the vertical and horizontal paths].
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Maresca. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Maresca teaching of using a program to identify the links and stages in the routing of data to and from processing elements, at least to provide flexible control for optimizing the routing of data; in the combined system this provides for control of plural paths, including timing the movement of data so that no conflicts or delays between the paths occur.
	Due to the similarities between claims 25 and 29, claim 29 is rejected for the same reasons as claim 25 above. Note that, as to the execution of the instructions, the execution is implicit, and one of ordinary skill would have been motivated to execute the instructions of Maresca at least to take advantage of the programmed/optimized routing of data in the operation of the processing array.
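For illustration only, the idea of generating, for each node's program, a data transmission instruction that carries a link identifier for a given transmission stage can be sketched as follows. This is a generic model, not the code of Maresca or of the application. The link names "west" and "east" echo the shift notation cited above; SendInstruction, generate_programs, and the node numbering are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendInstruction:
    stage: int    # data transmission stage in which the send occurs
    link_id: str  # identifier of the outgoing link on which the data is sent


def generate_programs(link_plan):
    """Build one program (a list of instructions) per node.

    link_plan maps a node to the list of link identifiers it uses,
    one entry per data transmission stage.
    """
    return {
        node: [SendInstruction(stage, link) for stage, link in enumerate(link_ids)]
        for node, link_ids in link_plan.items()
    }


# Hypothetical plan: in stage 0, node 0 sends on "west" and node 1 sends on "east".
programs = generate_programs({0: ["west"], 1: ["east"]})
print(programs[0][0].link_id, programs[1][0].link_id)
```

In this model the link identifiers are decided before the programs are emitted, which is one way of picturing "determining the first link identifier and the second link identifier" ahead of execution.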

As to claim 28, Sequin taught the method of claim 25, but did not expressly detail comprising transmitting the data from the first node in data transmission steps wherein the first outgoing link is utilised with a same bandwidth as a further outgoing link of the first node in each data transmission step. Mukhopadhyay, however, taught this limitation (e.g., see fig. 1 and paragraphs 0031-0032).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Sequin and Mukhopadhyay. Both references were directed toward the problem of providing transmission of data between processing elements in plural directions in a multi-dimensional processor array. One of ordinary skill would have been motivated to incorporate the Mukhopadhyay teaching of switches for selectively connecting the processing elements via links, at least to allow the system to change the path of the data to be processed when it is determined that a path with the lowest offset or distance is available, thereby reducing the time to transmit the data to its intended destination and increasing throughput (e.g., see paragraph 0033 of Mukhopadhyay).


Allowable Subject Matter
Claims 10, 19-23, 26, and 27 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Claims 10, 19-23, 26, and 27 respectively require, among other things, the limitations shown below:
Claim 10.  The computer of claim 9 wherein at least one of the interlayer and intralayer links of processing nodes in the first endmost layer comprise switching circuitry operable to disconnect the processing node from its corresponding node in the second endmost layer and connect it to a non-neighbouring node in the first endmost layer.

Claim 19.  The computer of claim 1, wherein each processing node is programmed to divide a respective partial vector of that processing node into fragments and to transmit the data in the form of successive fragments around each embedded one-dimensional path.

Claim 20.  The computer of claim 19 which is programmed to operate each path as a set of logical rings, wherein the successive fragments are transmitted around each logical ring in simultaneous transmission steps.

Claim 21.  The computer of claim 19, wherein each processing node is configured to output a respective fragment on each of two links simultaneously, wherein the fragment output on each of the links has approximately the same size.

Claim 22.  The computer of claim 19, wherein each processing node is configured to reduce multiple incoming fragments with multiple respective corresponding locally stored fragments.

Claim 23.  The computer of claim 22, wherein each processing node is configured to transmit fully reduced fragments on each of its intralayer and interlayer links simultaneously in an Allgather phase of an Allreduce collective.

Claim 26.  The method of claim 25, wherein the first program comprises an additional instruction to deactivate any of its interlayer and intralayer links which are not used in data transmission.

Claim 27.  The method of claim 25, wherein the first program comprises an additional instruction to divide a respective partial vector of the processing first node on which that program is executed into fragments and to transmit the data in the form of successive fragments over the first outgoing link.
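For context only, the terms used in claims 19-23 (partial vectors divided into fragments, reduction of incoming fragments with locally stored fragments, and an Allgather phase of an Allreduce collective) can be pictured with the textbook ring Allreduce sketched below. This is the generic single-ring algorithm and is not asserted to be the claimed method, which recites two embedded paths. The function name ring_allreduce is an assumption, and each fragment is a single number for brevity.

```python
def ring_allreduce(partials):
    """Textbook ring Allreduce (sum) over one logical ring.

    partials[i] is node i's partial vector, split into n fragments
    (here one number per fragment), where n is the number of nodes.
    """
    n = len(partials)
    data = [list(p) for p in partials]  # data[node][fragment]

    # Reduce-scatter: in step s, node i sends fragment (i - s) % n to node i + 1,
    # which adds it to its locally stored copy of that fragment.
    for s in range(n - 1):
        sent = [data[i][(i - s) % n] for i in range(n)]  # snapshot: sends are simultaneous
        for i in range(n):
            data[(i + 1) % n][(i - s) % n] += sent[i]

    # Allgather: node i now holds fully reduced fragment (i + 1) % n and the
    # fully reduced fragments are forwarded around the ring.
    for s in range(n - 1):
        sent = [data[i][(i + 1 - s) % n] for i in range(n)]
        for i in range(n):
            data[(i + 1) % n][(i + 1 - s) % n] = sent[i]
    return data


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)
```

With the three example vectors, every node ends with the element-wise sum of all three partial vectors.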
The closest prior art includes Sequin, Chen, Mukhopadhyay, Camara, Philip, Jain, Lee, Rustad, and Maresca. These references taught the features of the claims from which claims 10, 19-23, 26, and 27 respectively depend, as detailed above. However, the limitations of claims 10, 19-23, 26, and 27 shown above, among other things, were not disclosed by the closest prior art.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gray (patent application publication No. 2018/0287964) disclosed composing cores and FPGAs at massive scale with directional, two dimensional routers and interconnection networks (e.g., see abstract).
Fricker (patent application publication No. 2014/0226479) disclosed an enhanced 3D torus (e.g., see abstract).
Inman (patent application publication No. 2014/0149715) disclosed scalable and programmable computer systems (e.g., see abstract).
Dobbs (patent application publication No. 2014/0143520) disclosed a processing system with interspersed processors with multi-layer interconnect (e.g., see abstract).
Henry (patent application publication No. 2010/0246581) disclosed a method and apparatus for packet routing (e.g., see abstract).
Gonzalez (patent No. 7,613,900) disclosed systems and methods for selecting input/output configuration in an integrated circuit (e.g., see abstract).
Arabnia (Kluwer Academic Publishers paper entitled Parallel Stereocorrelation on a Reconfigurable Multi-Ring Network, pp. 243-269).
Wani (Kluwer Academic Publishers paper entitled Parallel Edge-Region Based Segmentation Algorithm Targeted at Reconfigurable Multi-Ring Network, 20 pages).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/           Primary Examiner, Art Unit 2183