DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 79, 85, 86, 89, 91, 97, 98, 100, 101, 103, 108, 109, 136, 137, 139-141, and 143 are amended and claims 144-147 are added in response to the last office action. Claims 79, 80, 83, 85-87, 89, 91, 97, 98, 100, 101, 103, 104, 108-112, 117, 121, 122, 125-127, 130, 133, and 136-147 are presented for examination. Gonzalez et al, Cotter et al, Barry et al, Wentzlaff et al, Khare et al, and May were previously cited.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 79, 80, 83, 85-87, 91, 103, 104, 109-112, 117, 121, 122, 125-127, 137, 138, and 143-147 are rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez et al [US 2004/0250046 A1] in view of Cotter et al [US 6,272,548 B1].
	As to claim 79, Gonzalez et al teach an integrated circuit [e.g., fig. 1], comprising: cluster circuits;
a first one of the cluster circuits [e.g., processor node 150, 200 in figs. 1, 2] including a first cluster-input bus [e.g., “In one embodiment, the processor network interface 240 is coupled directly to the Xtensa Processor Interface (PIF) for the processing element 220, which is an Xtensa processor.  In another embodiment, the processor network interface 240 is coupled to the processing element 220 through an AMBA AHB bus” in paragraph 0039; fig. 2], a first cluster-output bus [e.g., “In one embodiment, the processor network interface 240 is coupled directly to the Xtensa Processor Interface (PIF) for the processing element 220, which is an Xtensa processor.  In another embodiment, the processor network interface 240 is coupled to the processing element 220 through an AMBA AHB bus” in paragraph 0039; fig. 2], a first computing circuit [e.g., PROCESSING ELEMENT 220 in fig. 2], and
a first interface circuit [e.g., PROCESSOR NETWORK INTERFACE 240 in fig. 2] coupled to the computing circuit, the cluster-input bus, and the cluster-output bus, and configured to receive, from the computing circuit, a request to send a message that includes payload data, to generate, in response to the request, an outgoing message that includes a destination indicator and the payload data, and to cause the outgoing message to be provided on the cluster-output bus [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041; “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058; “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096]; and
a first two-dimensional interconnection network [e.g., “A system for processing applications includes processor nodes and links interconnecting the processor nodes” in Abstract; fig. 1].
Gonzalez et al do not explicitly teach the first two-dimensional interconnection network being a first two-dimensional directional-torus interconnection network including a first unidirectional ring of the routers including a first router having first and second message outputs and including a second unidirectional ring of the routers including the first router, wherein the first message output of the first router is coupled to a message input of another router of the first ring, wherein the second message output of the first router is coupled to a message input of another router of the second ring, and wherein one of the first and second message outputs of the first router is also coupled to the first cluster input bus. However, Cotter et al teach a first two-dimensional interconnection network [e.g., optical network 1 in fig. 2] being a first two-dimensional directional-torus interconnection network [e.g., “In the example shown in FIG. 2, a Manhattan Street Network (MS-Net) topology is used.  This is a two-connected, regular network with unidirectional links.  There is an even number of rows and columns with two links arriving and two links leaving each node N. Logically, the links form a grid on the surface of a torus, with links in adjacent rows or columns travelling in opposite directions” in col. 5, lines 18-25] including a first unidirectional ring of the routers [e.g., each horizontal torus connecting each node N in each row in fig. 2] including a first router [e.g., any node N on the top row in fig. 2] having first and second message outputs [e.g., Or, Oc in fig. 2] and including a second unidirectional ring of the routers [e.g., each vertical torus connecting each node N in each column in fig. 2] including the first router, wherein the first message output [e.g., Or in fig. 2] of the first router is coupled to a message input of another router [e.g., any node including destination node N on any row in fig. 2] of the first ring, wherein the second message output [e.g., Oc in fig. 2] of the first router is coupled to a message input of another router [e.g., any node including destination node N on any column in fig. 2] of the second ring, and wherein one of the first and second message outputs of the first router is also coupled to the first cluster input bus [e.g., DROP, HOST at the destination node is coupled to any of torus rings in fig. 4; “Preferably the network has at least two dimensions, and the packet carries at least two directional flags, one for each dimension of the network. … For example, in a regular rectangular mesh network with rows and columns associated with the principal axes of the compass, a packet may have knowledge that its destination is located north and east.  The packet self-navigates through the network by choosing whenever possible to travel in a direction that leads broadly towards the destination.  When the packet encounters a routing node, it simply instructs the node as to the preferred direction of onward travel: the node does not compute an optimum direction” in col. 3, lines 43-59; “All the packets that enter the routing switch (whether received from the network for forwarding or inserted from the local host) are routed to one of the outgoing links according to the rules of the dead-reckoning scheme, following the preferences indicated by the `destination bearings` where possible” in col. 8, lines 51-55; “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gonzalez et al to implement Cotter et al’s teaching above, including the details of the first interconnection network in the two-dimensional directional-torus configuration, in order to increase simplicity and/or flexibility in the message routing between the cluster circuits of Gonzalez et al.
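For illustration only, the Manhattan Street Network topology quoted above from Cotter et al (col. 5, lines 18-25) can be sketched as follows. The function names `msn_outputs` and `ring_along_row`, the (row, column) node coordinates, and the convention that even-numbered rows run east and even-numbered columns run south are assumptions made for this sketch; none of them appears in either reference.

```python
def msn_outputs(row, col, rows, cols):
    """Return (row_output, column_output) neighbours of node (row, col).

    Sketch of a Manhattan Street Network: every row and every column is a
    unidirectional ring, and adjacent rows/columns run in opposite directions.
    Assumed convention (not from the references): even rows run east,
    odd rows run west, even columns run south, odd columns run north.
    """
    if rows % 2 or cols % 2:
        raise ValueError("the topology assumes an even number of rows and columns")
    row_step = 1 if row % 2 == 0 else -1
    col_step = 1 if col % 2 == 0 else -1
    row_output = (row, (col + row_step) % cols)     # analogous to output Or
    column_output = ((row + col_step) % rows, col)  # analogous to output Oc
    return row_output, column_output


def ring_along_row(start, rows, cols):
    """Follow the row output repeatedly until the walk returns to `start`."""
    ring = [start]
    node = msn_outputs(*start, rows, cols)[0]
    while node != start:
        ring.append(node)
        node = msn_outputs(*node, rows, cols)[0]
    return ring
```

In this sketch each node has exactly two outgoing links, and following either output alone traces a closed unidirectional ring, which is the property relied upon in the mapping above.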
As to claim 80, the combination of Gonzalez et al and Cotter et al teaches wherein the first computing circuit includes a plurality of instruction-executing computing cores [e.g., “The processing element 220 includes a standard or native instruction set that provides a set of instructions that the processor element 220 is designed to recognize and execute” in paragraph 0037 of Gonzalez et al].
As to claim 83, the combination teaches wherein the first computing circuit includes one or more non-instruction-executing accelerator circuits [e.g., “The ISEF 210 is coupled to the processing element 220.  The ISEF 210 includes programmable logic for enabling application-specific instructions (‘instruction extensions’) to be stored and executed” in paragraph 0036 of Gonzalez et al].
As to claim 85, the combination teaches wherein the first router is further configured: to determine whether an incoming message identifies the first one of the cluster circuits as a destination of the incoming message; and to provide at least a portion of the incoming message on the first cluster-input bus if the first router circuit determines that the incoming message identifies the first one of the cluster circuits as the destination of the incoming message [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al].
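The address comparison quoted above from paragraph 0096 of Gonzalez et al can be illustrated by the following minimal sketch. The `Packet` type, its field names, and the function `handle_packet` are hypothetical and serve only to show the comparison; they are not taken from the reference.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest_node_id: int  # header: destination network address (hypothetical field name)
    payload: bytes     # payload carried behind the header


def handle_packet(packet, node_id):
    """Return 'deliver' if the packet is addressed to this node, else 'forward'."""
    if packet.dest_node_id == node_id:
        return "deliver"   # the node is the destination; process locally
    return "forward"       # otherwise pass the packet on through the network
```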
As to claim 86, the combination teaches wherein the first router further includes: a first router-input bus coupled to the first cluster-output bus; and wherein the first router is configured to receive the outgoing message on the first router-input bus, and to provide, via one of the first and second message outputs, the outgoing message to the second one of the cluster circuits corresponding to the destination indicator [e.g., “The processor network switch 327 of the processing element 322 is coupled to the NS link 328 and the EW link 329, and is configured to receive and transmit data, instructions and other information” in paragraph 0050, “Data injected by the source is transmitted to the destination and delivered in-order” in paragraph 0042, “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al; “FIG. 4 shows the structure of an individual node N. It incorporates a switch 2 which is set to route an incoming packet either to the node's row output Or, to the column output Oc or to the host local to the node.  This host may, for example, be one of a number of processors connected to respective nodes and forming in combination a multi-processor parallel processing computer system.  The switch 2 also has an input from the host so that, when appropriate, the node can insert a packet from the local host onto the network” in col. 5, lines 27-36, fig. 4 of Cotter et al].
As to claim 87, the combination teaches wherein the first router-output is coupled to the first cluster-input bus [e.g., “FIG. 4 shows the structure of an individual node N. It incorporates a switch 2 which is set to route an incoming packet either to the node's row output Or, to the column output Oc or to the host local to the node.  This host may, for example, be one of a number of processors connected to respective nodes and forming in combination a multi-processor parallel processing computer system.  The switch 2 also has an input from the host so that, when appropriate, the node can insert a packet from the local host onto the network” in col. 5, lines 27-36, fig. 4 of Cotter et al].
As to claim 91, the combination teaches wherein: the first router is configured to indicate to the first one of the cluster circuits that a message on the first and second message outputs is an incoming message for the first one of the cluster circuits; and wherein the first interface circuit is configured to cause the incoming message to be coupled from the first and second message outputs to the first cluster-input bus in response to the indication [e.g., “The processor network switch 327 of the processing element 322 is coupled to the NS link 328 and the EW link 329, and is configured to receive and transmit data, instructions and other information” in paragraph 0050, “Data injected by the source is transmitted to the destination and delivered in-order” in paragraph 0042, “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al; “If a packet is recognised as having reached its destination it is dropped from the network and diverted to the local host” in col. 8, lines 46-48, fig. 4 of Cotter et al].
As to claim 103, the combination teaches wherein the first interconnection network includes a network bus to which the routers are coupled, the network bus wide enough to carry all bits of an output message simultaneously [e.g., “In one embodiment, each of the communication links can transmit/receive 128 bits wide of data at 500 Mhz (i.e., 8 GB/s), for example” in paragraph 0056 of Gonzalez et al].
As to claim 104, the combination teaches wherein the first interconnection network includes a router configured for coupling to a circuit that is external to the integrated circuit [e.g., fig. 1 of Gonzalez et al].
As to claim 109, Gonzalez et al teach a method, comprising:
generating intermediate data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location” in paragraph 0041] with a first computing circuit of a first cluster circuit [e.g., processor node 150, 200 in figs. 1, 2] on an integrated circuit, the first computing circuit including one or more first processors each including a respective first instruction-executing computing core or a respective first configurable accelerator, together the one or more first processors including multiple first instruction-executing computing cores or at least one first configurable accelerator;
sending the intermediate data from the first cluster circuit to a second cluster circuit on the integrated circuit via a two-dimensional interconnection network on the integrated circuit including one or more first-dimension interconnection of routers and one or more second-dimension interconnection of the routers [e.g., “The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041; processor network switches 154 in fig. 1]; and 
generating, in response to the intermediate data, first output data with a second computing circuit of the second cluster circuit, the second computing circuit including one or more second processors each including a respective second instruction-executing computing core or a respective second configurable accelerator, together the one or more second processors including multiple second instruction-executing computing cores or at least one second configurable accelerator [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096].
Gonzalez et al do not explicitly teach the two-dimensional interconnection network being a two-dimensional directional-torus interconnection network including at least one or more first-dimension unidirectional rings of the routers and one or more second-dimension unidirectional rings of the routers, and one of the routers providing the intermediate data directly to both the second cluster circuit and to an adjacent one of the routers. However, Cotter et al teach a two-dimensional interconnection network being a two-dimensional directional-torus interconnection network including at least one or more first-dimension unidirectional rings of the routers and one or more second-dimension unidirectional rings of the routers, and one of the routers providing the intermediate data directly to both the second cluster circuit and to an adjacent one of the routers [e.g., “Packet routing networks may be used, for example, to interconnect the different processors of a multi-processor computer, or as the basis of a LAN interconnecting a number of different computers” in col. 1, lines 13-16; “Preferably the network has at least two dimensions, and the packet carries at least two directional flags, one for each dimension of the network. … For example, in a regular rectangular mesh network with rows and columns associated with the principal axes of the compass, a packet may have knowledge that its destination is located north and east.  The packet self-navigates through the network by choosing whenever possible to travel in a direction that leads broadly towards the destination.  When the packet encounters a routing node, it simply instructs the node as to the preferred direction of onward travel: the node does not compute an optimum direction” in col. 3, lines 43-59; “In the example shown in FIG. 2, a Manhattan Street Network (MS-Net) topology is used.  This is a two-connected, regular network with unidirectional links.  There is an even number of rows and columns with two links arriving and two links leaving each node N. Logically, the links form a grid on the surface of a torus, with links in adjacent rows or columns travelling in opposite directions” in col. 5, lines 18-25; figs. 2, 4; “All the packets that enter the routing switch (whether received from the network for forwarding or inserted from the local host) are routed to one of the outgoing links according to the rules of the dead-reckoning scheme, following the preferences indicated by the `destination bearings` where possible” in col. 8, lines 51-55, “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gonzalez et al to implement Cotter et al’s teaching above, including the details of the first interconnection network in the two-dimensional directional-torus configuration, in order to increase simplicity and/or flexibility in the message routing between the cluster circuits of Gonzalez et al.
As to claim 110, the combination teaches receiving input data at the first cluster circuit via the interconnection network; and wherein generating the intermediate data includes generating the intermediate data with the first computing circuit in response to the input data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067 of Gonzalez et al].
As to claim 111, the combination teaches wherein receiving the input data includes receiving the input data from a third cluster circuit on the integrated circuit via the interconnection network [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067 of Gonzalez et al].
As to claim 112, the combination teaches wherein receiving the input data includes receiving the input data from a source circuit via the interconnection network, the source circuit external to the integrated circuit [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067, fig. 1 of Gonzalez et al].
As to claim 117, the combination teaches the first cluster circuit generating a message that includes the intermediate data and a destination indicator that indicates the second cluster circuit; and wherein sending the intermediate data includes sending the message from the first cluster circuit to a first router of the interconnection network, sending the message from the first router to a second router of the interconnection network in a number of clock cycles equal to a number of routers through which the message propagates, the number inclusive of the first router and the second router, and sending the message from the second router to the second cluster circuit [e.g., “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058, “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096, “In FIG. 7, a channel carries data from node 700 to node 704.  The data first travels at timeslot 0 from node 700 to node 701 via link 710.  At timeslot 1, the switch (not shown) at node 701 takes one cycle to pass the data on link 711.  At timeslot 2, the link 712 then carries the data to node 703.  Finally, at timeslot 3, the link 713 carries the data travels to node 704” in paragraph 0075 of Gonzalez et al].
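The FIG. 7 example quoted above from paragraph 0075 of Gonzalez et al can be tabulated with a short sketch. The function name `link_schedule` and the assumption of exactly one timeslot per link are illustrative only and are not drawn from the reference beyond the quoted passage.

```python
def link_schedule(links, first_timeslot=0):
    """Map each link on a path to the timeslot in which it carries the data.

    Assumes, for illustration, that each link takes exactly one timeslot.
    """
    return {link: first_timeslot + i for i, link in enumerate(links)}


# Links 710-713 carry the data from node 700 to node 704 in the quoted example.
schedule = link_schedule([710, 711, 712, 713])
```

Under that assumption the data occupies links 710, 711, 712, and 713 in timeslots 0, 1, 2, and 3 respectively, matching the quoted passage.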
As to claim 121, the combination teaches sending the intermediate data from the first cluster circuit to a third cluster circuit on the integrated circuit via the interconnect network; and generating, in response to the intermediate data, second output data with a third computing circuit of the third cluster circuit [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250.  When the processor network interface 240 receives a response packet, the processor network interface 240 strips the packet control information and returns the data to the processing element 220 as a transaction on the PIF or AHB bus” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067 of Gonzalez et al].
As to claim 122, the combination teaches wherein sending the intermediate data includes sending a first portion of the intermediate data from the first cluster circuit to the second cluster circuit; sending a second portion of the intermediate data from the first cluster circuit to a third cluster circuit on the integrated circuit via the interconnection network; wherein generating the first output data includes generating, in response to the first portion of the intermediate data, the first output data with the second computing circuit; and generating, in response to the second portion of the intermediate data, second output data with a third computing circuit of the third cluster circuit [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250.  When the processor network interface 240 receives a response packet, the processor network interface 240 strips the packet control information and returns the data to the processing element 220 as a transaction on the PIF or AHB bus” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067 of Gonzalez et al].
As to claim 125, the combination teaches wherein sending the intermediate data includes sending a first portion of the intermediate data from the first cluster circuit to the second cluster circuit; sending a second portion of the intermediate data from the first cluster circuit to a third cluster circuit on the integrated circuit via the interconnection network; wherein generating the first output data includes generating, in response to the first portion of the intermediate data, the first output data with a first configurable accelerator of the second computing circuit, the first configurable accelerator having a configuration; and generating, in response to the second portion of the intermediate data, second output data with a third configurable accelerator of a third computing circuit of the third cluster circuit, the third configurable accelerator having the configuration [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250.  When the processor network interface 240 receives a response packet, the processor network interface 240 strips the packet control information and returns the data to the processing element 220 as a transaction on the PIF or AHB bus” in paragraph 0041, “The channels carry the results of each processor node, which is communicated to the next computational kernel for additional processing.  For example, stage 608 (‘stage 1’) represents, in time, the first processes.  Then, the results of stage 608 are communicated to stage 610 (‘stage 2’) for further processing (e.g., variable length coding), which depends upon the first processes' results” in paragraph 0067, figs. 1-3, 6-12 of Gonzalez et al].
As to claim 126, the combination teaches writing the intermediate data from the first computing circuit into a memory circuit of the first cluster circuit; reading the intermediate data from the memory circuit onto a first cluster-output bus of the first cluster circuit; and wherein sending the intermediate data includes coupling the intermediate data from the first cluster-output bus to a bus of the interconnection network [e.g., “In some embodiments, the processor network interface 240 also performs any reads or writes of the DP-RAM 230 that are posted to the AHB bus.  When other devices need access to the DP-RAM 230, the processor network interface 240 provides a way to share its dedicated port to the DP-RAM 230” in paragraph 0044, “In this instance, the 16x16 block of a current frame then will be temporally stored in local memory 326 for performing one or more compression algorithm steps.  The local memory 326 can also optionally store a block of pixels from a previous and/or later video frame so as to perform any of the known video compression prediction techniques” in paragraph 0052 of Gonzalez et al].
As to claim 127, the combination teaches writing the intermediate data from a bus of the interconnection network into a memory circuit of the second cluster circuit; at least one of the second processors of the second computing circuit reading the intermediate data from the memory; wherein generating the first output data includes at least one of the second processors of the second computing circuit generating the output data; and writing the first output data from at least one of the second processors of the second computing circuit to the memory [e.g., “In some embodiments, the processor network interface 240 also performs any reads or writes of the DP-RAM 230 that are posted to the AHB bus.  When other devices need access to the DP-RAM 230, the processor network interface 240 provides a way to share its dedicated port to the DP-RAM 230” in paragraph 0044, “In this instance, the 16x16 block of a current frame then will be temporally stored in local memory 326 for performing one or more compression algorithm steps.  The local memory 326 can also optionally store a block of pixels from a previous and/or later video frame so as to perform any of the known video compression prediction techniques” in paragraph 0052; “In some embodiments, the cross-bar 325 is used to provide access to the local memory 326 from the processor network switch 327, a neighboring processing node (e.g., east neighbor 330), and the processing element 322” in paragraph 0053 of Gonzalez et al].
As to claim 137, the combination teaches wherein the first router has a message-input bus coupled to the first message output bus of the first router or to the second message output bus of the first router [e.g., figs. 2, 4 of Cotter et al]. 
As to claim 138, the combination teaches wherein the computing circuit includes a plurality of instruction-executing cores coupled to a plurality of memory ports on a cluster shared data memory by one or more direct-connection, time-division multiplexing, crossbar, and ring interconnects [e.g., figs. 1-3, “TDM can be implemented using a table with one entry per timeslot.  This table indicates the connections that should be enabled in the crossbar (i.e. which egress port to use for each ingress port)” in paragraph 0077 of Gonzalez et al].
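For illustration only, the time-division multiplexing table quoted from paragraph 0077 of Gonzalez et al may be sketched as follows. The function name, the port numbering, and the example schedule are hypothetical and are not taken from the reference; the sketch shows only the quoted idea of one table entry per timeslot, each entry naming the egress port enabled for each ingress port.

```python
from typing import Dict, List, Optional

# One entry per timeslot; each entry maps an ingress port to the egress
# port enabled in the crossbar during that slot (hypothetical layout).
TdmTable = List[Dict[int, int]]


def egress_for(table: TdmTable, cycle: int, ingress: int) -> Optional[int]:
    """Return the egress port enabled for `ingress` in this cycle.

    The table repeats every len(table) cycles. None means the ingress
    port has no connection enabled in the current timeslot.
    """
    slot = table[cycle % len(table)]
    return slot.get(ingress)


# Hypothetical two-slot schedule for a small crossbar.
example_table: TdmTable = [
    {0: 1, 1: 0},  # slot 0: ingress 0 -> egress 1, ingress 1 -> egress 0
    {0: 0},        # slot 1: ingress 0 -> egress 0, ingress 1 idle
]
```

Under this assumed layout, `egress_for(example_table, 3, 0)` consults slot 1, because the schedule repeats every two cycles.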
As to claim 143, the combination teaches wherein the first router is further configured to deflect a received message from one of the first and second message outputs to the other of the first and second message outputs in response to message contention at the one of the first and second message outputs [e.g., “That is, communications between specific processor nodes may be prioritized so as to have the shortest number of hops between those processors, the least congested path, and/or any other path that facilitates optimal processing performance.  Returning to the example of P1 and P2, if P1 has a longer transmit time because of congestion, for example, then path P2 can be selected to communicate information between nodes 410 and 320” in paragraph 0057 of Gonzalez et al; “The MS-Net is well suited to a simple deflection strategy for contention resolution” in col. 6, lines 44-45, “However, if there is no available buffer space, then one of the two packets (chosen at random) is deflected to the other output port. When two packets are present at the routing switch and one of them has no particular outward routing preference, then that packet will be the candidate for deflection” in col. 8, line 66-col. 9, line 4 of Cotter et al].
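For illustration only, the deflection behavior quoted from col. 8, line 66-col. 9, line 4 of Cotter et al may be sketched as follows for a router with two outputs. The function name and the port numbering (0 and 1) are hypothetical, and the sketch assumes that no buffer space is available, so that contention is always resolved by deflection; it is not asserted to be the circuit of either reference.

```python
import random
from typing import Optional, Tuple


def resolve(pref_a: Optional[int], pref_b: Optional[int],
            rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Assign output ports (0 or 1) to two packets arriving together.

    A preference of None means the packet has no particular outward
    routing preference, so it is the candidate for deflection.
    """
    rng = rng or random.Random()
    if pref_a is None and pref_b is None:
        return 0, 1                   # neither packet cares; any split works
    if pref_a is None:
        return 1 - pref_b, pref_b     # packet A takes whatever B leaves free
    if pref_b is None:
        return pref_a, 1 - pref_a     # packet B takes whatever A leaves free
    if pref_a != pref_b:
        return pref_a, pref_b         # no contention; both get their choice
    # Contention: both packets want the same output. One packet, chosen
    # at random, is deflected to the other output port.
    if rng.random() < 0.5:
        return 1 - pref_a, pref_b     # packet A is deflected
    return pref_a, 1 - pref_b         # packet B is deflected
```

In every case the two packets leave on different outputs, which is the property the deflection strategy relies on in place of buffering.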
As to claim 144, the combination teaches wherein the one of the first and second message outputs of the first router is coupled to the message input of the other router via only one wire per bit [e.g., “Furthermore, more hardware is required at the network interface to slowly insert a 64 or 128-bit quantity into a 1, 2, or 4-bit ‘wire.’” in paragraph 0081 of Gonzalez et al; rings in fig. 2 of Cotter et al].
As to claim 145, the combination teaches wherein the one of the first and second message outputs of the first router is coupled to the first cluster input bus via only one wire per bit [e.g., “Furthermore, more hardware is required at the network interface to slowly insert a 64 or 128-bit quantity into a 1, 2, or 4-bit ‘wire.’” in paragraph 0081 of Gonzalez et al; rings in fig. 2 of Cotter et al].
As to claim 146, the combination teaches wherein the one of the first and second message outputs of the first router is coupled to the message input of the other router via one or more pipeline registers [e.g., “Based on Table 1 (i), FIG. 19 shows a circuit diagram for the routing logic processor in a crosspoint oriented west to east and south to north, for cells incoming from the west. This detailed diagram confirms that the routing logic for dead reckoning, using the 4 input bits referred to earlier, is sufficiently simple that the routing rules can be executed with hard-wired electronic circuitry using a small number of elementary boolean logic gates (invert, AND and OR), without the need for arithmetic, registers, or look-up tables” in col. 15, lines 35-43 of Cotter et al]. 
As to claim 147, the combination teaches wherein the one of the first and second message outputs of the first router is coupled to the first cluster input bus via one or more pipeline registers [e.g., “Based on Table 1 (i), FIG. 19 shows a circuit diagram for the routing logic processor in a crosspoint oriented west to east and south to north, for cells incoming from the west. This detailed diagram confirms that the routing logic for dead reckoning, using the 4 input bits referred to earlier, is sufficiently simple that the routing rules can be executed with hard-wired electronic circuitry using a small number of elementary boolean logic gates (invert, AND and OR), without the need for arithmetic, registers, or look-up tables” in col. 15, lines 35-43 of Cotter et al].
Claims 89, 98, 100, 101, 130, 133, and 139-142 are rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez et al and Cotter et al as applied to claims 79 and 109 above, and further in view of Wentzlaff et al [US 8,631,205 B1].
	As to claim 89, the combination of Gonzalez et al and Cotter et al does not expressly teach, however Wentzlaff et al teach wherein the first router is configured to multicast the outgoing message to a second one of the cluster circuits and to one or more third ones of the cluster circuits corresponding to the destination indicator [e.g., “In a given clock cycle, the switch control module 304A can enable the multiplexers to move data independently onto any output port from any input port, including multicasting an input port to all output ports, as long as two input ports are not connected to the same output port in the same clock cycle” in col. 8, lines 13-18]. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above in order to increase applicability, flexibility, and/or efficiency for the message transmission of the combination.
As to claim 98, the combination of Gonzalez et al and Cotter et al teaches a second router of the routers [e.g., another processor network switch 154 in fig. 1 of Gonzalez et al] coupled to the second one of the cluster circuits [e.g., another processor node 150 in fig. 1 of Gonzalez et al]; wherein the first computing circuit of the first one of the cluster circuits includes first instruction-executing computing cores, one of the first instruction-executing computing cores configured to generate the payload data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location” in paragraph 0041 of Gonzalez et al]; wherein the second one of the cluster circuits includes a second computing circuit having second configurable accelerators and includes a second interface circuit [e.g., fig. 1 of Gonzalez et al]; wherein the first interface circuit of the first one of the cluster circuits is configured to generate the destination indicator to indicate the second one of the cluster circuits [e.g., “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058 of Gonzalez et al]; wherein the first router is configured to provide the outgoing message to the second router [e.g., “The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041 of Gonzalez et al]; wherein the second router is configured to provide the outgoing message to the second one of the cluster circuits as an incoming message [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al; “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33 of Cotter et al]. The combination does not further teach, however Wentzlaff et al teach wherein the second one of the cluster circuits includes a second computing circuit having second configurable accelerators [e.g., functional units 242A, 242B in fig. 2B; ALU(n) in fig. 5A; “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]; wherein the destination indicator generated further indicates one of the second configurable accelerators of the second one of the cluster circuits; and wherein the second interface circuit of the second one of the cluster circuits is configured to provide the payload data of the incoming message to the one of the second configurable accelerators indicated by the destination indicator [e.g., “The pipeline integrated switch enables a value computed by an ALU of a given tile to be used as an operand in a neighboring tile's ALU with extremely low latency” in col. 13, lines 18-21; “After a packet reaches the destination tile, the packet is then sent to a final destination (which can also be indicated in the packet header).  The final destination can direct data to an off-tile location over a network port to the north, east, south, west, or can direct the data to a functional unit within the tile, such as the processor or an on-tile memory unit or functional unit” in col. 17, line 65-col. 18, line 4]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above in order to increase efficiency and/or speed in the message routing between the cluster circuits.
As to claim 100, the combination of Gonzalez et al and Cotter et al teaches a second router of the routers [e.g., another processor network switch 154 in fig. 1 of Gonzalez et al] coupled to a second one of the cluster circuits [e.g., another processor node 150 in fig. 1 of Gonzalez et al]; wherein the first computing circuit of the first one of the cluster circuits includes first configurable accelerators, one of the first configurable accelerators configured to generate the payload data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location” in paragraph 0041 of Gonzalez et al]; wherein the second one of the cluster circuits includes a second computing circuit having second configurable accelerators and includes a second interface circuit [e.g., fig. 1 of Gonzalez et al]; wherein the first interface circuit of the first one of the cluster circuits is configured to generate the destination indicator to indicate the second one of the cluster circuits [e.g., “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058 of Gonzalez et al]; wherein the first router is configured to provide the outgoing message to the second router [e.g., “The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041 of Gonzalez et al]; wherein the second router is configured to provide the outgoing message to the second one of the cluster circuits as an incoming message [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al; “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33 of Cotter et al]. The combination does not further teach, however Wentzlaff et al teach wherein the second one of the cluster circuits includes a second computing circuit having second configurable accelerators [e.g., functional units 242A, 242B in fig. 2B; ALU(n) in fig. 5A; “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]; wherein the destination indicator generated further indicates one of the second configurable accelerators of the second one of the cluster circuits; and wherein the second interface circuit of the second one of the cluster circuits is configured to provide the payload data of the incoming message to the one of the second configurable accelerators indicated by the destination indicator [e.g., “The pipeline integrated switch enables a value computed by an ALU of a given tile to be used as an operand in a neighboring tile's ALU with extremely low latency” in col. 13, lines 18-21; “After a packet reaches the destination tile, the packet is then sent to a final destination (which can also be indicated in the packet header).  The final destination can direct data to an off-tile location over a network port to the north, east, south, west, or can direct the data to a functional unit within the tile, such as the processor or an on-tile memory unit or functional unit” in col. 17, line 65-col. 18, line 4]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above in order to increase efficiency and/or speed in the message routing between the cluster circuits.
As to claim 101, the combination of Gonzalez et al and Cotter et al teaches a second router [e.g., another processor network switch 154 in fig. 1 of Gonzalez et al] coupled to a second one of the cluster circuits [e.g., another processor node 150 in fig. 1 of Gonzalez et al]; wherein the first computing circuit of the first one of the cluster circuits includes first instruction-executing computing core and a first configurable accelerator, one of the first instruction-executing computing core and a first configurable accelerator configured to generate the payload data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location” in paragraph 0041, figs. 2, 3 of Gonzalez et al]; wherein the second one of the cluster circuits includes a second computing circuit having a second instruction-executing core and a second configurable accelerator and includes a second interface circuit [e.g., figs. 1-3]; wherein the first interface circuit of the first one of the cluster circuits is configured to generate the destination indicator to indicate one of the second one of the cluster circuits [e.g., “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058 of Gonzalez et al]; wherein the first router is configured to provide the outgoing message to the second router [e.g., “The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041 of Gonzalez et al]; wherein the second router is configured to provide the outgoing message to the second one of the cluster circuits as an incoming message [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al]. The combination does not further teach, however Wentzlaff et al teach wherein the second one of the cluster circuits includes a second computing circuit having a second instruction-executing core and a second configurable accelerator [e.g., functional units 242A, 242B in fig. 2B; ALU(n) in fig. 5A; “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]; wherein the destination indicator generated further indicates one of the second instruction-executing core and the second configurable accelerator of the second one of the cluster circuits; and wherein the second interface circuit of the second one of the cluster circuits is configured to provide the payload data of the incoming message to the one of the second instruction-executing computing core and the second configurable accelerator indicated by the destination indicator [e.g., “The pipeline integrated switch enables a value computed by an ALU of a given tile to be used as an operand in a neighboring tile's ALU with extremely low latency” in col. 13, lines 18-21; “After a packet reaches the destination tile, the packet is then sent to a final destination (which can also be indicated in the packet header).  The final destination can direct data to an off-tile location over a network port to the north, east, south, west, or can direct the data to a functional unit within the tile, such as the processor or an on-tile memory unit or functional unit” in col. 17, line 65-col. 18, line 4]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above in order to increase efficiency and/or speed in the message routing between the cluster circuits.
As to claim 130, the combination of Gonzalez et al and Cotter et al does not teach, however Wentzlaff et al teach the first cluster circuit, the second cluster circuit, and the interconnection network are instantiated on a field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above including the integrated circuit implemented into the Field Programmable Gate Array in order to increase configurability and/or flexibility for the integrated circuit of the combination.
As to claim 133, the combination of Gonzalez et al and Cotter et al does not explicitly teach, however Wentzlaff et al teach wherein: at least a portion of one of the first cluster circuit, the second cluster circuit, and the interconnection network is instantiated on a field-programmable gate array; and at least another portion of one of the first cluster circuit, the second cluster circuit, and the interconnection network is disposed on the field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above including the integrated circuit implemented into the Field Programmable Gate Array in order to increase configurability and/or flexibility for the integrated circuit of the combination.
As to claim 139, the combination of Gonzalez et al and Cotter et al teaches wherein one dual-output lookup table is configured to input one bit on a first message input of the first router, one bit on a second message input of the first router, and one bit on a client input of the first router, and is further configured to output one bit on the first message output bus of the first router and one bit on the second message output of the first router [e.g., “These simple routing rules provide a basis on which a packet may select its onward path at each crosspoint with good efficiency.  A routing logic processor having the task of executing these rules requires just 4 bits of information for each packet: i) the destination bearings (2 bits); ii) whether or not the destination row (column) matches the node row (column) (1 bit each).  Using these 4 input bits, the routing logic is sufficiently simple that the rules can be executed with hard-wired electronic circuitry using a small number of elementary boolean logic gates, without the need for arithmetic, registers or look-up tables” in col. 7, lines 47-57 of Cotter et al]. The combination does not explicitly teach, however Wentzlaff et al teach the first routing circuit is instantiated on a field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above including the integrated circuit implemented into the Field Programmable Gate Array in order to increase feasibility, configurability, and/or flexibility for the integrated circuit of the combination.
As to claim 140, the combination of Gonzalez et al and Cotter et al teaches an instruction memory instantiated in one dual-port block RAM having a first port write data bus coupled to a first message input of the first router, and having first and second port read data buses coupled to first and second instruction-executing cores, respectively [e.g., PROCESSING ELEMENT 220, ISEF 210, DP-RAM 230 in fig. 2 of Gonzalez et al]. The combination does not explicitly teach, however Wentzlaff et al teach the instruction memory is instantiated on a field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above including the integrated circuit implemented into the Field Programmable Gate Array in order to increase configurability and/or flexibility for the integrated circuit of the combination.
As to claim 141, the combination of Gonzalez et al and Cotter et al teaches a memory circuit disposed in the first cluster and including a shared memory having a first group of one or more shared-memory ports coupled to a message-input bus and to a message-output bus, and a second group of one or more shared-memory ports [e.g., DP-RAM 230 in fig. 2 of Gonzalez et al]; a multiplexer circuit disposed in the first cluster and having a first group of multiplexer ports, and having a second group of one or more multiplexer ports coupled to the second group of one or more shared-memory ports [e.g., MUX/DEMUX 912, 922, 932, 942 in fig. 9 of Gonzalez et al]; and wherein the computing circuit includes instruction-executing computing cores [e.g., ISEF 210, PROCESSING ELEMENT 220 in fig. 2, fig. 9 of Gonzalez et al]. The combination does not explicitly teach, however Wentzlaff et al teach the computing circuit further includes one or more configurable accelerators coupled to the message-input bus, the message-output bus, and to the first group of one or more shared-memory ports [e.g., DMA 806 in fig. 8]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement Wentzlaff et al’s teaching above in order to increase performance in processing for the combination.
As to claim 142, the combination of Gonzalez et al, Cotter et al, and Wentzlaff et al teaches wherein one block RAM memory of a field-programmable gate array implements one dual-port bank of the shared memory including one first shared-memory and one second shared-memory port [e.g., “In some embodiments, the processor network interface 240 also performs any reads or writes of the DP-RAM 230 that are posted to the AHB bus.  When other devices need access to the DP-RAM 230, the processor network interface 240 provides a way to share its dedicated port to the DP-RAM 230” in paragraph 0044, “In this instance, the 16x16 block of a current frame then will be temporally stored in local memory 326 for performing one or more compression algorithm steps.  The local memory 326 can also optionally store a block of pixels from a previous and/or later video frame so as to perform any of the known video compression prediction techniques” in paragraph 0052 of Gonzalez et al; “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col.1, lines 36-38 of Wentzlaff et al]. 
Claim 97 is rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez et al and Cotter et al as applied to claim 79 above, and further in view of May [US 2008/0229059 A1].
	As to claim 97, the combination of Gonzalez et al and Cotter et al teaches a second one of the routers [e.g., another processor network switch 154 in fig. 1 of Gonzalez et al] coupled to the second one of the cluster circuits [e.g., another processor node 150 in fig. 1 of Gonzalez et al] and including a second routing circuit; wherein the first computing circuit of the first one of the cluster circuits includes first instruction-executing computing cores, one of the first instruction-executing computing cores configured to generate the payload data [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location” in paragraph 0041 of Gonzalez et al]; wherein the second one of the cluster circuits includes a second computing circuit having second instruction-executing computing cores and includes a second interface circuit [e.g., fig. 1 of Gonzalez et al]; wherein the first interface circuit of the first one of the cluster circuits is configured to generate the destination indicator to indicate the second one of the cluster circuits [e.g., “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058 of Gonzalez et al]; wherein the first routing circuit of the first one of the routers is configured to provide the outgoing message to the second one of the routers [e.g., “The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041 of Gonzalez et al]; wherein the second routing circuit of the second one of the routers is configured to provide the outgoing message to the second one of the cluster circuits as an incoming message [e.g., “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096 of Gonzalez et al; “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33 of Cotter et al]. The combination does not further teach, however May teaches wherein the second one of the cluster circuits includes a second computing circuit having second instruction-executing computing cores [e.g., processors 4 in fig. 2]; wherein the destination indicator generated further indicates one of the second instruction-executing computing cores of the second one of the cluster circuits; and wherein the second interface circuit of the second one of the cluster circuits is configured to provide the payload data of the incoming message to the one of the second instruction-executing computing cores indicated by the destination indicator [e.g., “In operation, the system switch 16 receives an incoming message 26, either from one of the local processors (i.e. a processor of the same node) or from a processor of a different node.  Each node is allocated an address identifying it within the array, each processor is allocated an address identifying it within the respective node, and each channel is allocated an address identifying it within the respective processor.  Addresses comprise component quanta of address data, in most cases individual bits.  As illustrated schematically in FIG. 3, the message includes a header 27 comprising a destination node portion 28 which specifies a destination node address, a destination processor portion 30 which specifies a destination processor address, and destination channel portion 32 which specifies a destination channel address” in paragraph 0031]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify to implement May’s teaching above in order to increase efficiency and/or speed in the message routing between the cluster circuits [e.g., “Bear in mind that the array may comprise tens, hundreds, or even thousands of nodes and so a message may have to be relayed between numerous nodes before reaching its destination.  Therefore finding the most efficient route throughout the array is not a straightforward task” in paragraph 0033 of May].
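For illustration only, the three-part header quoted from paragraph 0031 of May (destination node portion, destination processor portion, destination channel portion) may be sketched as follows, together with the address comparison quoted from paragraph 0096 of Gonzalez et al. The field widths, the bit layout, and all names are hypothetical; neither reference is asserted to specify them.

```python
from typing import NamedTuple, Tuple

# Hypothetical field widths; the quoted passages do not fix these.
NODE_BITS, PROC_BITS, CHAN_BITS = 8, 4, 4


class Header(NamedTuple):
    node: int       # destination node address within the array
    processor: int  # destination processor address within the node
    channel: int    # destination channel address within the processor


def pack(h: Header) -> int:
    """Pack the three destination portions into one header word."""
    return (h.node << (PROC_BITS + CHAN_BITS)) | (h.processor << CHAN_BITS) | h.channel


def unpack(word: int) -> Header:
    """Recover the three destination portions from a header word."""
    channel = word & ((1 << CHAN_BITS) - 1)
    processor = (word >> CHAN_BITS) & ((1 << PROC_BITS) - 1)
    node = (word >> (PROC_BITS + CHAN_BITS)) & ((1 << NODE_BITS) - 1)
    return Header(node, processor, channel)


def dispatch(word: int, local_node: int) -> Tuple:
    """Compare the destination node with the local node address.

    On a match, the message is delivered to the processor and channel
    named in the header; otherwise it is passed on toward the
    destination node.
    """
    h = unpack(word)
    if h.node != local_node:
        return ("forward", h.node)
    return ("deliver", h.processor, h.channel)
```

Under these assumed widths, a header for node 3, processor 2, channel 5 is delivered locally when it arrives at node 3 and is forwarded when it arrives at any other node.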
Claim 108 is rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez et al [US 2004/0250046 A1] in view of Wentzlaff et al [US 8,631,205 B1] further in view of Cotter et al [US 6,272,548 B1].
	As to claim 108, Gonzalez et al teach an integrated circuit [e.g., fig. 1], comprising: cluster circuits;
a first one of the cluster circuits [e.g., processor node 150, 200 in figs. 1, 2] including a first cluster-input bus [e.g., “In one embodiment, the processor network interface 240 is coupled directly to the Xtensa Processor Interface (PIF) for the processing element 220, which is an Xtensa processor.  In another embodiment, the processor network interface 240 is coupled to the processing element 220 through an AMBA AHB bus” in paragraph 0039; fig. 2], a first cluster-output bus [e.g., “In one embodiment, the processor network interface 240 is coupled directly to the Xtensa Processor Interface (PIF) for the processing element 220, which is an Xtensa processor.  In another embodiment, the processor network interface 240 is coupled to the processing element 220 through an AMBA AHB bus” in paragraph 0039; fig. 2], a first computing circuit [e.g., PROCESSING ELEMENT 220 in fig. 2], and
a first interface circuit [e.g., PROCESSOR NETWORK INTERFACE 240 in fig. 2] coupled to the computing circuit, the cluster-input bus, and the cluster-output bus, and configured to receive, from the computing circuit, a request to send a message that includes payload data, to generate, in response to the request, an outgoing message that includes a destination indicator and the payload data, and to cause the outgoing message to be provided on the cluster-output bus [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041; “Information (i.e., data, instructions, etc.) is communicated by ‘message-passing’ among arrayed processor nodes.  Accordingly, each processing node is associated with a unique node identifier or address (‘node ID’) by using a packet switched-like network to communicate information between at least two nodes by passing messages including such information.  A packet in accordance with one embodiment includes a header and a payload” in paragraph 0058; “In process 1210, a comparison is made between the destination network address of the packet and the processor node's network address.  If destination network address of the packet matches the processor node's network address, then the processor node is the destination for the packet and the packet is processed” in paragraph 0096]; and
a first interconnection network including
routers [e.g., PROCESSOR NETWORK SWITCH 154, 250 in figs. 1, 2; “The processor network switch 327, in some cases, can operate as a ‘router’ as packets are received and either accepted into the processor node 320, or passed on to another switch of another processor node” in paragraph 0058] each coupled to a respective one of the cluster circuits, and 
a first one of the routers coupled to the first one of the cluster circuits and including a first routing circuit configured to provide the outgoing message to a second one of the cluster circuits corresponding to the destination indicator [e.g., “The processor network switch 327, in some cases, can operate as a ‘router’ as packets are received and either accepted into the processor node 320, or passed on to another switch of another processor node” in paragraph 0058].
Gonzalez et al do not explicitly disclose, but Wentzlaff et al teach, that the integrated circuit is instantiated by a non-transitory computer-readable medium storing configuration data that, when received by a field-programmable gate array, causes the field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col. 1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the integrated circuit of Gonzalez et al to implement Wentzlaff et al’s teaching above, including implementing the integrated circuit in a field-programmable gate array, in order to increase configurability and/or flexibility.
The combination of Gonzalez et al and Wentzlaff et al does not explicitly teach the first two-dimensional interconnection network being a first two-dimensional directional-torus interconnection network including a first unidirectional ring of the routers including a first router having first and second message outputs and including a second unidirectional ring of the routers including the first router, wherein the first message output of the first router is coupled to a message input of another router of the first ring, wherein the second message output of the first router is coupled to a message input of another router of the second ring, and wherein one of the first and second message outputs of the first router is also coupled to the first cluster input bus. However, Cotter et al teach a first two-dimensional interconnection network [e.g., optical network 1 in fig. 2] being a first two-dimensional directional-torus interconnection network [e.g., “In the example shown in FIG. 2, a Manhattan Street Network (MS-Net) topology is used.  This is a two-connected, regular network with unidirectional links.  There is an even number of rows and columns with two links arriving and two links leaving each node N. Logically, the links form a grid on the surface of a torus, with links in adjacent rows or columns travelling in opposite directions” in col. 5, lines 18-25] including a first unidirectional ring of the routers [e.g., each horizontal torus connecting each node N in each row in fig. 2] including a first router [e.g., any node N on the top row in fig. 2] having first and second message outputs [e.g., Or, Oc in fig. 2] and including a second unidirectional ring of the routers [e.g., each vertical torus connecting each node N in each column in fig. 2] including the first router, wherein the first message output [e.g., Or in fig. 2] of the first router is coupled to a message input of another router [e.g., any node including destination node N on any row in fig. 2] of the first ring, wherein the second message output [e.g., Oc in fig. 2] of the first router is coupled to a message input of another router [e.g., any node including destination node N on any column in fig. 2] of the second ring, and wherein one of the first and second message outputs of the first router is also coupled to the first cluster input bus [e.g., DROP, HOST at the destination node is coupled to any of torus rings in fig. 4; “Preferably the network has at least two dimensions, and the packet carries at least two directional flags, one for each dimension of the network. … For example, in a regular rectangular mesh network with rows and columns associated with the principal axes of the compass, a packet may have knowledge that its destination is located north and east.  The packet self-navigates through the network by choosing whenever possible to travel in a direction that leads broadly towards the destination.  When the packet encounters a routing node, it simply instructs the node as to the preferred direction of onward travel: the node does not compute an optimum direction” in col. 3, lines 43-59; “All the packets that enter the routing switch (whether received from the network for forwarding or inserted from the local host) are routed to one of the outgoing links according to the rules of the dead-reckoning scheme, following the preferences indicated by the `destination bearings` where possible” in col. 8, lines 51-55; “For the purpose of the tables showing the detailed routing logic, it is assumed that the 2x2 `cross-bar` routing switches at the crosspoints of the network are configured so that the `bar` state is the straight-through direction for cells travelling in both the row and column directions, and the `cross` switch state causes a change of direction” in col. 15, lines 30-33].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to implement Cotter et al’s teaching above, including the details of the first interconnection network in the two-dimensional directional-torus configuration, in order to increase simplicity and/or flexibility in the message routing between the cluster circuits of the combination.
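For illustration only, the recited arrangement of a router having a first message output on a row ring and a second message output on a column ring can be sketched as follows. This is a simplified sketch under stated assumptions: every row ring and every column ring runs in the same direction, and routing is dimension-ordered. It implements neither the alternating link directions of the Manhattan Street Network nor the dead-reckoning scheme quoted from Cotter et al, and the names `next_hop` and `route` are hypothetical.

```python
def next_hop(cur, dst, width, height):
    """Return the next router on a unidirectional 2D torus, or None on arrival.

    Assumption: each router at (x, y) has exactly two message outputs,
    one to (x + 1, y) on its row ring and one to (x, y + 1) on its column ring,
    with both coordinates wrapping around.
    """
    x, y = cur
    dst_x, dst_y = dst
    if x != dst_x:
        return ((x + 1) % width, y)   # first message output: along the row ring
    if y != dst_y:
        return (x, (y + 1) % height)  # second message output: along the column ring
    return None                       # arrived: hand the message to the local cluster


def route(src, dst, width, height):
    """List every router a message visits from src to dst, inclusive."""
    path = [src]
    cur = src
    while (nxt := next_hop(cur, dst, width, height)) is not None:
        path.append(nxt)
        cur = nxt
    return path
```

Because each ring in this sketch is unidirectional, a message whose destination lies one position behind it on a ring travels the rest of the way around that ring; for example, on a 4x4 torus the route from (2, 0) to (1, 0) passes through (3, 0) and (0, 0).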
Claim 136 is rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez et al [US 2004/0250046 A1] in view of Wentzlaff et al [US 8,631,205 B1] further in view of Khare et al [US 2017/0171111 A1] and Cotter et al [US 6,272,548 B1].
	As to claim 136, Gonzalez et al teach a non-transitory computer-readable medium storing configuration data that, when received by an integrated circuit [e.g., “The ISEF 210 is coupled to the processing element 220.  The ISEF 210 includes programmable logic for enabling application-specific instructions (‘instruction extensions’) to be stored and executed” in paragraph 0036; “The local memory 326 can be configured to receive instructions and/or data, as well as other information that a specific processing element 322 uses to execute its portion of program instructions assigned to that element” in paragraph 0052], causes the integrated circuit: 
to generate intermediate data with a first computing circuit of a first cluster circuit on an integrated circuit [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041] including a first instruction-executing computing core and a first configurable accelerator [e.g., processing element 220, ISEF 210 in fig. 2];
to send the intermediate data from the first cluster circuit to a second cluster circuit on the integrated circuit via an interconnection network on the integrated circuit [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041]; and
to generate, in response to the intermediate data, first output data with a second computing circuit including a second instruction-executing computing core or a second configurable accelerator [e.g., “Using a memory mapped interface, the processing element 220 generates a request to read or write a memory location.  The processor network interface 240 then receives the request on the PIF or the AHB bus.  The processor network interface 240 then wraps the data as a network packet and transfers the packet onto the transport layer of an OSI layer, which is implemented by the processor network switch 250” in paragraph 0041].
Gonzalez et al do not explicitly disclose, but Wentzlaff et al teach, that the integrated circuit is instantiated by a non-transitory computer-readable medium storing configuration data that, when received by a field-programmable gate array, causes the field-programmable gate array [e.g., “The MIT Raw integrated circuit design provides reconfigurability of an FPGA along with the performance and capability of an ASIC” in col. 1, lines 36-38]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the integrated circuit of Gonzalez et al to implement Wentzlaff et al’s teaching above, including implementing the integrated circuit in a field-programmable gate array, in order to increase configurability and/or flexibility.
The combination of Gonzalez et al and Wentzlaff et al does not explicitly teach, but Khare et al teach, the first computing circuit including a plurality of first instruction-executing computing cores [e.g., “In some embodiments, NoC 200 forms a routing network for processing elements 201 (e.g., intellectual property (IP) cores such as processors, accelerators, memories, graphic units, etc.) in an integrated circuit (IC) or a computer system” in paragraph 0030]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to implement Khare et al’s teaching above in order to increase the processing performance of the combination.
The combination of Gonzalez et al/Wentzlaff et al/Khare et al does not explicitly teach, however Cotter et al teach the first interconnection network including one or more first-dimension unidirectional rings of the routers, one or more second-dimension unidirectional rings of the routers, and [e.g., “Packet routing networks may be used, for example, to interconnect the different processors of a multi-processor computer, or as the basis of a LAN interconnecting a number of different computers” in col. 1, lines 13-16; “Preferably the network has at least two dimensions, and the packet carries at least two directional flags, one for each dimension of the network. … For example, in a regular rectangular mesh network with rows and columns associated with the principal axes of the compass, a packet may have knowledge that its destination is located north and east.  The packet self-navigates through the network by choosing whenever possible to travel in a direction that leads broadly towards the destination.  When the packet encounters a routing node, it simply instructs the node as to the preferred direction of onward travel: the node does not compute an optimum direction” in col. 3, lines 43-59; “In the example shown in FIG. 2, a Manhattan Street Network (MS-Net) topology is used.  This is a two-connected, regular network with unidirectional links.  There is an even number of rows and columns with two links arriving and two links leaving each node N. Logically, the links form a grid on the surface of a torus, with links in adjacent rows or columns travelling in opposite directions” in col. 5, lines 18-25; figs. 2, 4; “All the packets that enter the routing switch (whether received from the network for forwarding or inserted from the local host) are routed to one of the outgoing links according to the rules of the dead-reckoning scheme, following the preferences indicated by the `destination bearings` where possible” in col. 8, lines 51-55]. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination to implement Cotter et al’s teaching above, including the details of the first interconnection network in the two-dimensional directional-torus configuration, in order to increase simplicity and/or flexibility in the data routing between the cluster circuits of the combination.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ILWOO PARK whose telephone number is (571) 272-4155.  The examiner can normally be reached on M-F, 9 AM-5 PM EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dr. Henry Tsai, can be reached on (571) 272-4176.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/ILWOO PARK/
Primary Examiner, Art Unit 2184
5/12/2022