DETAILED ACTION
This Office Action is in response to Application No. 16/157,878 filed on October 11, 2018. Claims 1-22 are presented for examination and are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 11, 2018 was filed.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Claim Rejections - 35 USC § 112
Claims 9 and 10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9, which depends from independent claim 1, is rejected for reciting “wherein the broadcast network is adapted to…” Claim 1, from which claim 9 depends, does not recite a broadcast network, so “the broadcast network” lacks antecedent basis. Examiner notes that claim 8, which also depends from independent claim 1, does recite a broadcast network. For the purposes of this Office action, examiner will interpret claim 9 as depending from claim 8. Examiner asks that the applicant kindly correct all similar mistakes.
Claim 10 is rejected for its reliance on claim 9. 




	
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.


Claims 1-16 and 20-22 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 4, 5, 14, and 15 of copending Application No. 17/077,720 in view of Lie (US 10699189 B2) and further in view of Huang (US 20190180170 A1).
The following table highlights some of the similarities between the two claimed inventions (with matching emphasis on either side of the chart). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of both applications are functionally the same in that they are directed towards neural network inference chips with details on moving data between cores.
This is a provisional nonstatutory double patenting rejection.


This application (16/157,878) | Reference application (17/077,720)
Claim 1: A neural inference chip comprising: a plurality of neural cores, each of the plurality of neural cores adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations; at least one network interconnecting the plurality of neural cores, the at least one network adapted to simultaneously deliver synaptic weights and/or input activations to the plurality of neural cores. | Claim 1: A neural inference processor comprising: a plurality of neural inference cores, each of the plurality of neural inference cores comprising memory adapted to store input activations, output activations, and a neural network model, the neural network model comprising synaptic weights, neuron parameters, and neural network instructions, each of the plurality of neural inference cores configured to apply the synaptic weights to input activations from its memory to produce a plurality of output activations to its memory; at least one model network interconnecting the plurality of neural inference cores, the at least one model network configured to distribute the neural network model among the plurality of neural inference cores; at least one activation network interconnecting the plurality of neural inference cores, the at least one activation network configured to provide input activations to each of the plurality of neural inference cores and to obtain output.
Claim 2: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 3: wherein the at least one network comprises a plurality of branches arranged in a tree network topology, and buffers disposed at each of the plurality of branches. | Claim 1: A neural inference processor comprising: a plurality of neural inference cores, each of the plurality of neural inference cores comprising memory adapted to store input activations, output activations, and a neural network model, the neural network model comprising synaptic weights, neuron parameters, and neural network instructions, each of the plurality of neural inference cores configured to apply the synaptic weights to input activations from its memory to produce a plurality of output activations to its memory; at least one model network interconnecting the plurality of neural inference cores, the at least one model network configured to distribute the neural network model among the plurality of neural inference cores; at least one activation network interconnecting the plurality of neural inference cores, the at least one activation network configured to provide input activations to each of the plurality of neural inference cores and to obtain output.
Claim 4: wherein the at least one network comprises a plurality of branches arranged in a tree network topology, and routing controls disposed at each of the plurality of branches. | Claim 1: A neural inference processor comprising: a plurality of neural inference cores, each of the plurality of neural inference cores comprising memory adapted to store input activations, output activations, and a neural network model, the neural network model comprising synaptic weights, neuron parameters, and neural network instructions, each of the plurality of neural inference cores configured to apply the synaptic weights to input activations from its memory to produce a plurality of output activations to its memory; at least one model network interconnecting the plurality of neural inference cores, the at least one model network configured to distribute the neural network model among the plurality of neural inference cores; at least one activation network interconnecting the plurality of neural inference cores, the at least one activation network configured to provide input activations to each of the plurality of neural inference cores and to obtain output.
Claim 5: wherein the at least one network is adapted to deliver intermediate results among the plurality of cores. |
Claim 6: wherein the intermediate results comprise partial sums. | Claim 5: further comprising at least one partial sum network interconnecting the plurality of neural inference cores, the at least one partial sum network configured to convey partial sums among the plurality of neural inference cores.
Claim 7: wherein the partial sums comprise weighted sums of a subset of inputs. | Claim 5: further comprising at least one partial sum network interconnecting the plurality of neural inference cores, the at least one partial sum network configured to convey partial sums among the plurality of neural inference cores.
Claim 8: wherein the at least one network comprises a broadcast network. | Claim 2: wherein the at least one model network is configured to broadcast the neural network model from one of the plurality of neural inference cores to the other of the plurality of neural inference cores.
Claim 9: wherein the broadcast network is adapted to deliver a data tensor or block to all cores coupled to the broadcast network. | Claim 2: wherein the at least one model network is configured to broadcast the neural network model from one of the plurality of neural inference cores to the other of the plurality of neural inference cores.
Claim 10: wherein the data tensor or block comprises neural network input activations, intermediate activations, and/or parameters. | Claim 2: wherein the at least one model network is configured to broadcast the neural network model from one of the plurality of neural inference cores to the other of the plurality of neural inference cores.
Claim 11: wherein the at least one network comprises a multicast network. | Claim 4: wherein the at least one model network is configured to multicast the neural network model from one of the plurality of neural inference cores to a subset of the plurality of neural inference cores.
Claim 12: wherein the multicast network is adapted to deliver a data tensor or block to a subset of cores coupled to the broadcast network. | Claim 4: wherein the at least one model network is configured to multicast the neural network model from one of the plurality of neural inference cores to a subset of the plurality of neural inference cores.
Claim 13: wherein the data tensor or block comprises neural network input activations, intermediate activations, and/or parameters. | Claim 4: wherein the at least one model network is configured to multicast the neural network model from one of the plurality of neural inference cores to a subset of the plurality of neural inference cores.
Claim 14: wherein the at least one network comprises a row or column bus. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 15: wherein the at least one network comprises a row or column tree. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 16: wherein the at least one network comprises a systolic row or column shifter. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 20: wherein the grid comprises a mesh. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 21: wherein the mesh is adapted to communicate data between cores in cardinal directions. | Claim 14: wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
Claim 22: A method comprising: by at least one network of a neural inference chip, simultaneously delivering synaptic weights and/or input activations to a plurality of neural cores of the neural inference chip; by each of the plurality of neural cores, applying a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations, sending the output activations via the at least one network. | Claim 15: A method comprising: by each of a plurality of neural inference cores, storing input activations, output activations, and a neural network model, the neural network model comprising synaptic weights, neuron parameters, and neural network instructions, and applying the synaptic weights to input activations from its memory to produce a plurality of output activations to its memory; by at least one model network interconnecting the plurality of neural inference cores, distributing the neural network model among the plurality of neural inference cores; by at least one activation network interconnecting the plurality of neural inference cores, providing input activations to each of the plurality of neural inference cores and obtaining output.



The table above uses bold/italic/underlined text to highlight the similarities between the two applications and the claims within, while the non-bolded text signifies the differences between the claim sets. Upon review of the non-bolded text, a person of ordinary skill in the art would conclude that the claimed inventions are obvious variants of each other. To further expand on the above table, each claim listed above is either directly from the reference co-pending application and/or obvious in view of the referenced art as explained below. Claims not directly referenced in the explanation below are believed to be adequately explained by the comparison chart above.
Regarding claims 1 and 22 of this application, the claim language is similar aside from being directed to different statutory categories. For clarity, examiner will only point out the similarities between claim 1 of this application and the co-pending application in this Office action, but the reasoning, motivation(s), and rationale apply to claim 22 as well.
Claim 1 of this application and claim 1 of the reference application are rearranged versions of each other, with the structure of the chip (the cores and network), the “adapted to apply” step, and the delivery step being taught by the co-pending application. Specifically, the co-pending application recites delivering the entirety of the model, which would include the synaptic weights and activation data recited in the claim limitation of this application.
In regards to claim 3, the claim is an obvious variation of the independent claim of the co-pending application. Lie recites the following: [ (Col. 15 Line 65 – Col. 16 Line 6) “An example fabric is a collection of logical and/or physical couplings between processing elements and/or within a single processing element. The fabric is usable to implement logical and/or physical communication topologies such as a mesh, a 2D mesh, a 3D mesh, a hypercube, a torus, a ring, a tree, or any combination thereof” ] This citation teaches the tree topology. Lie further recites: [ (Col. 15 Line 65 – Col. 16 Line 6) “An example of a physical coupling between processing elements is a set of physical interconnects (comprising optional and/or selective buffering) between physically-coupled processing elements” ] This citation teaches the processing elements (PEs) having optional buffers on each. Adding a specific topology to the neural inference chip of independent claim 1 of this application, especially a topology that was well known in the art at the time of filing, does not patentably distinguish the claim from the claim language of the co-pending application.
In regards to claim 4, the claim is similar to claim 3 aside from swapping out a buffer for a “routing control” on each of the plurality of branches. Lie recites the following: [ (Col. 16, Lines 23-25) “An example of physical coupling within a single processing element (having, e.g., a compute element and a router)” ] This citation teaches the PE (equivalent to the Core) having a routing element which is equivalent to the routing controls of the claim limitation. This citation from Lie along with the previously cited teachings for the tree reference (as shown above) teach the totality of the claim language and do not distinguish the claims from the co-pending application. 
Claims 5 and 6 of this application are directed towards intermediate results and delivering those results to the plurality of cores. Claim 6 specifies a type of intermediate result being delivered as a partial sum. The co-pending application teaches the partial sums being conveyed (equivalent to delivered) to the cores of the chip. As such the claims are obvious variants of each other. 
In regards to claim 7, the claim discusses the partial sums being comprised of weighted sums of inputs. The co-pending application teaches the partial sums, and Lie further explains: [ (Col. 22, Lines 15-20) “wherein for each layer of the neural network, incoming activations are weighted to create partial sums that are accumulated to generate output activations for the layer, and the accumulated weighted partial sums represent the neurons and associated synapses of the neural network” ] This citation teaches that the weighted activation inputs are utilized to create the partial sums, which is equivalent to the claim language above. The claim is an obvious variant of the co-pending application’s claim 5.
In regards to claims 8-10, the claims are directed towards broadcast networks. The co-pending application utilizes the same broadcast network as in claim 8 of this application. Examiner notes that a broadcast network is a network that broadcasts. Claims 9 and 10 specify embodiments of what is being broadcast, which is taught by Lie: [ (Col. 90, Lines 38-54) “Activations are broadcasted into the layer along the horizontal axis. Activations are received by the PEs and trigger a lookup of the associated weights that are stored local to the PEs (corresponding to the neurons mapped to the PEs). Only non-zero activations are broadcasted” ] This teaches the input activations being broadcast to the processing elements (equivalent to the cores) and the input activations being a part of the data tensor as in claim 9.
In regards to claims 11-13, the claims are similar to claims 8-10 aside from being directed towards a multicast network rather than a broadcast network. The same arguments, reasoning, and rationale apply to claims 11-13.
Claim 14 recites the grid topology of this application containing a row/column bus. This is an obvious variation of the co-pending application, which also recites the grid topology but is silent as to the bus. Huang teaches the bus as seen here: [ (Fig. 5), (¶0073) and (¶0074) ] The cited figure and paragraphs teach the processing engine from Huang and disclose the array of processing engines, in which each individual processing engine is connected to memory banks by row/column and receives its data from the respective memory bank. This is similar to applicant’s Fig. 5 from the specification of the current application, which depicts the row bus.
Claim 15 recites the grid topology and further specifies a row/column tree. This is an obvious variation of the co-pending application, which also recites the grid topology but is silent as to the tree. Lie teaches the tree as seen here: [ (Fig. 4) and (Col. 64, Lines 29-37) “In some embodiments and/or usage scenarios, any one of PE0 1820, PE1 1821, PE2 1822, and PE3 1823 correspond to PE 497 of FIG. 4. In some embodiments and/or usage scenarios, any one or more of coupling between adjacent PEs 2041, 2042, 2043, and 2044 and/or portion of coupling between adjacent PEs 2050, 2051, 2052, 2053, 2054, 2055, 2056, and 2057 correspond to any one or more of North coupling 430, East coupling 431, South coupling 432, and West coupling 433 of FIG. 4” ] This figure and the respective citation teach the grid of PEs being lined up and the connections between the PEs for data transfer, with each PE being able to transfer data to another, equivalent to applicant’s Fig. 6 of this application, which discloses the tree.
Claim 16 recites the grid topology and further specifies a systolic row/column shifter. This is an obvious variation of the co-pending application, which also recites the grid topology but is silent as to the systolic shifter. Lie teaches the systolic shift as seen here: [ (Fig. 26B) and (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates” ] This citation teaches the synchronized weight updates, which are equivalent to applicant’s systolic row/column shifter, which details delays when updating the parameters to ensure that all the processing elements (cores) are updated in a synchronized manner (¶0058).
Claim 20 is directed towards the same grid topology but further specifies it being in the form of a mesh. This is similar to the claim language of claim 14 in the co-pending application, where the grid is disclosed but which is silent as to the mesh. Lie teaches the mesh as seen here: [ (Abstract) “An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh” ] This shows the array (two dimensional, with rows and columns) of processing elements being formed into a mesh.
Claim 21 of this application is directed towards the mesh grid from above and further specifies that data can be communicated between the cores in cardinal directions. This obvious variation of the grid topology taught in the co-pending application is taught by Lie as seen here: [ (Fig. 5) and (Col. 45, Lines 49-54) “FIG. 5 illustrates selected details of an embodiment of a PE as PE 500 of a deep learning accelerator. PE 500 comprises Router 510 and Compute Element 520. Router 510 selectively and/or conditionally communicates wavelets between other PEs (e.g., logically adjacent and/or physically adjacent PEs)” ] This figure shows that the router, which sits within the grid/mesh and directs the data, has couplings in all four cardinal directions.





Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claim(s) 1-22 are rejected under 35 U.S.C. 103 as being unpatentable over Lie (US 10699189 B2) in view of Huang (US 20190180170 A1).

In regards to claim 1, Lie teaches the following:
each of the plurality of neural cores adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations;
[ (Col. 33, Lines 60-65) “The system of EC114, wherein for each layer of the neural network, incoming activations are weighted to create partial sums that are accumulated to generate output activations for the layer, and the accumulated weighted partial sums represent the neurons and associated synapses of the neural network”
	This citation from Lie teaches the weights being applied to input activations and the outputs being partial sums that are then used to create output activations. Examiner notes that the neural cores of the claim limitation are explicitly taught by Huang below, but equivalent support is found in Lie with the PEs and FPGAs shown in Fig. 1 (reference numbers 122 and 121, respectively). ]
 	at least one network interconnecting the plurality of neural cores, the at least one network adapted to simultaneously deliver synaptic weights and/or input activations to the plurality of neural cores.
[ (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates”
	This citation teaches the synchronization of weight updates (equivalent to delivering weights simultaneously) for MBGD (mini-batch gradient descent). Examiner notes that these are delivered to the PEs (processing elements) or FPGAs, which are equivalent to the neural cores. ]
	What is not distinctly disclosed by Lie and is instead taught by Huang is seen below:
A neural inference chip comprising: a plurality of neural cores, (emphasis added)
[ (¶0048)
	In general, this paragraph from Huang talks about the processing unit being used and discusses the CPU (processing chip) being used for neural network inference which examiner notes makes it equivalent to a neural inference chip. The paragraph also discusses the use of cores on the chip in the process of inference. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system for accelerated deep learning as taught by Lie with the multi-memory on-chip network as taught by Huang. One of ordinary skill in the art would have recognized, prior to the effective filing date, that combining the two would provide optimized parallel computations, which are highly desirable for neural networks [ Huang (¶0048) ]. This would facilitate the recognized benefit of providing quicker and more efficient neural networks.




In regards to claim 2, the neural inference chip of claim 1 is taught by Lie/Huang in the rejection of claim 1 above. Lie further teaches the following:
wherein the plurality of cores is organized in a grid of two or more dimensions with at least one row and at least one column.
[ (Col. 44, Lines 57-58) “In some embodiments and/or usage scenarios, the PEs are physically implemented in a 2D grid. In some embodiments and/or usage scenarios, the PEs are physically implemented in a 2D grid of aligned rectangles”
	Examiner notes that the PEs are equivalent to the cores in this reference, and that a physical implementation of a rectangle comprising multiple PEs would include a row and a column. ]




In regards to claim 3, the neural inference chip of claim 1 is taught by Lie/Huang in the rejection of claim 1 above. Lie further teaches the following:
wherein the at least one network comprises a plurality of branches arranged in a tree network topology,
[ (Col. 15 Line 65 – Col. 16 Line 6) “An example fabric is a collection of logical and/or physical couplings between processing elements and/or within a single processing element. The fabric is usable to implement logical and/or physical communication topologies such as a mesh, a 2D mesh, a 3D mesh, a hypercube, a torus, a ring, a tree, or any combination thereof”
	This citation teaches the tree topology. ]
 	and buffers disposed at each of the plurality of branches.
[ (Col. 15 Line 65 – Col. 16 Line 6) “An example of a physical coupling between processing elements is a set of physical interconnects (comprising optional and/or selective buffering) between physically-coupled processing elements”
	This citation teaches the processing elements (PEs) having optional buffers on each. Examiner notes that applicant’s specification contains no specific definition for branches, and the plain meaning of the term would simply be the cores or processing elements that make up the “tree” topology. As such, the tree topology containing the processing elements teaches the branches. ]




In regards to claim 4, the neural inference chip of claim 1 is taught by Lie/Huang in the rejection of claim 1 above. Lie further teaches the following:
wherein the at least one network comprises a plurality of branches arranged in a tree network topology, 
[ (Col. 15 Line 65 – Col. 16 Line 6) “An example fabric is a collection of logical and/or physical couplings between processing elements and/or within a single processing element. The fabric is usable to implement logical and/or physical communication topologies such as a mesh, a 2D mesh, a 3D mesh, a hypercube, a torus, a ring, a tree, or any combination thereof”
	This citation teaches the tree topology. ]
and routing controls disposed at each of the plurality of branches.
[ (Col. 16, Lines 23-25) “An example of physical coupling within a single processing element (having, e.g., a compute element and a router)”
	This citation teaches the PE (equivalent to the core) having a routing element, which is equivalent to the routing controls of the claim limitation. Examiner notes that applicant’s specification contains no specific definition for branches, and the plain meaning of the term would simply be the cores or processing elements that make up the “tree” topology. As such, the tree topology containing the processing elements teaches the branches. ]




In regards to claim 5, the neural inference chip of claim 1 is taught by Lie/Huang in the rejection of claim 1 above. Lie further teaches the following:
wherein the at least one network is adapted to deliver intermediate results among the plurality of cores.
[ (Fig. 12) and (Col. 52, Lines 61-65) “FIG. 12 illustrates selected details of an embodiment of flow associated with activation accumulation and closeout, followed by partial sum computation and closeout as Activation Accumulation/Closeout and Partial Sum Computation/Closeout 1200”
	This citation and the corresponding figure show the process in which the network delivers partial sums (which are a type of intermediate result) to the processing elements (equivalent to cores). ]




In regards to claim 6, the neural inference chip of claim 5 is taught by Lie/Huang in the rejection of claim 5 above. Lie further teaches the following:
wherein the intermediate results comprise partial sums.
[ (Fig. 12) and (Col. 52, Lines 61-65) “FIG. 12 illustrates selected details of an embodiment of flow associated with activation accumulation and closeout, followed by partial sum computation and closeout as Activation Accumulation/Closeout and Partial Sum Computation/Closeout 1200”
	This citation and the corresponding figure show the process in which the network delivers partial sums (which are a type of intermediate result) to the processing elements (equivalent to cores). ]



In regards to claim 7, the neural inference chip of claim 6 is taught by Lie/Huang in the rejection of claim 6 above. Lie further teaches the following:
wherein the partial sums comprise weighted sums of a subset of inputs.
[ (Col. 22, Lines 15-20) “wherein for each layer of the neural network, incoming activations are weighted to create partial sums that are accumulated to generate output activations for the layer, and the accumulated weighted partial sums represent the neurons and associated synapses of the neural network”
	This citation teaches the weighted activation inputs are utilized to create the partial sums which is equivalent to the claim language above. ]
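For illustration only, the relationship between weighted inputs and partial sums discussed above can be sketched as follows. This sketch is not taken from Lie, Huang, or applicant's specification; the function name, the weight and activation values, and the split of the inputs into two subsets are all hypothetical assumptions chosen to show that each partial sum is a weighted sum of a subset of the inputs and that accumulating the partial sums yields the final sum.

```python
# Hypothetical illustration: partial sums as weighted sums of input subsets.
# All names and values are assumptions made for this sketch only.

def partial_sum(weights, activations):
    """Return the weighted sum of the given activations."""
    return sum(w * a for w, a in zip(weights, activations))

weights = [0.5, -1.0, 2.0, 0.25]
activations = [2.0, 1.0, 0.5, 4.0]

# Assume the inputs are split across two processing elements (two subsets).
subsets = [(0, 2), (2, 4)]
partials = [partial_sum(weights[lo:hi], activations[lo:hi]) for lo, hi in subsets]

# Accumulating the partial sums gives the same result as one full weighted sum.
final_sum = sum(partials)
```

Under these assumed values, the accumulated result equals the weighted sum computed over all inputs at once.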




In regards to claim 8, The neural inference chip of claim 1, is taught by Lie/Huang in the rejection for claim 1 above. Lie continues teaching the following:
wherein the at least one network comprises a broadcast network.
[ (Col. 90, Lines 38-54) “Conceptually, processing proceeds as follows (see Forward 401 of FIG. 4). Activations are broadcasted into the layer along the horizontal axis”… “After the partial sums are accumulated producing a final sum, the activation function is performed and all new non-zero activations are broadcast to the next layer”
	Examiner notes that applicant’s specification contains no specific definition for broadcast network but does give examples of “broadcast” in the context of the claimed invention. Specifically, (¶0049) of the specification recites: “A row/column tree network may be configured for broadcast, where multiple cores need the same message (at the same or different times). In such embodiments, parameters are sent serially or in parallel on a tree that forks (for example by 2 or 4) at each level, broadcasting to a whole row simultaneously; for a NxN grid of cores N different messages can be sent simultaneously, 1 per row”. This example and the similar examples in applicant’s specification support the interpretation that the broadcast network enables the passing of data to multiple cores (or processing elements). The citation above teaches the same, with each processing element receiving the broadcasted data. ]
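For illustration only, the row broadcast described in the quoted passage of (¶0049) can be sketched as follows. This is a simplified model under stated assumptions, not an implementation from applicant's specification or from Lie: the function names are hypothetical, the tree is modeled only by counting how many forking levels are needed to reach every core in a row, and hardware timing is not modeled.

```python
# Hypothetical model of a row broadcast tree; names are assumptions.

def broadcast_row(message, n_cores, fanout=2):
    """Deliver one message to every core in a row.

    Returns the per-core deliveries and the number of tree levels
    needed when the tree forks by `fanout` at each level.
    """
    reachable = 1
    levels = 0
    while reachable < n_cores:
        reachable *= fanout
        levels += 1
    return [message] * n_cores, levels

def broadcast_grid(messages, fanout=2):
    """For an N x N grid, send N different messages, one per row."""
    n = len(messages)
    return [broadcast_row(msg, n, fanout)[0] for msg in messages]

grid = broadcast_grid(["m0", "m1", "m2", "m3"])
```

In this sketch every core in a given row receives the same message, while different rows receive different messages.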




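In regards to claim 9, The neural inference chip of claim 8 (claim 9 being interpreted as depending on claim 8, as explained in the 35 U.S.C. 112(b) rejection above), is taught by Lie/Huang in the rejection for claim 8 above. Lie continues teaching the following: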
wherein the broadcast network is adapted to deliver a data tensor or block to all cores coupled to the broadcast network.
[ (Col. 90, Lines 38-54) “Each PE performs a local multiply and accumulate of the incoming activation with all the neuron weights producing local partial sums. Since the weights of each neuron are distributed to multiple PEs, partial sums are then accumulated across the PEs in the vertical direction, in accordance with the neuron weight distribution”… “After the partial sums are accumulated producing a final sum, the activation function is performed and all new non-zero activations are broadcast to the next layer”
	This citation and the surrounding paragraphs teach the PEs having the data broadcasted and it reaching them in a systematic manner (first along the horizontal axis then the vertical axis) to make sure that each PE within the layer receives the broadcasted data. ]




In regards to claim 10, The neural inference chip of claim 9, is taught by Lie/Huang in the rejection for claim 9 above. Lie continues teaching the following:
wherein the data tensor or block comprises neural network input activations, intermediate activations, and/or parameters.
[ (Col. 90, Lines 38-54) “Activations are broadcasted into the layer along the horizontal axis. Activations are received by the PEs and trigger a lookup of the associated weights that are stored local to the PEs (corresponding to the neurons mapped to the PEs). Only non-zero activations are broadcasted”
	This teaches the input activations being broadcasted as a possible embodiment. ]




In regards to claim 11, The neural inference chip of claim 1, is taught by Lie/Huang in the rejection for claim 1 above. Lie continues teaching the following:
wherein the at least one network comprises a multicast network.
[ (Col. 19, Lines 4-6) “The system of EC100, wherein the virtual channel specifier selects routing paths in the fabric to perform multicast”
	This citation teaches the ability for the system to perform multicast. Examiner notes that applicant’s specification does not contain a specific definition for the term “multicast network” and as such, the cited reference’s network being able to perform multicast is equivalent to it being a multicast network. ]




In regards to claim 12, The neural inference chip of claim 11, is taught by Lie/Huang in the rejection for claim 11 above. Lie continues teaching the following:
wherein the multicast network is adapted to deliver a data tensor or block to a subset of cores coupled to the broadcast network.
[ (Col. 18, Lines 30-41) “EC100) A system comprising: a fabric of processor elements, each processor element comprising a fabric router and a compute engine; wherein each processor element selectively communicates fabric packets with others of the processor elements; and wherein each compute engine selectively performs dataflow processing and instruction processing respectively in accordance with a dataflow field and an instruction field of each fabric packet the compute engine receives” (emphasis added)
	This citation, which is the basis for the embodiment of EC101B as cited in the rejection for claim 11 above, teaches the dataflow being delivered to all the processing elements via the fabric router. ]




In regards to claim 13, The neural inference chip of claim 12, is taught by Lie/Huang in the rejection for claim 12 above. Lie continues teaching the following:
wherein the data tensor or block comprises neural network input activations, intermediate activations, and/or parameters.
[ (Col. 19, Line 62 – Col. 20, Line 24), (Col. 20, Lines 46-53), and (Col. 22, Lines 15-20)
	These citations include EC130 (which relies on EC114, which relies on EC113, which relies on EC100). EC130 explicitly teaches the use of incoming activations (equivalent to input activations) in each layer of the neural network. Examiner notes that per the reference, the multiple example combinations listed are not exclusive to one another and can be combined. Therefore, the multicast embodiment along with the chain leading to EC130 can be utilized together, teaching the entirety of the claim limitation. ]





In regards to claim 14, The neural inference chip of claim 2, is taught by Lie/Huang in the rejection for claim 2 above. Huang continues teaching the following:
wherein the at least one network comprises a row or column bus.
[ (Fig. 5), (¶0073) and (¶0074)
	The above figure and paragraphs teach the array of processing engines of Huang, in which each individual processing engine is connected to memory banks by row/column and receives its data from the respective memory bank. This is similar to applicant’s Fig. 5 from the specification, which depicts the row bus. ]
Please refer to the rejection of claim 1 for the motivation to combine.




In regards to claim 15, The neural inference chip of claim 2, is taught by Lie/Huang in the rejection for claim 2 above. Lie continues teaching the following:
wherein the at least one network comprises a row or column tree.
[ (Fig. 4) and (Col. 64, Lines 29-37) “In some embodiments and/or usage scenarios, any one of PE0 1820, PE1 1821, PE2 1822, and PE3 1823 correspond to PE 497 of FIG. 4. In some embodiments and/or usage scenarios, any one or more of coupling between adjacent PEs 2041, 2042, 2043, and 2044 and/or portion of coupling between adjacent PEs 2050, 2051, 2052, 2053, 2054, 2055, 2056, and 2057 correspond to any one or more of North coupling 430, East coupling 431, South coupling 432, and West coupling 433 of FIG. 4”
	This figure and respective citation teach the grid of PEs being lined up and their connections between the PEs for data transfer with each PE being able to transfer data to one another, equivalent to applicant’s Fig. 6. ]




In regards to claim 16, The neural inference chip of claim 2, is taught by Lie/Huang in the rejection for claim 2 above. Lie continues teaching the following:
wherein the at least one network comprises a systolic row or column shifter.
[ (Fig. 26B) and (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates”
	This citation teaches the synchronized weight updates which is equivalent to applicant’s systolic row/column shifter which details delays when updating the parameters to ensure that all the processing elements (cores) are updated in a synchronized manner (¶0058). Examiner notes that Lie also teaches the other pipeline data passes through Fig. 26A and 26C. ]




In regards to claim 17, The neural inference chip of claim 16, is taught by Lie/Huang in the rejection for claim 16 above. Lie continues teaching the following:
wherein the systolic row or column shifter is adapted to deliver a data tensor or block sequentially to cores coupled to the systolic row or column shifter.
[ (Fig. 26B) and (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates”
	This citation teaches the column shifter passing data to the processing elements (equivalent to the cores). ]




In regards to claim 18, The neural inference chip of claim 17, is taught by Lie/Huang in the rejection for claim 17 above. Lie continues teaching the following:
wherein the data tensor or block comprises neural network input activations, intermediate activations, and/or parameters.
[ (Fig. 26B) and (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates”
	This citation teaches the weight updates which can be considered parameters. ]




In regards to claim 19, The neural inference chip of claim 16, is taught by Lie/Huang in the rejection for claim 16 above. Lie continues teaching the following:
wherein the at least one network comprises at least two systolic row or column shifters configured to send data from opposite directions in the at least one network.
[ (Fig. 5) and (Col. 45, Lines 49-54) “FIG. 5 illustrates selected details of an embodiment of a PE as PE 500 of a deep learning accelerator. PE 500 comprises Router 510 and Compute Element 520. Router 510 selectively and/or conditionally communicates wavelets between other PEs (e.g., logically adjacent and/or physically adjacent PEs)”
	This citation and the diagram show that the routers located on the PEs have all of the cardinal directions available to them. As stated in the previous rejection(s), the system is able both to perform systolic row/column shifting (synchronizing the updates) and to perform multiple updates for layers/rows/columns at once. Taken together, these teachings show that there can be at least two different systolic row or column updates in different directions. ]




In regards to claim 20, The neural inference chip of claim 2, is taught by Lie/Huang in the rejection for claim 2 above. Lie continues teaching the following:
wherein the grid comprises a mesh.
[ (Abstract) “An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh”
	This shows the array (two dimensional, with rows and columns) of processing elements being formed into a mesh. ]




In regards to claim 21, The neural inference chip of claim 20, is taught by Lie/Huang in the rejection for claim 20 above. Lie continues teaching the following:
wherein the mesh is adapted to communicate data between cores in cardinal directions.
[ (Fig. 5) and (Col. 45, Lines 49-54) “FIG. 5 illustrates selected details of an embodiment of a PE as PE 500 of a deep learning accelerator. PE 500 comprises Router 510 and Compute Element 520. Router 510 selectively and/or conditionally communicates wavelets between other PEs (e.g., logically adjacent and/or physically adjacent PEs)”
	This figure shows that the router, which is within the grid/mesh and directs the data, has all four cardinal directions attached to it. ]
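For illustration only, communication between cores of a 2D mesh in the cardinal directions can be sketched as follows. This is a hypothetical model and is not taken from Lie's Fig. 5 or from applicant's specification; the function name, the coordinate convention (row 0 at the north edge), and the grid size are assumptions.

```python
# Hypothetical 2D mesh neighbor lookup; names and conventions are assumptions.

def cardinal_neighbors(row, col, n_rows, n_cols):
    """Return the in-bounds neighbors of a core in the four cardinal directions."""
    candidates = {
        "north": (row - 1, col),
        "east": (row, col + 1),
        "south": (row + 1, col),
        "west": (row, col - 1),
    }
    # Cores on the edge of the mesh have fewer than four neighbors.
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < n_rows and 0 <= c < n_cols
    }

interior = cardinal_neighbors(1, 1, 3, 3)
corner = cardinal_neighbors(0, 0, 3, 3)
```

In this sketch an interior core has a neighbor in each of the four cardinal directions, while a corner core has only two.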




In regards to claim 22, Lie teaches the following:
simultaneously delivering synaptic weights and/or input activations to a plurality of neural cores of the neural inference chip;
[ (Col. 82, Lines 64-67) “FIG. 26B illustrates an embodiment of a pipeline flow for MBGD. A plurality of activations are processed with identical weights. Coordinated quiet times are used to synchronize weight updates”
	This citation teaches the synchronization of weight updates (equivalent to delivering weights simultaneously) for MBGD (mini-batch gradient descent). Examiner notes that these are delivered to the PEs (processing elements) or FPGAs as would be equivalent to the neural cores. ]
by each of the plurality of neural cores, applying a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations, sending the output activations via the at least one network.
[ (Col. 33, Lines 60-65) “The system of EC114, wherein for each layer of the neural network, incoming activations are weighted to create partial sums that are accumulated to generate output activations for the layer, and the accumulated weighted partial sums represent the neurons and associated synapses of the neural network”
	This citation from Lie teaches the weights being applied to input activations and the outputs being partial sums that are then used to create output activations. Examiner notes that the neural cores of the claim limitation are explicitly taught by Huang below, but equivalent support is found in Lie with the PEs and FPGAs that are found in Fig. 1 (reference numbers 122 and 121 respectively). ]
	What Lie does not distinctly disclose and is instead taught by Huang is seen below:
A method comprising: by at least one network of a neural inference chip,
[ (¶0048)
	In general, this paragraph from Huang talks about the processing unit being used and discusses the CPU (processing chip) being used for neural network inference which examiner notes makes it equivalent to a neural inference chip. The paragraph also discusses the use of cores on the chip in the process of inference. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system for accelerated deep learning as taught by Lie with the multi-memory on-chip network as taught by Huang. One of ordinary skill in the art would have recognized, prior to the effective filing date, that combining the two would provide optimized parallel computations, which are highly desirable for neural networks [ Huang (¶0048) ]. This would facilitate the recognized benefit of providing quicker and more efficient neural networks.





Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20190385046 A1 – parallel computational architecture with reconfigurable core-level and vector-level parallelism which teaches chip architecture, synaptic weights, input activations and dimensionality of cores. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.A.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123