DETAILED ACTION
This action is in response to Applicant’s amendment regarding application number 16/033,926, filed July 12, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Amendments
The amendment filed March 15, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/033,926, which include: Amendments to the Claims, Amendments to the Specification, and Remarks containing Applicant’s amendments.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Claims 1 and 11-16 have been amended, with Claims 4 and 18 cancelled, and new Claims 26-27 added. Claims 1-3, 5-17, and 19-27 remain pending in the application.
Regarding Applicant’s Remarks and Amendments to the Specification, Examiner acknowledges the correction to the typographical error in paragraph [0018], and therefore the specification objection previously set forth in the Non-Final Office Action mailed November 15, 2021 is withdrawn.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges the corrections to replace the term “system” with the term “neural inference chip” in the preamble of Claims 11-15 to resolve the identified 112(b) lack of antecedent basis issues in those claims, and therefore the respective 112(b) claim rejections previously set forth in the Non-Final Office Action mailed November 15, 2021 are withdrawn.
Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/033,926, which include: Remarks containing Applicant’s arguments.
Regarding Applicant’s Remarks for Claims 1-14 and 16-25 under 35 U.S.C. 102 as being anticipated by Dally et al., U.S. PGPUB 2018/0046906, published 2/15/2018 [hereafter referred as Dally]; and for Claim 15 under 35 U.S.C. 103 as being unpatentable over Dally in view of Henry et al., U.S. PGPUB 2018/0189651, published 7/5/2018 [hereafter referred as Henry], Examiner has considered Applicant’s arguments and finds them not persuasive. The arguments are also moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner’s analysis of the claims with respect to the updated art references and corresponding claim mappings is provided in the sections indicated below. Examiner also notes several assertions and arguments made by the Applicant, which will be addressed in the following paragraphs.
Regarding Applicant’s Remarks:
“[R]ejections under 35 U.S.C. §102 are proper only when the claimed subject matter is identically disclosed or described in 'the prior art.'" In re Arkley, 455 F.2d 586, 587 (C.C.P.A.1972) (emphasis in original). For a proper 35 U.S.C. §102 rejection, the prior art "reference must clearly and unequivocally disclose the claimed [invention] or direct those skilled in the art to the [invention] without any need for picking, choosing, and combining various disclosures not directly related to each other by the teachings of the cited reference." Id. (emphasis in original).”
Examiner has considered this argument and finds the argument to be not persuasive. Applicant appears to assert in the above argument that the 102 prior art rejection for Claims 1-14 and 16-25 in the Non-Final Office Action mailed November 15, 2021 was based on applying multiple prior art references and thus represents an improper rejection. According to the Non-Final Office Action mailed November 15, 2021, Examiner points out that the applied 35 U.S.C. 102(a)(1) rejection for Claims 1-14 and 16-25 is based on a single prior art reference (Dally et al., U.S. PGPUB 2018/0046906, published 2/15/2018), and as such, the single prior art reference does not exhibit “picking, choosing, and combining various disclosures not directly related to each other” as asserted by the Applicant’s above arguments. Examiner also notes that Applicant does not provide any further evidence where the picking, choosing, and combining of various disclosures was performed with respect to the claim mapping used for the 102(a)(1) prior art rejection in the Non-Final Office Action. Hence, Applicant’s above argument is not persuasive, and the earlier prior art rejection is maintained.
Regarding Applicant’s Remarks:
“In relying upon the theory of inherency, the examiner must provide a basis in fact and/or technical reasoning to reasonably support the determination that the allegedly inherent characteristic necessarily flows from the teachings of the applied prior art." Ex parte Levy, 17 USPQ.2d 1461, 1464 (B.P.A.I. 1990) (emphasis in original). Inherency may not be established by probabilities or possibilities, and the mere fact that a certain result "may" follow from a given set of circumstances is not sufficient. MEHL/Biophile Int'l Corp. v. Milgraum, 192 F.3d 1362, 1365 (Fed. Cir. 1999); In re Oelrich, 666 F.2d 578, 581 (C.C.P.A. 1981).”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant’s above assertion of inherency is not supported with any further evidence or identification of claim limitations describing where inherency was improperly applied during examination. According to the Non-Final Office Action mailed November 15, 2021, Examiner points out that each limitation in each claim was properly addressed with the proper paragraph or section citation to identify the corresponding functionality or steps from the prior art reference that teach the recited claim limitation (the details of which are left to the Non-Final Office Action, and will not be re-stated here). Hence, given the fact that the Applicant did not provide any evidence to support an improper use of inherency, Applicant’s above argument is not persuasive, and the earlier prior art rejection is maintained.
As indicated earlier, Examiner notes that the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner’s analysis of the claims with respect to the updated art references and corresponding claim mappings is provided in the sections indicated below.

Claim Objections
Claims 1, 5, 11, 16, and 19 are objected to because of the following informalities:
Claim 1: A conjunction word is missing between the following two limitations recited in this claim:
“… each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations; and
each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation, wherein the plurality of vector compute units each comprise multiplication and addition units.”. Appropriate correction is required.
Claims 5, 11: The term “the vector compute units” should be corrected as “the plurality of vector compute units” to correspond with the term “a plurality of vector compute units” recited in parent independent Claim 1. Appropriate correction is required.
Claim 11: The limitation “accumulating the partial sum” should be corrected as “accumulate the partial sum” to be consistent with the verb tense used in the preceding limitations. Appropriate correction is required.
Claim 16: A conjunction word is missing between the following two limitations recited in this claim:
“… assigning each of the plurality of neural cores a subset of output activations of a layer of a neural network for computation; and
upon receipt of a subset of input activations of the layer of the layer of the neural network, each of the plurality of neural cores …”. Appropriate correction is required.
Claim 19: The term “the vector compute units” should be corrected as “the plurality of vector compute units” to correspond with the term “a plurality of vector compute units” recited in parent independent Claim 16. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are:
Claim 1: “a plurality of vector compute units configured to operate in parallel …”; and “applying its plurality of vector compute units …”. The term “its” is defined by the Merriam-Webster dictionary as signifying one element relating to “another” element. In the context of the claims, this “another” element is a neural core, and hence under its broadest reasonable interpretation, the vector compute units are interpreted as units relating to (i.e., associated with) a neural core, and thus are not required as being part of the structure of the neural inference chip (which comprises a plurality of neural cores).
Claim 11: “the [plurality of] vector compute units are configured to …”
Claim 12: “the plurality of vector compute units are configured to compute …”
Claim 16: “the plurality of vector compute units configured to operate in parallel …”
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 13-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding original Claim 13,
Claim 13 recites the term “said computation” in the limitation “wherein said computation by each of the plurality of neural cores is pipelined”, which renders the claim indefinite, as there appear to be two types of computation actions recited in parent independent Claim 1 (a computation action that applies the plurality of vector compute units to input activations, and another computation action involving a subset of output activations), and hence it is not clear which one of the above two computation actions recited in parent independent Claim 1 is being pipelined. For the purposes of examination, the term “said computation” will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, such that the limitation broadly recites a mechanism where the computation operations are performed in a sequential (i.e., pipelined) fashion.
Claims 14-15 are dependent claims tracing back to parent dependent Claim 13, and as such inherit the same indefiniteness established in Claim 13. Hence, Claims 14-15 are also rejected as being indefinite by virtue of dependency.
Regarding original Claim 14,
Claim 14 recites the term “each stage of said computation” in the limitation “wherein each of the plurality of neural cores is configured to concurrently perform each stage of said computation”. There is insufficient antecedent basis for this term, as parent independent Claim 1 does not explicitly recite any of the recited computations as being separated into multiple stages of computations. In addition, the recitation of this term also renders the claim as being indefinite, as it is not clear which one of the two computation actions recited in independent Claim 1 (the computation action that applies the plurality of vector compute units to input activations, or the computation action involving a subset of output activations) contains multiple stages of computations that are concurrently performed. For the purposes of examination, the term “each stage of said computation” will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, where the term “concurrently perform” broadly recites performing parallel computations.
Claim 15 is a dependent claim tracing back to parent dependent Claim 14, and as such inherits the same indefiniteness established in Claim 14. Hence, Claim 15 is also rejected as being indefinite by virtue of dependency.
Regarding original Claim 15,
Claim 15 recites the term “said computation” in the limitation “wherein said computation maintains parallelism”, which renders the claim indefinite, as there appear to be two types of computation actions recited in parent independent Claim 1 (a computation action that applies the plurality of vector compute units to input activations, and another computation action involving a subset of output activations), and hence it is not clear which one of the above two computation actions recited in parent independent Claim 1 “maintains” or exhibits parallelism. For the purposes of examination, the term “said computation” will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, such that the limitation broadly recites that these computation actions performed within the vector compute units exhibit parallelism.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 5-16, and 19-27 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Temam et al., U.S. Patent 9,710,265, published 7/18/2017 [hereafter referred as Temam].
Regarding amended Claim 1, 
Temam teaches
(Currently Amended) A neural inference chip, comprising: 
a plurality of neural cores, each of the plurality of neural cores comprising a plurality of vector compute units configured to operate in parallel (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recites elements or logical functions performing vector computations. Thus, under its broadest reasonable interpretation, this limitation broadly recites a plurality of neural cores (which is interpreted as a plurality of elements that perform neural network computations), where each neural core is associated with a plurality of elements/logical functions performing vector computations in a parallel fashion. Temam teaches a neural network compute tile/computing unit in a computing system containing a plurality of compute tiles (corresponding to a plurality of neural cores), where each compute tile contains a plurality of cells in a MAC array. Temam further teaches each cell includes a multiply accumulate (MAC) operator and sum register that is used for executing tensor arithmetic computations that include dot product computations to generate partial sums, where these arithmetic dot product computations to generate partial sums consist of multiplication and addition/summation operations. Temam additionally teaches that the MAC array processes multi-dimensional data arrays based on a received SIMD instruction, such that these cells share the same instruction but operate on different data elements within the multi-dimensional data arrays (representing vectors) to perform the desired array computation within one clock cycle. 
Hence, each of these cells within the MAC array performing arithmetic dot product computations (requiring multiplication and addition operations) corresponds to a vector compute unit, and the MAC array (representing a plurality of vector compute units) that processes these multi-dimensional data arrays based on the received SIMD instruction containing an arithmetic dot product operation corresponds to a plurality of vector compute units configured to operate in parallel to compute the dot product computations on vector data (Temam Figure 1, elements 112 and 114, col.9 lines 6-37; Figure 2, element 214 containing cell #0, #1, #2, #3 and col.9 lines 48-51: “… the example tile 200 may correspond to any of the tiles within the first tile set 112 and second tile set 114 discussed above with reference to FIG. 1 …”; col.11 lines 23-42: “Compute tile 200 further includes an input activation bus and a MAC array 214 including multiple cells that each include a MAC operator 215 and a sum register 220 … MAC array 214 executes, using MAC operators 215 and sum registers 220 across multiple cells, tensor computations that include arithmetic operations relating to dot product computations. … During arithmetic operations, partial sums may be accumulated and stored in a corresponding, e.g., sum register 220 …”; and Figure 2, col.12 lines 6-21: “With regard to dot product computations of, for example, two multi-dimensional data arrays, for a single compute tile 200, MAC array 214 provides robust single instruction multiple data (SIMD) functionality. SIMD generally means that all parallel units (multiple MAC operators 215) share the same instruction … but each MAC operator 215 executes the instruction on different data elements … adding the arrays [1,2,3,4] and [5,6,7,8] element-wise to obtain the array [6,8,10,12] in one cycle … By using SIMD, the four units can share the same instruction (e.g., “add”) and perform computations in parallel …”).), wherein: 
each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a process where each neural core computes output values (representing output activations) based on received input values (representing input activations), where the computation is performed in a parallel fashion using a plurality of vector compute units. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an input activation bus that receives input activation data from a controller. Temam further teaches that the MAC operators within each MAC array receive an input activation, with each cell performing dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values. As indicated earlier, Temam teaches the cells within the MAC array perform these arithmetic computations to generate the partial sums and output activation values in a parallel fashion according to the received SIMD instructions, thus corresponding to a process where each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations (Temam Figure 2, col.11 lines 23-64: “… Compute tile 200 further includes an input activation bus 216 and a MAC array 214 including multiple cells that each include a MAC operator 215 and a sum register 220 … Input activation bus 216 provides a data path in which input activations are provided … one-by-one for respective access by each MAC operator 215 of MAC array 214 … a single MAC operator 215 of a particular cell will each receive an input activation. 
Arithmetic operations performed by the MAC operators of the MAC array 214 generally include multiplying an input activation provided by narrow memory 210 with a parameter accessed from wide memory 212 to produce a single output activation value. … The tensor computations can be described as having a first portion and second portion … The first portion is complete when multiply operations produce an output activation … Compute tile 200 further includes an output activation bus 218, a non-linear unit (NLU) 222 comprising an output activation pipeline 224 … upon completion of the first portion of the tensor computation, output activations are provided from MAC array 214 to NLU 222 via output activation bus 218.”; Figure 2, col.12 lines 6-21; and Figure 4, col.14 lines 14-17: “… MAC cells 410, including MAC operators 215, can be defined as compute cells that calculate a partial sum …”).); 
each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites performing computations for adjacent neural network layers, where each neural core receives a subset of activations representing the output activations of a previous neural network layer. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an output activation bus that transports the results of multiplication operations performed by a previous compute tile. Temam further teaches each compute tile stores a subset of input activations needed to compute a subset of output activations that are assigned to a particular tile, where these subsets of output activations computed at a compute layer are transferred to other compute tiles via the output activation bus (corresponding to the mesh bus that connects neighboring computing tiles) to compute output activations for a subsequent layer, and as such, this data flow of output activations (representing computations from a neural network layer) from one compute tile to another neighboring tile to further compute output activations for a subsequent neural network layer corresponds to a process where each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation (Temam Figure 1, col.9 lines 6-37: “… controller 102 receives, from host interface 108, input activations, tile instructions, and model parameters (i.e., weights) for executing tensor computations for a given layer of a neural network … With regard to data flow, input activations and parameters are transmitted to tiles of tile sets 112, 114 via ring bus 128. 
Each of the tiles … will store a subset of the input activations needed to compute a subset of output activations that are assigned to that particular tile … Results of the one or more tensor computations include writing output activations of a compute layer to a narrow memory unit(s) of the tile performing the computation. … there will be a transfer of output edge activations to neighboring tiles via mesh bus 126. Transfer of output edge activations are required to compute output activations for a subsequent layer when computations span multiple layers.”).), 
wherein the plurality of vector compute units each comprise multiplication and addition units (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each vector compute unit contains logical functions to perform both multiplication and addition operations, where these logical functions are referred to as “multiplication units” and “addition units”. As indicated earlier, Temam teaches each cell within a MAC array in a compute tile contains a MAC operator and a sum register, where the MAC operator and sum register within each cell are used to perform the arithmetic operations to generate the partial sums, which are accumulated and stored in the sum register, and as such, the MAC array containing these MAC operators and sum registers that perform multiplication operations to generate the partial sums and accumulation and summation operations to add these partial sums together correspond to logical functions representing multiplication and addition units, respectively (Temam Figure 2, element 214 containing cell #0, #1, #2, #3 and col.11 lines 23-42).).
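The SIMD behavior quoted above from Temam col.12 lines 6-21 (one shared instruction applied by several cells to different data elements) can be sketched as follows. This is a minimal illustration only, assuming a four-cell array modeled as Python lists; the names `simd_apply` and `mac_step` are hypothetical and do not appear in Temam or in the application, and the sketch makes no claim about how either hardware design is actually implemented.

```python
from operator import add, mul


def simd_apply(op, a, b):
    """Apply one shared instruction `op` lane by lane (one lane per cell)."""
    if len(a) != len(b):
        raise ValueError("both operands must have one element per cell")
    return [op(x, y) for x, y in zip(a, b)]


def mac_step(sum_registers, activations, parameters):
    """One multiply-accumulate step (hypothetical model of a MAC array).

    Each cell multiplies its activation by its parameter and adds the
    product to its own sum register, yielding updated partial sums.
    """
    products = simd_apply(mul, activations, parameters)
    return simd_apply(add, sum_registers, products)


# The element-wise addition example quoted from Temam col.12.
print(simd_apply(add, [1, 2, 3, 4], [5, 6, 7, 8]))  # [6, 8, 10, 12]
```

Calling `mac_step` repeatedly with the previous result as `sum_registers` accumulates partial sums across steps, which is the multiplication-plus-addition behavior relied upon in the mapping above.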
Regarding original Claim 2, 
Temam teaches
(Original) The neural inference chip of claim 1, wherein: 
upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores 
computes a partial sum for each of its assigned output activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core performing operations involving computing partial sums based on the received subset of input activations (where these input activations are the output activations of a previous neural network layer), where these partial sums are used for computing corresponding output activations. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where multiplication operations are performed by the cells within the MAC array using the input activations and a parameter accessed from wide memory, such that these cells produce partial sums that are accumulated and either stored in a corresponding sum register or re-written into wide-memory (for re-access by a particular cell of MAC array 214), to complete follow-on multiply operations to compute the output activation result provided to the output activation bus. Hence, this process in which each compute tile, upon receiving a subset of input activations, uses the corresponding MAC array to perform dot product multiplication and addition/summation operations to compute partial sums representing output activation results, corresponds to a process in which each neural core, upon receiving a subset of input activations, computes a partial sum for each of its assigned output activations (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).), and 
computes its assigned output activations from at least the computed partial sums (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core performing operations involving computing output activations based on the computed partial sums. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an input activation bus that receives input activation data from a controller, and an output activation bus that transports the results of multiplication operations performed by a previous compute tile. As indicated earlier, Temam teaches these multiplication operations generate partial sums that are further used to compute the output activation result provided to the output activation bus, and as such, each compute tile performs operations that correspond to computing the output activations from the computed partial sums based upon receipt of a subset of input activations from a neural network layer (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).).
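The two steps recited in Claim 2 (a partial sum per assigned output activation upon receipt of a subset of input activations, then output activations computed from those partial sums) can be sketched as below. This is a hedged illustration under stated assumptions: the weight layout (one row per output activation), the function names, and the use of ReLU as the non-linear function are all hypothetical choices made for the example and are not taken from the claims or from Temam.

```python
def partial_sums(weights, assigned_outputs, input_chunk, chunk_offset):
    """Partial sum for each assigned output over one subset of input activations."""
    return {
        j: sum(weights[j][chunk_offset + i] * x for i, x in enumerate(input_chunk))
        for j in assigned_outputs
    }


def compute_outputs(weights, assigned_outputs, input_chunks):
    """Accumulate partial sums chunk by chunk, then form output activations."""
    totals = {j: 0 for j in assigned_outputs}
    offset = 0
    for chunk in input_chunks:
        sums = partial_sums(weights, assigned_outputs, chunk, offset)
        for j, value in sums.items():
            totals[j] += value
        offset += len(chunk)
    # ReLU is an assumed non-linearity, chosen only for illustration.
    return {j: max(0, total) for j, total in totals.items()}


# Three output activations, four input activations delivered in two subsets.
weights = [[1, 0, -1, 2], [0, 1, 1, -3], [2, 2, 0, 0]]
chunks = [[1, 2], [3, 4]]
core_a = compute_outputs(weights, [0, 1], chunks)  # core assigned outputs 0 and 1
core_b = compute_outputs(weights, [2], chunks)     # core assigned output 2
print(core_a, core_b)  # {0: 6, 1: 0} {2: 6}
```

Each hypothetical core touches only the weight rows for its own assigned outputs, which is the sense in which a core is "assigned a subset of output activations" in this sketch.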
Regarding original Claim 5, 
Temam teaches
(Original) The neural inference chip of claim 1, wherein the vector compute units comprise accumulation units (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each vector compute unit contains logical functions to perform accumulation operations, where this logical function is referred to as “accumulation units”. As indicated earlier, Temam teaches each cell within a MAC array in a compute tile contains a MAC operator and a sum register, where the MAC operator and sum register within each cell are used to perform the arithmetic operations to generate the partial sums, which are accumulated and stored in the sum register, and as such, the sum registers that perform accumulation and summation operations to add these partial sums together correspond to logical functions representing accumulation units (Temam Figure 2, element 214 containing cell #0, #1, #2, #3 and col.11 lines 23-42).).
Regarding original Claim 6, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein the plurality of neural cores perform said partial sum computation in parallel (Examiner’s note: As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles contains a MAC array comprising a plurality of cells, where the MAC operators within each MAC array receive an input activation, and each cell performs dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values. As indicated earlier, Temam teaches the cells within the MAC array perform these arithmetic computations to generate the partial sums and output activation values in a parallel fashion according to the received SIMD instructions, thus corresponding to a process where the plurality of neural cores perform the partial sum computation in parallel (Temam Figure 2, col.11 lines 23-64; and col.12 lines 6-21).).
Regarding original Claim 7, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein the plurality of neural cores perform said output activation computation in parallel (Examiner’s note: As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles contains a MAC array comprising a plurality of cells, where the MAC operators within each MAC array receive an input activation, and each cell performs dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values. As indicated earlier, Temam teaches the cells within the MAC array perform these arithmetic computations to generate the partial sums and output activation values in a parallel fashion according to the received SIMD instructions, thus corresponding to a process where the plurality of neural cores perform the output activation computation in parallel (Temam Figure 2, col.11 lines 23-64; and col.12 lines 6-21).).
Regarding original Claim 8, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein computing the partial sum comprises applying at least one of the plurality of vector compute units to multiply the input activations and synaptic weights (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a computing operation that involves at least one of the vector compute units applying input activations and weight parameters to compute the partial sum. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles contains a MAC array comprising a plurality of cells, where the MAC operators within each MAC array receive an input activation, and each cell performs dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values. Temam additionally teaches that the parameters accessed from wide memory are weight parameters, and as such, this dot product arithmetic operation performed using the MAC array that involves applying input activations and weight parameters to produce partial sums corresponds to a computing operation that involves at least one of the vector compute units applying input activations and weight parameters to compute the partial sum (Temam col.4 lines 2-5: “… Computation occurs when an input activation provided by a narrow memory structure is multiplied with a parameter or weight provided by a wide memory structure.”; Figure 2, col.11 lines 23-64; and col.12 lines 6-21).).
Regarding original Claim 9, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein computing the assigned output activations comprises applying a plurality of addition units (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites the output activations are computed using logical functions performing addition operations, where this logical function is referred to as “addition units”. As indicated earlier, Temam teaches each cell within a MAC array in a compute tile contains a MAC operator and a sum register, where the MAC operator and sum register within each cell are used to perform the arithmetic operations to generate the partial sums, which are accumulated and stored in the sum register, and as such, the sum registers that perform accumulation and summation operations to add these partial sums together correspond to logical functions representing addition units (Temam Figure 2, element 214 containing cell #0, #1, #2, #3, col.11 lines 23-42; and Figure 4, col.14 lines 14-17). Temam further teaches another implementation involving two MAC operators, where the two MAC operators compute the multiplication of two activation values with two parameters from wide memory and perform addition of those two multiplier results with a current partial sum, where this cell configuration also correspond to logical functions performing addition operations (and thus representing addition units) (Temam col.14 lines 21-26: “… a dual issue cell refers to a cell with two MAC operators that can compute the multiplication of two activation values … with two parameters … and perform an addition between the results of the two multipliers and the current partial sum.”).).
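For illustration only, the dual issue cell behavior quoted above (two multiplications whose results are added to the current partial sum) can be sketched in Python. This is a minimal sketch under stated assumptions and not a characterization of Temam's actual circuitry: the function name `dual_issue_step` and its parameter names are hypothetical and do not appear in Temam or in the application.

```python
def dual_issue_step(partial_sum, act0, act1, param0, param1):
    """Model one step of a cell with two MAC operators.

    Hypothetical helper: multiplies two activation values by two
    parameters and adds both products to the current partial sum.
    """
    product0 = act0 * param0  # first MAC operator's multiplier result
    product1 = act1 * param1  # second MAC operator's multiplier result
    return partial_sum + product0 + product1


# Usage: accumulate four activation/parameter pairs, two per step.
activations = [2, 3, 1, 4]
parameters = [4, 5, 6, 7]
total = 0
for i in range(0, len(activations), 2):
    total = dual_issue_step(total, activations[i], activations[i + 1],
                            parameters[i], parameters[i + 1])
```

The addition of the two products to the running total is the step the rejection maps to the claimed "addition units".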
Regarding original Claim 10, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein computing output activations comprises applying a nonlinear function (Examiner’s note: As indicated earlier, Temam teaches the MAC operators within each MAC array receives an input activation, with each cell performing dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values. Temam further teaches that these output activation values are further applied to a non-linear function, where the resulting output of this non-linear function (also referred to as output activations) is written to narrow memory (Temam Figure 2, col.11 line 33-col.12 line 1: “… a single MAC operator 215 of a particular cell with each receive an input activation. Arithmetic operations performed by the MAC operators of the MAC array 214 generally include multiplying an input activation provided by narrow memory 210 with a parameter accessed from wide memory 212 to produce a single output activation value. … The tensor computations can be described as having a first portion and second portion … The first portion is complete when multiply operations produce an output activation. The second portion includes application of a non-linear function to an output activation and the second portion is complete when the output activation is written to narrow memory 210 after application of the function. Compute tile 200 further includes an output activation bus 218, a non-linear unit (NLU) 222 comprising an output activation pipeline 224 … upon completion of the first portion of the tensor computation, output activations are provided from MAC array 214 to NLU 222 via output activation bus 218. After arrival at NLU 222, data specifying an activation function … is applied to the output activations and the output activations are then written to narrow memory 210.”).).
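As an illustrative sketch only, the two-portion computation quoted above (multiply operations that produce an output activation, followed by application of a non-linear function) may be modeled in Python as follows. The names `mac_cell`, `relu`, and `compute_output_activation` are hypothetical, and the choice of ReLU as the non-linear function is an assumption made for the example; Temam's quoted passage refers only to "an activation function".

```python
def mac_cell(input_activations, parameters):
    """First portion: multiply each input activation by its parameter
    and accumulate the products (a stand-in for a cell's sum register)."""
    sum_register = 0.0
    for activation, parameter in zip(input_activations, parameters):
        sum_register += activation * parameter
    return sum_register


def relu(value):
    """Assumed non-linear function for this sketch (ReLU)."""
    return max(0.0, value)


def compute_output_activation(input_activations, parameters, nonlinear=relu):
    """Second portion: apply the non-linear function to the accumulated value."""
    accumulated = mac_cell(input_activations, parameters)
    return nonlinear(accumulated)


positive_case = compute_output_activation([1.0, 2.0, 3.0], [0.5, 1.0, 0.25])
negative_case = compute_output_activation([1.0, 2.0, 3.0], [0.5, -1.0, 0.25])
```

In the second call the accumulated value is negative, so the assumed non-linear function changes the result, which shows that the two portions are distinct steps.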
Regarding amended Claim 11, 
Temam teaches
(Currently Amended) The neural inference chip of claim 2, wherein the plurality of vector compute units are configured to: 
perform a plurality of multiply operations in parallel (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recite elements or logical functions performing vector computations. As indicated earlier, Temam teaches each cell in a MAC array includes a multiply accumulate (MAC) operator and sum register that is used for executing tensor arithmetic computations that include dot product computations to generate partial sums, where these arithmetic dot product computations to generate partial sums consist of multiplication and addition/summation operations. As indicated earlier, Temam additionally teaches that the MAC array processes multi-dimensional data arrays based on a received SIMD instruction (with the instruction being used to inform each cell to execute the same operation on each element in the array within one clock cycle), and hence the MAC array (“plurality of vector compute units”) that processes these multi-dimensional data arrays based on the received SIMD instruction containing an arithmetic dot product operation corresponds to a plurality of vector compute units configured to perform parallel multiply computations on vector data (Temam Figure 1, elements 112 and 114, col.9 lines 6-37; Figure 2, element 214 containing cell #0, #1, #2, #3 and col.9 lines 48-51; col.11 lines 23-42; and Figure 2, col.12 lines 6-21).); 
perform a plurality of additions in parallel (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recite elements or logical functions performing vector computations. As indicated earlier, Temam teaches each cell in a MAC array includes a multiply accumulate (MAC) operator and sum register that is used for executing tensor arithmetic computations that include dot product computations to generate partial sums, where these arithmetic dot product computations to generate partial sums consist of multiplication and addition/summation operations. As indicated earlier, Temam additionally teaches that the MAC array processes multi-dimensional data arrays based on a received SIMD instruction (with the instruction being used to inform each cell to execute the same operation on each element in the array within one clock cycle), and hence the MAC array (“plurality of vector compute units”) that processes these multi-dimensional data arrays based on the received SIMD instruction containing an arithmetic dot product operation corresponds to a plurality of vector compute units configured to perform parallel addition computations on vector data (Temam Figure 1, elements 112 and 114, col.9 lines 6-37; Figure 2, element 214 containing cell #0, #1, #2, #3 and col.9 lines 48-51; col.11 lines 23-42; and Figure 2, col.12 lines 6-21).); and 
accumulate the partial sum (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recite elements or logical functions performing vector computations. As indicated earlier, Temam teaches each cell within a MAC array (“plurality of vector compute units”) in a compute tile contains a MAC operator and a sum register, where the MAC operator and sum register within each cell are used to perform the arithmetic operations to generate the partial sums, which are accumulated and stored in the sum register, and as such, the sum registers within the MAC array that perform accumulation and summation operations to add these partial sums together correspond to a plurality of vector compute units configured to accumulate the partial sum (Temam Figure 2, element 214 containing cell #0, #1, #2, #3 and col.11 lines 23-42).).
Regarding amended Claim 12, 
Temam teaches
(Currently Amended) The neural inference chip of claim 2, wherein the plurality of vector compute units are configured to compute partial sums in parallel (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recites elements or logical functions performing vector computations. As indicated earlier, Temam teaches each cell in a MAC array includes a multiply accumulate (MAC) operator and sum register that is used for executing tensor arithmetic computations that include dot product computations to generate partial sums, where these arithmetic dot product computations to generate partial sums consist of multiplication and addition/summation operations. As indicated earlier, Temam additionally teaches that the MAC array processes multi-dimensional data arrays based on a received SIMD instruction (with the instruction being used to inform each cell to execute the same operation on each element in the array within one clock cycle), and hence the MAC array (“plurality of vector compute units”) that processes these multi-dimensional data arrays based on the received SIMD instruction containing an arithmetic dot product operation corresponds to a plurality of vector compute units configured to perform parallel dot product computations on vector data to compute partial sums in parallel (Temam Figure 1, elements 112 and 114, col.9 lines 6-37; Figure 2, element 214 containing cell #0, #1, #2, #3 and col.9 lines 48-51; col.11 lines 23-42; Figure 2, col.12 lines 6-21).).
Regarding amended Claim 13, 
Temam teaches
(Currently Amended) The neural inference chip of claim 1, wherein said computation by each of the plurality of neural cores is pipelined (Examiner’s note: As indicated earlier, the term “said computation” exhibits a 112(b) indefiniteness issue, and for the purposes of examination, this term will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, such that the limitation broadly recites a mechanism where the computation operations are performed in a sequential (i.e., pipelined) fashion. Temam teaches the MAC array in each compute tile accesses memory relating to the operands of an instruction (e.g., weight parameter and accumulated partial sum data) in a time-sharing fashion such that read operations related to certain operands may be ordered before write operations (and vice versa), such that the pipeline associated with a particular operand located within the same type of memory may be stalled, and hence this mechanism that contains a pipeline for each operand to control the ordering of the read/write memory operations for the dot product computation corresponds to a mechanism where the computation by each of the plurality of neural cores is pipelined (Temam col.15 lines 43-62: “… MAC array 214 of compute tile 200 performs tensor computations comprising dot product computations based on elements of a data array structure accessed from memory. 
… The linear unit (LU) is thus a SIMD vector arithmetic logic unit (ALU) unit that receives data from a vector memory … MAC operators 215 may also get the accumulator inputs (partial sums) from wide memory 212 as well … there is time sharing relative to the wide memory 212 port for reads and/or writes relating to the two different operands (parameters and partial sum) … As a result, when there is a need to read an operand (e.g., a parameter) from wide memory 212 and write an operand (e.g., a partial sum) to wide memory 212 at the same time, a pipeline associated with a particular operand can be stalled.”).).
Regarding amended Claim 14, 
Temam teaches
(Currently Amended) The neural inference chip of claim 13,
 wherein each of the plurality of neural cores is configured to concurrently perform each stage of said computation (Examiner’s note: As indicated earlier, the term “each stage of said computation” exhibits a 112(b) indefiniteness issue, and for the purposes of examination, this term will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, where the term “concurrently perform” broadly recites performing parallel computations. Temam teaches that the computation performed within each compute tile (“plurality of neural cores”) begins with the input activation, parameters, and SIMD instructions being received at each compute tile to instruct the MAC operators to perform the arithmetic dot product operations, and the computation only ends when all dot product operations are computed and the pre-activation functions are applied to the computation results (producing the output activation results), where this process of receiving SIMD instructions outlining the begin/end sequence describing the sequential arithmetic dot product computation being performed by the compute tile and the MAC operators within the MAC array when all computations are computed corresponds to a process where each of the plurality of neural cores is configured to concurrently perform each stage of said computation (Temam col.12 lines 6-12: “With regard to dot product computations of, for example, two multi-dimensional data arrays, for a single compute tile 200, MAC array 214 provides robust single instruction multiple data (SIMD) functionality. SIMD generally means that all parallel units (multiple MAC operators 215) share the same instruction …”; and col.9 lines 21-29: “… Computation within a tile begins when required input activation, parameters/weights and computation instructions (TTU operations, memory addresses, etc.) are available in the tile. 
Computations occurring within a tile ends when MAC operators … within a tile complete all dot product operations defined by the instruction set and pre-activation functions are applied to the results (i.e., output activations) of the multiplication operations.”).).
Regarding amended Claim 15, 
Temam teaches
(Currently Amended) The neural inference chip of claim 14, wherein said computation maintains parallelism (Examiner’s note: As indicated earlier, the term “said computation” exhibits a 112(b) indefiniteness issue, and for the purposes of examination, this term will be interpreted as referring to the computation action that applies the plurality of vector compute units to input activations, such that the limitation broadly recites that these computation actions performed within the vector compute units exhibit parallelism. As indicated earlier, Temam teaches partitioning of a single SIMD instruction to multiple compute tiles to perform simultaneous arithmetic dot product computations, where this partitioning to perform simultaneous arithmetic dot product computations across different neural cores and different MAC arrays corresponds to a process where each neural core exhibits parallelism with respect to the other MAC arrays in the other compute tiles (Temam col.12 lines 22-38: “… a single instruction can be provided by controller 102 to multiple compute tiles 200 (see tile sets 112, 114 of FIG. 1) for consumption by multiple MAC arrays 214. … neural network layers can include multiple output neurons and the output neurons can be partitioned such that tensor computations associated with a subset of output neurons can be assigned to a particular tile of tile sets 112, 114. Each tile of tile sets 112, 114 can then perform related tensor computations on different groups of neurons for a given layer. Compute tile 200 can therefore provide at least two forms of parallelism: 1) one form includes partition the output activations … amongst the multiple tiles … 2) another form includes simultaneous computation (with a single instruction) of multiple subsets of output neurons based on the partitioning amongst the tiles of tile sets 112, 114.”).).
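For illustration only, the partitioning of output neurons amongst tiles described in the quoted passage can be sketched in Python. This is a simplified model under stated assumptions, not Temam's implementation: the round-robin assignment, the function names `partition_outputs`, `tile_compute`, and `layer_compute`, and the sequential loop standing in for tiles that would operate simultaneously are all hypothetical choices made for the example.

```python
def partition_outputs(num_outputs, num_tiles):
    """Assign output-neuron indices to tiles (round-robin is an assumption)."""
    return [list(range(tile, num_outputs, num_tiles))
            for tile in range(num_tiles)]


def tile_compute(assigned_outputs, input_activations, weight_rows):
    """One tile computes a dot product for each output neuron assigned to it."""
    return {
        index: sum(a * w for a, w in zip(input_activations, weight_rows[index]))
        for index in assigned_outputs
    }


def layer_compute(input_activations, weight_rows, num_tiles):
    """Run every tile on its own subset, then gather results in neuron order.

    The loop is sequential here; it only models the division of work.
    """
    results = {}
    for assigned in partition_outputs(len(weight_rows), num_tiles):
        results.update(tile_compute(assigned, input_activations, weight_rows))
    return [results[index] for index in range(len(weight_rows))]


layer_output = layer_compute([1, 2], [[1, 0], [0, 1], [1, 1], [2, 2]], num_tiles=2)
```

Each tile applies the same operation to a different subset of output neurons, which is the form of parallelism the quoted passage describes.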
Regarding amended Claim 16, 
Temam teaches
(Currently Amended) A method comprising: 
receiving at each of a plurality of neural cores a subset of input activations of a layer of a neural network (Examiner’s note: Temam teaches input activations and parameters are transmitted to the plurality of compute tiles (“plurality of neural cores”), where each compute tile stores a subset of the input activations needed to compute the subset of output activations assigned to that particular tile (Temam col.9 lines 6-19: “… controller 102 receives, from host interface 108, input activations, tile instructions, and model parameters (i.e., weights) for executing tensor computations for a given layer of a neural network … With regard to data flow, input activations and parameters are transmitted to tiles of tile sets 112, 114 via ring bus 128. Each of the tiles … will store a subset of the input activations needed to compute a subset of output activations that are assigned to that particular tile.”).), 
each of the plurality of neural cores comprising a plurality of vector compute units configured to operate in parallel (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0054], the term “a plurality of vector compute units” broadly recite elements or logical functions performing vector computations. Thus, under its broadest reasonable interpretation, this limitation broadly recites a plurality of neural cores (which is interpreted as a plurality of elements that perform neural network computations), where each neural core is associated with a plurality of elements/logical functions performing vector computations in a parallel fashion. As indicated earlier, Temam teaches a neural network compute tile/computing unit in a computing system containing a plurality of compute tiles (corresponding to a plurality of neural cores), where each compute tile contains a plurality of cells in a MAC array. Temam further teaches each cell includes a multiply accumulate (MAC) operator and sum register that is used for executing tensor arithmetic computations that include dot product computations to generate partial sums, where these arithmetic dot product computations to generate partial sums consist of multiplication and addition/summation operations. 
As indicated earlier, Temam additionally teaches that the MAC array processes multi-dimensional data arrays based on a received SIMD instruction (with the instruction being used to inform each cell to execute the same operation on each element in the array within one clock cycle), and hence the MAC array (“plurality of vector compute units”) that processes these multi-dimensional data arrays based on the received SIMD instruction containing an arithmetic dot product operation corresponds to a plurality of vector compute units configured to operate in parallel to compute the dot product computations on vector data (Temam Figure 1, elements 112 and 114, col.9 lines 6-37; Figure 2, element 214 containing cell #0, #1, #2, #3 and col.9 lines 48-51; col.11 lines 23-42; and Figure 2, col.12 lines 6-21).), 
each of the plurality of neural cores configured to compute in parallel output activations by applying its plurality of vector multipliers to input activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a process where each neural core computes output values (representing output activations) based on received input values (representing input activations), where the computation is performed in a parallel fashion using a plurality of vector multipliers (where the term “its plurality of vector multipliers” is interpreted as a set of elements/logical functions performing vector multiplication). As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an input activation bus that receives input activation data from a controller. Temam further teaches that the MAC operators within each MAC array receives an input activation, with each cell performing dot product arithmetic operations with a parameter accessed from wide memory to produce partial sums that are accumulated into output activation values, and with the cells in the MAC array performing dot product arithmetic operations corresponding to a plurality of vector multipliers. As indicated earlier, Temam teaches the cells within the MAC array perform these arithmetic computations to generate the partial sums and output activation values in a parallel fashion according to the received SIMD instructions, thus corresponding to a process where each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector multipliers to input activations (Temam Figure 2, col.11 lines 23-64; Figure 2, col.12 lines 6-21; and Figure 4, col.14 lines 14-17).); 
assigning each of the plurality of neural cores a subset of output activations of a layer of a neural network for computation (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites performing computations for adjacent neural network layers, where each neural core receives a subset of activations representing the output activations of a previous neural network layer. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an output activation bus that transports the results of multiplication operations performed by a previous compute tile. Temam further teaches each compute tile stores a subset of input activations needed to compute a subset of output activations that are assigned to a particular tile, where these subset of output activations performed at a compute layer are transferred to other compute tiles via the output activation bus (corresponding to the mesh bus that connects neighboring computing tiles) to compute output activations for a subsequent layer, and as such, this data flow of output activations (representing computations from a neural network layer) from one compute tile to another neighboring tile to further compute output activations for a subsequent neural network layer corresponds to a process where each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation (Temam Figure 1, col.9 lines 6-37).); and
upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores 
computing a partial sum for each of its assigned output activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core performing operations involving computing partial sums based on the received subset of input activations (where these input activations are the output activations of a previous neural network layer), where these partial sums are used for computing corresponding output activations. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where multiplication operations are performed by the cells within the MAC array using the input activations and a parameter accessed from wide memory, such that these cells produce partial sums that are accumulated and either stored in a corresponding sum register or re-written into wide-memory (for re-access by a particular cell of MAC array 214), to complete follow-on multiply operations to compute the output activation result provided to the output activation bus. Hence, this process in which each compute tile, upon receiving a subset of input activations, uses the corresponding MAC array to perform dot product multiplication and addition/summation operations to compute partial sums representing output activation results, corresponds to a process in which each neural core, upon receiving a subset of input activations, computes a partial sum for each of its assigned output activations (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).), and 
computing its assigned output activations from at least the computed partial sums (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core performing operations involving computing output activations based on the computed partial sums. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an input activation bus that receives input activation data from a controller, and an output activation bus that transports the results of multiplication operations performed by a previous compute tile. As indicated earlier, Temam teaches these multiplication operations generate partial sums that are further used to compute the output activation result provided to the output activation bus, and as such, each compute tile performs operations that correspond to computing the output activations from the computed partial sums based upon receipt of a subset of input activations from a neural network layer (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).), 
wherein the plurality of vector compute units each comprise multiplication and addition units (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each vector compute unit contains logical functions to perform both multiplication and addition operations, where these logical functions are referred to as “multiplication units” and “addition units”. As indicated earlier, Temam teaches each cell within a MAC array in a compute tile contains a MAC operator and a sum register, where the MAC operator and sum register within each cell are used to perform the arithmetic operations to generate the partial sums, which are accumulated and stored in the sum register, and as such, the MAC array containing these MAC operators and sum registers that perform multiplication operations to generate the partial sums and accumulation and summation operations to add these partial sums together correspond to logical functions performing multiplication and addition operations (thus representing multiplication and addition units, respectively) (Temam Figure 2, element 214 containing cell #0, #1, #2, #3 and col.11 lines 23-42).).
Regarding original Claim 19, 
Claim 19 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 5, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 5, in view of the rejection applied to Claim 16.
Regarding original Claim 20, 
Claim 20 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 6, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 6, in view of the rejection applied to Claim 16.
Regarding original Claim 21, 
Claim 21 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 7, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 7, in view of the rejection applied to Claim 16.
Regarding original Claim 22, 
Claim 22 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 8, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 8, in view of the rejection applied to Claim 16.
Regarding original Claim 23, 
Claim 23 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 9, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 9, in view of the rejection applied to Claim 16.
Regarding original Claim 24, 
Claim 24 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 10, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 10, in view of the rejection applied to Claim 16.
Regarding original Claim 25, 
Claim 25 recites the method of claim 16 and comprises claim limitations that are similar in scope to the corresponding claim limitations in Claim 11, and hence is rejected under a similar rationale provided by Temam as indicated for Claim 11, in view of the rejection applied to Claim 16.
Regarding new Claim 26, 
Temam teaches
(New) The neural inference chip of claim 1, wherein each neural core of the plurality of neural cores shares at least a portion of its computations with at least one other neural core of the plurality of neural cores (Examiner’s note: Under its broadest reasonable interpretation, the term “at least a portion of its computations” is interpreted as referring to a subset of computation actions, where the computation actions are those that apply the plurality of vector compute units to input activations, such that the subset of computation actions are shared/divided among different neural cores. Temam teaches a single SIMD instruction can be provided to multiple compute tiles (“plurality of neural cores”) for consumption in multiple MAC arrays, such that the tensor arithmetic computations resulting from this SIMD instruction are partitioned amongst the compute tiles of tile sets in order to perform simultaneous computations. Hence this partitioning of a single SIMD instruction to multiple compute tiles to perform simultaneous tensor arithmetic computations corresponds to a process where each neural core of the plurality of neural cores shares at least a portion of its computation with at least one other neural core of the plurality of neural cores (Temam col.12 lines 22-38: “… a single instruction can be provided by controller 102 to multiple compute tiles 200 (see tile sets 112, 114 of FIG. 1) for consumption by multiple MAC arrays 214. … neural network layers can include multiple output neurons and the output neurons can be partitioned such that tensor computations associated with a subset of output neurons can be assigned to a particular tile of tile sets 112, 114. Each tile of tile sets 112, 114 can then perform related tensor computations on different groups of neurons for a given layer. 
Compute tile 200 can therefore provide at least two forms of parallelism: 1) one form includes partition the output activations … amongst the multiple tiles … 2) another form includes simultaneous computation (with a single instruction) of multiple subsets of output neurons based on the partitioning amongst the tiles of tile sets 112, 114.”).).
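For illustration only, the partitioning of output neurons among tiles described in the quoted passage can be sketched as follows. This is a minimal sketch under stated assumptions and is not Temam's implementation: the function names partition_output_neurons and run_layer are hypothetical, the contiguous grouping is an assumed partitioning policy, and the sequential loop merely stands in for tiles that would execute the same instruction simultaneously.

```python
def partition_output_neurons(num_outputs, num_tiles):
    """Assign contiguous groups of output-neuron indices to tiles (assumed policy)."""
    base, extra = divmod(num_outputs, num_tiles)
    groups, start = [], 0
    for t in range(num_tiles):
        size = base + (1 if t < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def run_layer(inputs, weights, num_tiles):
    """Apply one shared operation (a dot product) on every tile's own neuron group."""
    groups = partition_output_neurons(len(weights), num_tiles)
    outputs = [0] * len(weights)
    for tile_group in groups:        # each iteration stands in for one tile
        for j in tile_group:         # the tile computes only its assigned neurons
            outputs[j] = sum(x * w for x, w in zip(inputs, weights[j]))
    return groups, outputs
```

In this sketch, every tile runs the same computation on a different subset of output neurons, which is the sense in which the layer's computations are divided among the tiles.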
Regarding new Claim 27, 
Claim 27 recites the method of claim 16, comprising claim limitations that are similar in scope to the corresponding claim limitations in Claim 26, and hence is rejected under the similar rationale provided by Temam as indicated in Claim 26, in view of the rejections applied to Claim 16.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 
1. Determining the scope and contents of the prior art. 
2. Ascertaining the differences between the prior art and the claims at issue. 
3. Resolving the level of ordinary skill in the pertinent art. 
4. Considering objective evidence present in the application indicating obviousness or nonobviousness. 
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. 
Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Temam et al., U.S. Patent 9,710,265, published 7/18/2017 [hereafter referred to as Temam] in view of Dally et al., U.S. PGPUB 2018/0046906, published 2/15/2018 [hereafter referred to as Dally].
Regarding original Claim 3, 
Temam teaches
(Original) The neural inference chip of claim 2, wherein upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores 
… receives partial sums for at least one of its assigned output activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core performing operations involving computing partial sums based on the received subset of input activations (where these input activations are the output activations of a previous neural network layer), where these partial sums are used for computing corresponding output activations. As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where multiplication operations are performed by the cells within the MAC array using the input activations and a parameter accessed from wide memory, such that these cells produce partial sums that are accumulated and either stored in a corresponding sum register or re-written into wide-memory (for re-access by a particular cell of MAC array 214), to complete follow-on multiply operations to compute the output activation result provided to the output activation bus. Hence, this process in which each compute tile, upon receiving a subset of input activations, re-accesses the wide memory to retrieve the previously available partial sums to complete the follow-on multiply operations to compute the output activation result, corresponds to a process in which each neural core, upon receiving a subset of input activations, receives partial sums for at least one of its assigned output activations (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).) …
… computes its assigned output activations from the computed partial sums (Examiner’s note: As indicated earlier, Temam teaches a plurality of compute tiles (“plurality of neural cores”), where each of these compute tiles is attached to an input activation bus that receives input activation data from a controller, and an output activation bus that transports the results of multiplication operations performed by a previous compute tile. As indicated earlier, Temam teaches these multiplication operations generate partial sums that are further used to compute the output activation result provided to the output activation bus, and as such, each compute tile performs operations that correspond to computing the output activations from the computed partial sums based upon receipt of a subset of input activations from a neural network layer (Temam Figure 1, col.9 lines 6-37; and Figure 2, col.11 lines 23-64).) …
However, Temam does not explicitly teach
… receives partial sums … from another of the plurality of neural cores …
… computes its assigned output activations from the … received partial sums …
Dally teaches
… receives partial sums … from another of the plurality of neural cores (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core receiving partial sums from other neural cores to compute the corresponding output activations. Dally teaches a tiling strategy in which the input and output activations are further partitioned among processing elements into $W_t \times H_t$ tiles, where the cross-tile dependencies at the tile edges (referred to as halos) are resolved by exchanging the incomplete partial sums to neighboring PEs for accumulation. Dally additionally teaches the accumulation is stored in the accumulation buffer of each PE, where the accumulation unit within each PE further updates the partial sums to produce the output activations. Hence this process of exchanging incomplete partial sums to neighboring PEs for accumulation and further computation to produce the corresponding output activations corresponds to a process that receives partial sums from another of the plurality of neural cores (Dally [0063]-[0064]: “To scale beyond the practical limits of multiplier count and buffer sizes within a PE 220, a tiling strategy may be employed to spread the work across an array of PEs 210 so that each PE 210 can operate independently. Weights are broadcast to the PEs 210 and each PE 210 operates on an exclusive subset of the input and output activation space. … the sliding-window nature of the convolution operation introduces cross-tile dependencies at tile edges. These dependencies are called halos … The second technique for handling halos is to size the accumulation buffer in each PE 210 to be slightly larger than $K_c \times W \times H$ to accommodate the halos. The halos now contain incomplete partial sums that must be communicated to neighbor PEs 210 for communication…” and [0076]: “… Each accumulator unit within the accumulator array includes an addressable bank of storage, adder, and a register for storing partial sums associated with the output-channel group being processed … When the output-channel group is complete, the post-processing unit 345 performs the following tasks: (1) exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE’s 210 output activations …”; and [0061]).) …
… computes its assigned output activations from the … received partial sums (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites each neural core computing output activations based on the received partial sums from other neural cores. As indicated earlier, Dally teaches a tiling strategy in which the cross-tile dependencies at the tile edges (referred to as halos) are resolved by exchanging the incomplete partial sums to neighboring PEs for accumulation, and where the accumulation unit within each PE further updates the partial sums to produce the output activations. Hence this process of exchanging incomplete partial sums to neighboring PEs for accumulation and further computation to produce the corresponding output activations corresponds to a process that computes its assigned output activations from the received partial sums from another of the plurality of neural cores (Dally [0063]-[0064] and [0076]; and [0061]: “… The multiplier outputs (e.g., products) are sent to the accumulation unit 245, which updates the partial sums stored in the accumulation buffer 250. Each product is accumulated with a partial sum at the output coordinates in the output activation space … The output positions for the products are computed in parallel with the products …”).).
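For illustration only, the exchange of incomplete partial sums across tile edges described above can be sketched with a one-dimensional example. This is a minimal sketch under stated assumptions and is not Dally's implementation: the names pe_partial_sums and tiled_conv are hypothetical, the computation is reduced to a 1-D full convolution, and output ownership is assumed to follow the same tiling as the inputs.

```python
def pe_partial_sums(x_tile, offset, weights):
    """One PE: accumulate products into a local buffer keyed by global output index."""
    acc = {}
    for i, x in enumerate(x_tile):
        for k, w in enumerate(weights):
            o = offset + i + k               # output coordinate of this product
            acc[o] = acc.get(o, 0) + x * w   # update the partial sum at that coordinate
    return acc


def tiled_conv(x, weights, tile):
    """Split inputs across PEs, then merge halo partial sums into the owning PE."""
    n_out = len(x) + len(weights) - 1
    starts = range(0, len(x), tile)
    buffers = [pe_partial_sums(x[s:s + tile], s, weights) for s in starts]
    out = [0] * n_out
    halo_transfers = 0
    for pe, buf in enumerate(buffers):
        for o, value in buf.items():
            owner = min(o // tile, len(buffers) - 1)  # assumed ownership rule
            if owner != pe:
                halo_transfers += 1   # partial sum received from a neighboring PE
            out[o] += value           # owner finishes the accumulation
    return out, halo_transfers
```

In this sketch, an output at a tile edge is complete only after the owning PE adds the partial sum received from its neighbor, which is the relationship between received partial sums and assigned output activations that the limitation recites.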
Both Temam and Dally are analogous art since they both teach performing neural network computations using a plurality of processing elements and compute units for computing partial sums.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the subset of input activations and subset of output activations taught in Temam and further partition them to implement the tiling strategy taught in Dally as a way to further improve the scalability of the neural network computations by load balancing the computations across an array of processing elements. The motivation to combine is taught in Dally, since spreading the work across the array of processing elements allows improved scalability of the neural network computations and allows each processing element to perform computations in an independent manner, thereby also improving the parallelism within the neural network system (Dally [0063]).
Regarding original Claim 17, 
Claim 17 recites the method of claim 16, comprising claim limitations that are similar in scope to the corresponding claim limitations in Claim 3, and hence is rejected under the similar rationale and motivations provided by Temam and Dally as indicated in Claim 3, in view of the rejections applied to Claim 16.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Amirineni et al., U.S. Patent 10,445,638, Restructuring a Multi-Dimensional Array, filed 2/28/2018, where Amirineni teaches an array of processing elements for implementing an artificial neural network, where each processing element contains multipliers and adders to compute partial sums based on the input values and weights, and where each processing element receives partial sums from a prior PE within the same column, adds the partial sum to a multiplication result, and transmits an updated partial sum to the next PE within the same column (Amirineni col.12 line 22-col.13 line 3 and Figure 4C).
Esser et al., Cognitive Computing Systems: Algorithms and Applications for Networks of Neurosynaptic Cores, 2013 International Joint Conference on Neural Networks (IJCNN), August 4-9 2013, IEEE, 10 pages, where Esser teaches a chip architecture built from an interconnected network of lightweight configurable neurosynaptic cores, arranged in a crossbar architecture, where each core contains memory, processors, and communication elements (Esser p.1 Section I.A. Context and p.2 Figure 1), and where groups of cores (corelets) are configured to perform certain neural network layer functionalities, such as a partial sum corelet that computes partial matrix multiplication for 64 inputs (Esser p.5 Section III.C Digit Recognition: Linear Classifier), and where different types of corelets can be arranged in a linear pipeline fashion to perform feature extraction and nonlinear classification (Esser pp.7-8 Section III.E Collision Avoidance).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121