Notice of Pre-AIA or AIA Status
	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Claims 1 and 4-23 have been examined.

Claim Rejections - 35 U.S.C. § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. §102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 4-9, and 12-23 are rejected under 35 U.S.C. §102(a)(1) as being anticipated by Zhang, et al., Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator, arXiv:1802.04657, 17 Feb 2018, pp. 1-6. Specifically:

Claim 1
           Claim 1's ''A neural core device comprising:'' is anticipated by Zhang, et al., page 1, right column, first full paragraph, where it recites:

An example of a systolic array based DNN accelerator is the Google Tensor Processing Unit (TPU), that uses 256 x 256 grid of MAC units at its core, and provides between 30 x to 80 x times greater performance than CPU or GPU based servers [6]. The TPU is widely deployed in Google datacenters to accelerate DNN inference while reducing the total datacenter energy consumption. In this paper, we use the TPU architecture from Google as the baseline design, but our proposed techniques can apply to any systolic array based DNN accelerators. In the rest of the paper, we will use TPU to refer to the general class of systolic array based DNN accelerators.

           Claim 1's ''a weight memory;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “weight memory”.

           Claim 1's ''an activation memory;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows an “activation memory”.

           Claim 1's ''a vector-matrix multiplier adapted to receive a weight matrix from the weight memory, receive an activation vector from the activation memory, and compute a vector-matrix multiplication of the weight matrix and the activation vector;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “MAC Unit”.

           Claim 1's ''a vector processor adapted to receive one or more input vector from one or more vector source including the vector matrix multiplier and perform one or more vector functions on the one or more input vector to yield an output vector; and'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiple “MAC Units” that output to a vector of accumulators.

           Claim 1's ''a programmable controller operatively coupled to the vector processor, the controller adapted to: map one or more of the plurality of sources to the vector processor, map the vector processor to one or more of the plurality of vector targets, instruct the vector processor to perform a vector function on input from the one or more of the plurality of sources and provide results to the one or more of the plurality of vector targets, provide the output vector to the one or more vector targets.'' is anticipated by Zhang, et al., page 4, right column, first full paragraph, “Algorithm 1”, where it shows TPU mapping and weight pruning in addition to backpropagation.

Claim 4
           Claim 4's ''an activation unit operatively coupled to the vector processor and adapted to: apply an activation function to the results from the vector processor.'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

“…followed by an element-wise activation function φ.”

Claim 5
           Claim 5's ''The neural core device of claim 1, wherein the one or more vector sources comprise a partial sum memory, a network, a register, or a parameter memory.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiple “MAC Units” that act as partial summing nodes.

Claim 6
           Claim 6's ''The neural core device of claim 1, wherein the vector targets comprise an activation memory, a partial sum memory, a register, or a network.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiple “MAC Units” that act as partial summing nodes.

Claim 7
           Claim 7's ''The neural core device of claim 1, wherein the vector processor is adapted to apply one or more constant to the results.'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

3.1. Deep Neural Networks

A DNN consists of L stacked layers of computation, as shown in Figure 1a. Layer l has Nl neurons whose outputs are referred to as activations, represented by an Nl dimensional vector al. Each layer multiplies the vector of activations from the previous layer with a weight matrix wl of dimensions Nl x Nl and adds constant biases represented by an Nl dimensional vector bl, followed by an element-wise activation function...
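For illustration only, the layer computation described in the passage quoted above (multiply the prior layer's activations by a weight matrix, add constant biases, then apply an element-wise activation function) can be sketched as follows. The function names, the choice of ReLU as the activation function, and the numeric values are assumptions made for this sketch and are not taken from Zhang, et al. or from the application.

```python
def relu(x):
    """Example element-wise activation function (an assumption for this sketch)."""
    return max(0.0, x)


def layer_forward(a_prev, weights, biases, phi=relu):
    """Compute one layer: phi(a_prev * W + b).

    weights[i][j] is assumed to connect input neuron i to output neuron j.
    """
    outputs = []
    for j in range(len(biases)):
        total = biases[j]                      # constant bias for output j
        for i, a in enumerate(a_prev):
            total += a * weights[i][j]         # multiply-accumulate
        outputs.append(phi(total))             # element-wise activation
    return outputs


# Hypothetical 2-input, 2-output layer.
example_out = layer_forward([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])
```

In this sketch the bias is the only constant applied to the multiplication result before the activation function is evaluated.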

Claim 8
           Claim 8's ''The neural core device of claim 1, configured to accumulate partial sums.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiple “MAC Units” that output to a vector of accumulators.

Claim 9
           Claim 9's ''The neural core device of claim 1, wherein: the controller is further adapted to instruct the vector-matrix multiplier to read a weight matrix from the weight memory, read an activation vector from the activation memory, and to compute a vector-matrix multiplication of the weight matrix and the activation vector.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “weight memory”, an “activation memory”, and “MAC units”.

Claim 12
           Claim 12's ''The neural core device of claim 9, wherein: the weight matrix is a subarray of a neural network weight matrix;'' is anticipated by Zhang, et al., page 3, left column, first full paragraph, where it recites:

3.2. DNN Acceleration on TPU

Figure 1b shows a block-diagram of the TPU architecture, at the heart of which is a systolic array containing N x N MAC units that is used to perform the computationally expensive matrix multiplication and convolution operations. To understand how, consider a fully-connected layer with N input neurons and N output neurons, and consequently an N x N weight matrix. The weight matrix is first loaded into the systolic array, with weight wi,j being loaded into the MAC in row j and column i, which we refer to as MACi,j.

           Claim 12's ''the activation vector is a subarray of a neural network activation vector.'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

3.1. Deep Neural Networks

A DNN consists of L stacked layers of computation, as shown in Figure 1a. Layer l has Nl neurons whose outputs are referred to as activations, represented by an Nl dimensional vector al. Each layer multiplies the vector of activations from the previous layer with a weight matrix...

Claim 13
           Claim 13's ''The neural core device of claim 12, wherein the vector-matrix multiplication of the weight matrix and the activation vector is provided for accumulation, said accumulation yielding a vector-matrix multiplication of the neural network weight matrix and the neural network activation vector.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows the vector-matrix multiplications, the activation vector, and accumulations.
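For illustration only, the arithmetic relationship recited in the limitation above (products of subarrays, once accumulated, yield the product of the full weight matrix and the full activation vector) can be sketched as follows. The function names, the row-wise tiling scheme, and the numeric values are assumptions made for this sketch; they are not asserted to be the tiling used by Zhang, et al. or by the application.

```python
def vecmat(vec, mat):
    """Return the row-vector-by-matrix product vec * mat."""
    cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(cols)]


def tiled_vecmat(vec, mat, tile):
    """Multiply subarrays of vec and mat, accumulating the partial sums."""
    acc = [0] * len(mat[0])
    for start in range(0, len(vec), tile):
        sub_vec = vec[start:start + tile]      # subarray of the activation vector
        sub_mat = mat[start:start + tile]      # matching rows of the weight matrix
        partial = vecmat(sub_vec, sub_mat)     # partial vector-matrix product
        acc = [a + p for a, p in zip(acc, partial)]
    return acc


# Hypothetical 4-element activation vector and 4 x 2 weight matrix.
full_vec = [1, 2, 3, 4]
full_mat = [[1, 0], [0, 1], [2, 2], [1, -1]]
```

With these values, accumulating the two tile products gives the same vector as the untiled product.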

Claim 14
           Claim 14's ''The neural core device of claim 1, configured to compute a neural network function having an input, parameters, and an output.'' is anticipated by Zhang, et al., page 2, Figure 1(a), where it shows an input layer, hidden layers (with weight parameters), and an output layer.

Claim 15
           Claim 15's ''The neural core device of claim 14, wherein the weight matrix and/or the activation vector have configurable sizes.'' is anticipated by Zhang, et al., page 4, right column, first full paragraph, “Algorithm 1”, where it shows “pruning” of the weight matrix.

Claim 16
           Claim 16's ''The neural core device of claim 14, wherein the neural network input, parameters, and/or output have configurable sizes.'' is anticipated by Zhang, et al., page 2, left column, first partial paragraph, where it recites:

Our proposed solutions build on the recent work that shows that a significant fraction of a DNN’s connections can be pruned with no (or limited) impact on accuracy. However, while the prior work using pruning to reduce DNN execution time and memory usage [14]–[18], we do so to enable fault tolerance. FAP prunes all connections in a DNN that map to faulty MACs using simple bypass circuitry that requires only minor modifications to the baseline TPU. FAP+T additionally retrains the DNN after pruning to restore classification accuracy back or close to its baseline, but comes at the expense of extra “test time” per TPU chip.

	If all weights can be pruned, that means all weights to certain inputs and outputs can be pruned.
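For illustration only, the pruning of connections that map to faulty MACs, as described in the passage quoted above, can be sketched as follows. The function name, the representation of the fault map as a set of (row, column) positions, and the one-to-one mapping of weights[i][j] to the MAC at position (i, j) are simplifying assumptions made for this sketch; the bypass circuitry and retraining step of Zhang, et al. are not modeled.

```python
def prune_for_faults(weights, faulty_macs):
    """Return a copy of weights with every weight mapped to a faulty MAC set to zero.

    faulty_macs is a hypothetical fault map: a set of (i, j) positions, where
    weights[i][j] is assumed to be loaded into the MAC at position (i, j).
    """
    pruned = [row[:] for row in weights]       # leave the original matrix untouched
    for i, j in faulty_macs:
        pruned[i][j] = 0.0                     # pruned connection contributes nothing
    return pruned


# Hypothetical 2 x 2 weight matrix with one faulty MAC at position (0, 0).
original = [[1.0, 2.0], [3.0, 4.0]]
pruned = prune_for_faults(original, {(0, 0)})
```

If every weight feeding a given output is pruned in this way, that output no longer depends on the weights at all.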

Claim 17
           Claim 17's ''The neural core device of claim 14, wherein the neural network function is configurable.'' is anticipated by Zhang, et al., page 2, left column, first partial paragraph, where it recites:

Our proposed solutions build on the recent work that shows that a significant fraction of a DNN’s connections can be pruned with no (or limited) impact on accuracy. However, while the prior work using pruning to reduce DNN execution time and memory usage [14]–[18], we do so to enable fault tolerance. FAP prunes all connections in a DNN that map to faulty MACs using simple bypass circuitry that requires only minor modifications to the baseline TPU. FAP+T additionally retrains the DNN after pruning to restore classification accuracy back or close to its baseline, but comes at the expense of extra “test time” per TPU chip.

	If all weights can be pruned, that means all weights to certain inputs and outputs can be pruned.

Claim 18
           Claim 18's ''The neural core device of claim 1, configured to compute a neural network function in conjunction with a plurality of additional neural cores interconnected by a network.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiply connected neural cores.

Claim 19
           Claim 19's ''The neural core device of claim 1, configured to compute a portion of a neural network function.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiply connected neural cores.

Claim 20
           Claim 20's ''The neural core device of claim 19, wherein the portion of the neural network function is configurable.'' is anticipated by Zhang, et al., page 4, right column, first full paragraph, “Algorithm 1”, where it shows “pruning” of the weight matrix.

Claim 21
           Claim 21's ''receiving a weight matrix from a weight memory;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “MAC Unit”.

           Claim 21's ''receiving an activation vector from an activation memory;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “MAC Unit”.

           Claim 21's ''computing a vector-matrix multiplication of the weight matrix and the activation vector;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows a “MAC Unit”.

           Claim 21's ''performing one or more vector functions on the vector-matrix multiplication to yield an output vector;'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows multiple “MAC Units” that output to a vector of accumulators.

           Claim 21's ''applying an activation function to the output vector to determine a result;'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

“…followed by an element-wise activation function φ.”

           Claim 21's ''providing the result to the activation memory.'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

“…followed by an element-wise activation function φ.”

Claim 22
           Claim 22's ''The method of claim 21, wherein the weight matrix is a subarray of a neural network weight matrix and the activation vector is a subarray of a neural network activation vector, the method further comprising:'' is anticipated by Zhang, et al., page 2, right column, last full paragraph, where it recites:

“…followed by an element-wise activation function φ.”

           Claim 22's ''accumulating the result with additional results to yield a vector-matrix multiplication of the neural network weight matrix and the neural network activation vector.'' is anticipated by Zhang, et al., page 2, Figure 1(b), where it shows the vector-matrix multiplications, the activation vector, and accumulations.

Claim 23
           Claim 23's ''mapping by a programmable controller one or more of a plurality of vector sources to a vector processor;'' is anticipated by Zhang, et al., page 2, Figure 1(a), where it shows an input layer, hidden layers (with weight parameters), and an output layer.

           Claim 23's ''mapping by the programmable controller the vector processor to one or more of a plurality of vector targets;'' is anticipated by Zhang, et al., page 4, right column, first full paragraph, “Algorithm 1”, where it shows TPU mapping and weight pruning in addition to backpropagation.

           Claim 23's ''instructing by the programmable controller the vector processor to perform a vector function on input from the one or more of the plurality of sources and provide results to the one or more of the plurality of vector targets.'' is anticipated by Zhang, et al., page 4, right column, first full paragraph, “Algorithm 1”, where it shows TPU mapping and weight pruning in addition to backpropagation.

Claim Rejections - 35 U.S.C. § 103
	In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10-11 are rejected under 35 U.S.C. § 103 as being unpatentable over Zhang, et al., Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator, arXiv:1802.04657, 17 Feb 2018, pp. 1-6 in view of Abdelsalam, et al., A Configurable FPGA Implementation of the Tanh Function using DCT Interpolation, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, 2017, pp. 168-171. Specifically:

Claim 10
           Claim 10's ''The neural core of claim 4, wherein the activation function is configurable.'' is not expressly taught by Zhang, et al. It is, however, taught by Abdelsalam, et al., page 171, left column, first full paragraph, where it recites:

The DCTIF tanh approximation error analysis is presented in Fig. 4. It can be seen that the DCTIF approximation error increases for small α values. Although a large α value means that fewer points need to be interpolated, this comes at the expense of memory resources since more samples must be stored. A large value of s increases the accuracy of the approximation, but increases complexity as well because the interpolation coefficients take larger values, potentially expressed with more signed digits as shown in Table I.

	Rationale – It would have been obvious for one of ordinary skill in the art, at the time of the effective filing date, to substitute the reconfigurable activation function of Abdelsalam, et al. for the non-reconfigurable one of Zhang, et al.

Claim 11
           Claim 11's ''the programmable controller is further adapted to instruct the activation unit to compute the activation function and provide results to the activation memory.'' is not expressly taught by Zhang, et al. It is, however, taught by Abdelsalam, et al., page 171, left column, first full paragraph, where it recites:

The DCTIF tanh approximation error analysis is presented in Fig. 4. It can be seen that the DCTIF approximation error increases for small α values. Although a large α value means that fewer points need to be interpolated, this comes at the expense of memory resources since more samples must be stored. A large value of s increases the accuracy of the approximation, but increases complexity as well because the interpolation coefficients take larger values, potentially expressed with more signed digits as shown in Table I.

	Rationale – It would have been obvious for one of ordinary skill in the art, at the time of the effective filing date, to substitute the calculated reconfigurable activation function of Abdelsalam, et al. for the non-calculated one of Zhang, et al.

Response to Arguments
	Applicant's arguments filed 25 OCT 2021 have been fully considered but they are not persuasive. Specifically:

Argument 1
The Examiner rejected Claim 1 under 35 U.S.C. §102 as anticipated by Zhang. With the present submission, Claim 1 has been amended to include elements of Claim 3. The Examiner also rejected Claim 3 under 35 U.S.C. §102 as anticipated by Zhang.

Zhang describes a Tensor Processing Unit (TPU) that includes a grid of multiply-and-accumulate (MAC) units. (p. 1, top of second column; Fig. 1b.) These TPU units are used to perform matrix multiplication and convolution operations. (§3.2.) In particular, each weight in a DNN is loaded to exactly one MAC unit. (§5.) Accordingly, and as shown in the inset of Fig. 1b, each MAC unit multiplies a single weight (wij) with a single input activation (aij).

To the extent that Zhang describes a matrix multiplier, it does not describe a “programmable controller,” as claimed. In rejecting Claim 3, the Examiner appears to compare the vector sources and vector targets to the MAC units of Zhang. However, as noted above, each MAC unit performs scalar operations. That is, they neither take as input nor provide as output a vector. Accordingly, Zhang does not disclose or suggest the claimed “instructing...” step.

	Applicant’s argument is conclusory.
	The TPU is controlled programmatically. Further, the workings of the TPU are also shown in Zhang, et al., page 2, left column, last partial paragraph, through page 2, right column, first partial paragraph, where it recites:

It is important to note that, as opposed to FAP and FAP+T (our proposed solutions), all of these techniques are application agnostic, that is, they seek to correctly execute the original task/application in the presence of faults. FAP and FAP+T, on the other hand, are application-aware, that is, we modify the underlying DNN architecture using pruning and re-training to adapt to faults in the TPU. Consequently, FAP and FAP+T have no performance penalty and negligible area overhead.

	Applicant’s argument is unpersuasive.
	The rejections stand.

Argument 2
Moreover, Zhang does not disclose or suggest the claimed “mapping...” steps. In applying Zhang, the Examiner refers to Zhang’s note that “each weight in the DNN maps to exactly one MAC unit” and to Algorithm 1, which refers to loading DNN weights and a TPU fault map. Zhang is merely referring to loading scalar values to individual MAC units and to tracking faulty MAC units. There is no disclosure or suggestion of “map[ping] the one or more vector source to the vector processor, [and] map[ping] the vector processor to one or more vector targets,” as claimed.

In light of the above remarks, Applicant respectfully submits that Claim 1 should be deemed allowable over the prior art of record.

Applicant mischaracterizes the prior art in asserting that “Zhang is merely referring to loading scalar values to individual MAC units and to tracking faulty MAC units”. The input to the neural network and, hence, to the TPU is a vector. Zhang, et al. recites:

The reason for the drop in accuracy is that stuck-at faults frequently affect the higher order bits of the MAC output, resulting in large absolute errors in the matrix-vector product. Figures 2b scatters the golden (fault-free) activations of the final layer of the TIMIT DNNs with the corresponding faulty outputs. Observe that for TIMIT, the faulty outputs have much higher magnitudes than the golden outputs.

Note that the “MAC output” is a “matrix-vector product” (i.e., a “dot product”). Such a product requires the input of vectors, which neural networks have as inputs. It is erroneous to look at the elements of a vector as scalars while asserting that the overall vector to which they belong doesn’t exist.
	Applicant’s argument is unpersuasive.
	The rejections stand.
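For illustration only, the point made in the response to Argument 2 (that scalar multiply-and-accumulate operations, taken together over a grid, consume an input vector and produce an output vector) can be sketched as follows. The function names, the indexing of weights[i][j], and the numeric values are assumptions made for this sketch and do not model the timing or dataflow of an actual systolic array.

```python
def mac(partial_sum, weight, activation):
    """One scalar multiply-and-accumulate step."""
    return partial_sum + weight * activation


def mac_grid_product(weights, activations):
    """Chain scalar MAC steps so that each output element is a dot product.

    weights[i][j] is assumed to be the weight applied to input i for output j.
    """
    outputs = []
    for j in range(len(weights[0])):
        partial = 0
        for i, a in enumerate(activations):
            partial = mac(partial, weights[i][j], a)   # scalar operation
        outputs.append(partial)                        # element j of the output vector
    return outputs


# Hypothetical 2 x 2 weight matrix and 2-element input vector.
grid_out = mac_grid_product([[1, 2], [3, 4]], [1, 1])
```

Each call to mac operates on scalars, yet the function as a whole takes a vector as input and returns a vector as output.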

Argument 3
The Examiner rejected Claim 21 under 35 U.S.C. §102 as anticipated by Zhang.

However, Zhang does not disclose or suggest “providing the result to the activation memory,” as claimed. The portion of Zhang pointed to by the Examiner merely indicates that an activation function is applied at each layer of a DNN. (§3.1.) There is no indication that the result of such an activation function is “provid[ed]... to the activation memory,” as claimed.

In light of the above remarks, Applicant respectfully submits that Claim 21 should be deemed allowable over the prior art of record.

	Zhang, et al., page 3, left column, first full paragraph recites:

3.2. DNN Acceleration on TPU

Figure 1b shows a block-diagram of the TPU architecture, at the heart of which is a systolic array containing N x N MAC units that is used to perform the computationally expensive matrix multiplication and convolution operations. To understand how, consider a fully-connected layer with N input neurons and N output neurons, and consequently an N x N weight matrix. The weight matrix is first loaded into the systolic array, with weight wi,j being loaded into the MAC in row j and column i, which we refer to as MACi,j.

Note that the matrix of MAC units applies to individual layers. As the output of one layer goes to the next, it goes through the next layer’s “activation memory”, as shown in Zhang, et al., page 2, Figure 1(b).
	Applicant’s argument is unpersuasive.
	The rejections stand.

Argument 4
Claim 21

The Examiner rejected Claims 1 under 35 U.S.C. §102 as anticipated by Zhang.

As discussed above with regard to Claim 1, Zhang does not disclose or suggest the claimed mapping and instructing steps. Accordingly, Applicant respectfully submits that Claim 2 should be deemed allowable over the prior art of record in view of the arguments provided above in connection with Claim 1.

Argument 5
Dependent Claims 2-20 and 22

In addition to the specific grounds discussed above and in light of the remarks regarding Independent Claims 1 and 21, Applicant respectfully submits that Dependent Claims 2-20 and 22 should be deemed allowable over the prior art of record at least by virtue of depending from an allowable base claim.

	Applicant’s arguments for the independent claims were unpersuasive. Therefore, the independent claims supply no novel and nonobvious limitations that the dependent claims could incorporate by reference.
	The rejections stand.

Conclusion
	THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

            Any inquiries concerning this communication or earlier communications from the examiner should be directed to Wilbert L. Starks, Jr., who may be reached Monday through Friday, between 8:00 a.m. and 5:00 p.m. EST, via telephone at (571) 272-3691 or via email at Wilbert.Starks@uspto.gov.

                If you need to send an Official facsimile transmission, please send it to (571) 273-8300. 

                If attempts to reach the examiner are unsuccessful the Examiner’s Supervisor (SPE), Kakali Chaki, may be reached at (571) 272-3719.

            Hand-delivered responses should be delivered to the Receptionist at the Customer Service Window (Randolph Building, 401 Dulany Street, Alexandria, VA 22313), located on the first floor of the south side of the Randolph Building.

                Finally, information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Moreover, status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have any questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) toll-free at 1-866-217-9197.

            /WILBERT L STARKS/
            Primary Examiner, Art Unit 2122

WLS
04 JAN 2021