DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	This action is in response to amendments and remarks filed on 05/06/2022. In the current amendments, the specification is amended, the drawings are amended, claim 14 is cancelled, and claims 1, 2, 7, 8, 12, 13, and 15-17 are amended. Claims 1-13 and 15-21 are pending and have been examined.
	In response to the amendments and remarks filed on 05/06/2022, the objections to the specification and the 35 U.S.C. 101 abstract-idea rejections of claims 1-13 and 15-21 are withdrawn.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 195 in Fig. 1C.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 1-13 and 15-21 are objected to because of the following informalities:  
In Claim 1, lines 8-9, “each of the N processing element” should read “each of the N processing elements”
In Claim 1, lines 23-24, “the fist-stage processing element” should read “the first-stage processing element”
In Claim 15, lines 13-14, “each of the N processing element are” should read “each of the N processing elements are”
In Claim 15, line 29, “the fist-stage processing element” should read “the first-stage processing element”
Dependent claims 2-13 are objected to based on being directly or indirectly dependent on objected claim 1. Dependent claims 16-21 are objected to based on being directly or indirectly dependent on objected claim 15.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes a claim limitation that does not use the word “means,” but is nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation is:
Claim 7:
“the pipeline is configured to cause output data from said one or more output ports of the last-stage processing element to be looped back to said one or more input ports of the first-stage processing element”
Upon a review of the specification, a description of the above limitation is found in Fig. 3 and the following paragraphs:
[0044]: Fig. 3 illustrates an example of a scalar element (SE) module 300 according to an embodiment of the present invention. The disclosed scalar element (SE) module 300 can be used as a building block to form an apparatus for implementing various activation functions. The scalar element (SE) module 300 comprises multiple pipeline stages (e.g. N stages, N >1). In Fig. 3, the example corresponds to an SE module with 8 SCU (scalar computing unit) pipeline stages (i.e., N = 8).
[0046]: Each scalar computing unit comprises multiple pipeline inputs and multiple pipeline outputs. The example in Fig. 3 illustrates exemplary scalar computing units with 3 inputs (i.e., in0-in2) and 3 outputs (i.e., out0-out2) in each scalar computing unit pipeline stage. Nevertheless, the specific number of inputs and outputs is intended for illustrating an example of multiple inputs and outputs and, by no means, the specific number of inputs and outputs constitutes limitations of the present invention.
[0055]: In another embodiment, the output from the last SCU pipeline stage can be looped back to the input of the first SCU pipeline stage so as to increase length of the pipeline stages. For example, the outputs (i.e., 350-0, 350-1 and 350-2) from the SCU pipeline stage 7 can be looped back to the inputs (i.e., 360-0, 360-1 and 360-2) through multiplexers 340. The multiplexers 340 can be configured to select the looped back inputs (i.e., 360-0, 360-1 and 360-2) or inputs (input 0, input 1 and input 2) from the full sum feeder.
Based on the description of the pipeline above, the structure of the pipeline consists of scalar computing units (processing elements) wherein the outputs of each scalar computing unit are coupled to the inputs of the next scalar computing unit, and the outputs of the last scalar computing unit are coupled, through multiplexers 340, to the inputs of the first scalar computing unit. This is sufficient structure to perform the claimed function.
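For illustration only, the looped-back pipeline structure summarized above can be modeled with a short Python sketch. It is not drawn from the specification or from any cited reference: the names run_pipeline, passes, and add_one are hypothetical, each stage is assumed to map a list of port values to a list of port values, and the multiplexer is modeled simply as the choice between the feeder inputs (first pass) and the looped-back outputs of the last stage (later passes).

```python
from typing import Callable, List, Sequence

# Hypothetical model of one processing element (scalar computing unit):
# it maps the values on its input ports to the values on its output ports.
Stage = Callable[[Sequence[float]], List[float]]


def run_pipeline(stages: List[Stage], feeder_inputs: Sequence[float], passes: int = 1) -> List[float]:
    """Push data through N chained stages; each stage's outputs feed the next
    stage's inputs. On the first pass the (modeled) multiplexer in front of
    the first stage selects the feeder inputs; on every later pass it selects
    the looped-back outputs of the last stage."""
    if len(stages) < 2:
        raise ValueError("the pipeline needs N > 1 stages")
    data = list(feeder_inputs)  # first pass: multiplexer selects the feeder
    for _ in range(passes):
        for stage in stages:
            data = stage(data)
        # data now holds the last-stage outputs, which loop back to stage 0
    return data


def add_one(ports: Sequence[float]) -> List[float]:
    """Toy stage: adds 1 to the value on every port."""
    return [p + 1.0 for p in ports]


stages = [add_one] * 8  # N = 8 stages with 3 ports each, as in the Fig. 3 example
print(run_pipeline(stages, [0.0, 0.0, 0.0]))            # [8.0, 8.0, 8.0]
print(run_pipeline(stages, [0.0, 0.0, 0.0], passes=2))  # [16.0, 16.0, 16.0]
```

With N = 8 stages, two passes through the loop behave like a 16-stage pipeline, which is the effect paragraph [0055] attributes to looping the outputs back "so as to increase length of the pipeline stages."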
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation recites sufficient structure to perform the claimed function so as to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-13 and 15-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “the calculated target activation function” in line 24. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated target activation function” has been interpreted as “the target activation function” in reference to “a target activation function” in line 22.
Claim 1 recites the limitation “at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions” in lines 16-18. This limitation is indefinite because the claim does not make clear what it means for a circuit or processor to be “commonly used for implementing at least two different activation functions.” For examination purposes, this limitation has been interpreted to cover any circuit or processor capable of implementing at least two different activation functions.
Claim 15 recites the limitation “the calculated target activation function” in line 30. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated target activation function” has been interpreted as “the target activation function” in reference to “a target activation function” in line 27.
Claim 15 recites the limitation “at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions” in lines 22-24. This limitation is indefinite because the claim does not make clear what it means for a circuit or processor to be “commonly used for implementing at least two different activation functions.” For examination purposes, this limitation has been interpreted to cover any circuit or processor capable of implementing at least two different activation functions.
Dependent claims 2-13 are rejected based on being directly or indirectly dependent on rejected claim 1. Dependent claims 16-21 are rejected based on being directly or indirectly dependent on rejected claim 15.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-7, 10, 13, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2) and further in view of Pillai et al. (US 2019/0042922 A1).
Regarding Claim 1,
	Henry et al. teaches a scalar element computing device for computing a selected activation function selected from two or more different activation functions (Fig. 1 – Neural Network Unit (NNU) 121; [0055]: “The sequencer 128 also generates control signals to the NPUs 126 to instruct them to perform various operations or functions, such as initialization, arithmetic/logical operations, rotate and shift operations, activation functions and write back operations” teaches a neural network unit (NNU) for computing an activation function from various activation functions), the scalar element computing device comprising:
	N processing elements (Fig. 1 – N Neural Processing Units (NPU) 126; [0052]: “The NNU 121 includes a weight random access memory (RAM) 124, a data RAM 122, N neural processing units (NPUs) 126” teaches N NPUs (processing elements)),
	wherein each of the N processing elements comprises one or more input ports and one or more output ports (Fig. 2; [0059]: “Referring now to FIG. 2, a block diagram illustrating a NPU 126 of FIG. 1 is shown” teaches each NPU (processing element) having input ports 206, 207, 211, and 213 as well as output ports 209 and 133),
	the N processing elements are arranged into a pipeline from a first-stage processing element to a last-stage processing element with a next-stage processing element for each of the N processing elements except for the last-stage processing element, and said one or more output ports of each of the N processing element are coupled to said one or more input ports of a corresponding next-stage processing element except for the last-stage processing element (Fig. 2, Fig. 3; [0273], [0056]: “The N NPUs 126 generate N result words 133 that may be written back to a row of the weight RAM 124 or to the data RAM 122. Preferably, the weight RAM 124 and the data RAM 122 are directly coupled to the N NPUs 126. More specifically, the weight RAM 124 and data RAM 122 are dedicated to the NPUs 126 and are not shared by the other execution units 112 of the processor 100, and the NPUs 126 are capable of consuming a row from one or both of the weight RAM 124 and data RAM 122 each clock cycle in a sustained manner, preferably in a pipelined fashion.” teaches the NPUs 126 arranged in a pipeline for receiving data from weight RAM 124 and data RAM 122 as well as the outputs of each NPU coupled to the inputs of the following NPU (Fig. 3)),
	wherein N is an integer greater than 1 (Abstract: “A neural network unit includes a random bit source that generates random bits and a plurality of neural processing units (NPU).” teaches the NNU consists of more than one NPU);
	an operator pool coupled to the N processing elements (Fig. 1, Fig. 2; teaches an Activation Function Unit (AFU) 212 (operator pool) coupled to all NPUs),
	wherein the operator pool comprises a set of electronic circuits or processors associated with a set of operators for implementing two or more different activation functions (Fig. 1, Fig. 2; [0064]: “Preferably, the AFU 212 is configured to perform multiple activation functions, and an input, e.g., from the control register 127, selects one of the activation functions to perform on the accumulator 202 output 217. The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function.” teaches that the AFU 212 comprises a set of activation function operators for implementing two or more different activation functions. For example, softplus is one of the target activation functions, and computing it requires several operations, including exponentiation, addition, and a natural logarithm, as shown in the equation above. Fig. 30; [0103]: “Preferably, the mux 802 also has one or more inputs that receive the output of activation function circuits (e.g., elements 3022, 3024, 3026, 3018, 3014, and 3016 of FIG. 30)” teaches that the activation function operators are implemented as a set of circuits), and
	wherein the N processing elements are configured according to command information stored in the N command memories to calculate, by using the set of electronic circuits or processors, a target activation function among said two or more different activation functions for an activation-function input data provided to said one or more input ports of the fist-stage processing element and to provide the calculated target activation function from said one or more output ports of the last-stage processing element, wherein the target activation function is selected from said two or more different activation functions (Fig. 2; [0055]: “The sequencer 128 fetches instructions from the program memory 129 and executes them, which includes, among other things, generating address and control signals for provision to the data RAM 122, weight RAM 124 and NPUs 126. The sequencer 128 generates a memory address 123 and a read command for provision to the data RAM 122 to select one of the D rows of N data words for provision to the N NPUs 126. The sequencer 128 also generates a memory address 125 and a read command for provision to the weight RAM 124 to select one of the W rows of N weight words for provision to the N NPUs 126.” teaches that sequencer 128 generates command instructions for the N NPUs and that data RAM 122 and weight RAM 124 provide the input data to the N NPUs. [0060], [0064]: “The AFU 212 receives the output 217 of the accumulator 202. The AFU 212 performs an activation function on the accumulator 202 output 217 to generate a result 133 of FIG. 1 … Preferably, the AFU 212 is configured to perform multiple activation functions, and an input, e.g., from the control register 127, selects one of the activation functions to perform on the accumulator 202 output 217. The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function.” teaches that the command information from the control register 127, sequencer 128, data RAM 122, and weight RAM 124 is used to select and calculate the target activation function, and further teaches how such a target activation function is calculated. For example, softplus is one of the target activation functions, and computing it requires several operations, including exponentiation, addition, and a natural logarithm, as shown in the equation above).
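As an illustration only, the following Python sketch assumes one possible reading of the claimed arrangement: each of N = 3 pipeline stages has its own command memory, the entry selected by a function identifier names an operator from a shared operator pool, and softplus is evaluated stage by stage as EXP, then ADD 1, then LN. The Command fields, the opcodes, and the evaluate function are hypothetical and are not taken from Henry et al. or from the application.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical operator pool shared by all stages: opcode -> operator.
# Each operator takes the stage input x and an immediate operand c.
OPERATOR_POOL: Dict[str, Callable[[float, float], float]] = {
    "EXP": lambda x, c: math.exp(x),
    "ADD": lambda x, c: x + c,
    "LN": lambda x, c: math.log(x),
    "MAX0": lambda x, c: max(x, 0.0),
    "PASS": lambda x, c: x,
}


@dataclass
class Command:
    """One command-memory entry, divided into fields (hypothetical layout)."""
    opcode: str           # selects an operator from OPERATOR_POOL
    operand: float = 0.0  # immediate constant, used here only by ADD


def evaluate(command_memories: List[List[Command]], function_id: int, x: float) -> float:
    """Stage i reads entry `function_id` of its own command memory, applies
    the selected pool operator, and passes the result to stage i + 1."""
    for memory in command_memories:
        cmd = memory[function_id]
        x = OPERATOR_POOL[cmd.opcode](x, cmd.operand)
    return x


SOFTPLUS, RELU = 0, 1  # hypothetical function identifiers (entry indices)

# N = 3 command memories, one per stage; entry k holds this stage's step of function k.
command_memories = [
    [Command("EXP"), Command("MAX0")],       # stage 0
    [Command("ADD", 1.0), Command("PASS")],  # stage 1
    [Command("LN"), Command("PASS")],        # stage 2
]

print(evaluate(command_memories, SOFTPLUS, 0.0))  # ln(2), about 0.693
print(evaluate(command_memories, RELU, -2.0))     # 0.0
```

In this sketch, changing only the function identifier reroutes the same three stages through a different operator sequence, which is one way to picture stage-local command information selecting the target activation function.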
	Henry et al. does not appear to explicitly teach N command memories, wherein the N command memories are coupled to the N processing elements individually; and at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions.
	However, Koren et al. teaches N command memories, wherein the N command memories are coupled to the N processing elements individually (Fig. 2; Col. 5 lines 7-13: “Specifically, a row multiplexer 240 and a column multiplexer 245 are coupled to a processing element 205 to receive the row data sets 210 and column data sets 220, respectfully. The row multiplexer 240 is coupled to and in communication with a row memory device 250 while the column multiplexer 245 is coupled to and in communication with a column memory device 255.” teaches that each processing element 205 is coupled to at least one row memory device 250 or column memory device 255 (these memory devices can be considered command memories)).
	Henry et al. and Koren et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to incorporate N command memories, wherein the N command memories are coupled to the N processing elements individually as taught by Koren et al. to the disclosed invention of Henry et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to “reduce power consumption while improving processing speed and performance” (Koren et al. Col. 16 lines 31-32).
	Henry et al. in view of Koren et al. does not appear to explicitly teach at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions.
	However, Pillai et al. teaches at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions (Figs. 2A-2B; [0072]: “In particular, AF circuit 200 is a unified solution that implements multiple DNN activation functions on a single hardware component (e.g., rather than using separate hardware components for each activation function) without depending on lookup tables. For example, in the illustrated embodiment, AF circuit 200 is implemented using log, antilog, and exponent circuits 210, 220, 230 that perform log₂, antilog₂, and exponent calculations using piecewise linear approximation, which eliminates the need for lookup tables in the hardware design and reduces the required multiplier circuitry” teaches that the circuits associated with operators (i.e., the log, antilog, and exponent circuits 210, 220, 230 in AF circuit 200) are used for implementing multiple activation functions (i.e., at least two) for a neural network).
	Henry et al., Koren et al., and Pillai et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to incorporate at least one of the set of electronic circuits or processors associated with a corresponding operator of the set of operators is commonly used for implementing at least two different activation functions as taught by Pillai et al. to the disclosed invention of Henry et al. in view of Koren et al.
	One of ordinary skill in the art would have been motivated to make this modification because it “provides numerous advantages, including low latency, high precision, and reduced power consumption using a flexible, low-area hardware design that supports multiple activation functions and is highly scalable and portable” (Pillai et al. [0072]).
Regarding Claim 5,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
Additionally, Henry et al. teaches wherein the set of operators comprises one or more pool operators, wherein each pool operator is applied to a sequence of values (Fig. 27, Fig. 28; [0201]: “Of the 512 NPUs 126, every fourth NPU 126 of the 512 NPUs 126 (i.e., 128) performs a pooling operation on a respective 4×4 sub-matrix, and the other three-fourths of the NPUs 126 are unused. More specifically, NPUs 0, 4, 8, and so forth to NPU 508 each perform a pooling operation on their respective 4×4 sub-matrix whose left-most column number corresponds to the NPU number and whose lower row corresponds to the current weight RAM 124 row value” teaches a process by which the NPUs perform a pooling operation on input data).
Regarding Claim 6,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 5.
Additionally, Henry et al. teaches wherein said one or more pool operators correspond to ADDPOOL to add the sequence of values, MINPOOL to select a minimum value of the sequence of values, MAXPOOL to select a maximum value of the sequence of values, or a combination thereof (Fig. 27, Fig. 28; [0201]: “In the example of FIG. 28, the pooling operation computes the maximum value of respective 4×4 sub-matrices of the input data matrix” teaches a pooling operation to calculate the maximum value of a sequence of values from an input data matrix).
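For illustration, each of the three pool operators recited in claim 6 can be modeled as a reduction over a sequence of values. In this Python sketch the dictionary POOL_OPERATORS and the function apply_pool are hypothetical, and the 4×4 window is invented sample data, flattened into one sequence to loosely mirror the 4×4 sub-matrices of Henry et al. Fig. 28.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical pool operators: each reduces a sequence of values to one value.
POOL_OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "ADDPOOL": sum,  # add the sequence of values
    "MINPOOL": min,  # select the minimum value
    "MAXPOOL": max,  # select the maximum value
}


def apply_pool(op: str, values: Sequence[float]) -> float:
    if len(values) == 0:
        raise ValueError("a pool operator needs at least one value")
    return POOL_OPERATORS[op](values)


# Invented 4x4 window, flattened row by row into one sequence.
window: List[List[float]] = [
    [1, 5, 2, 0],
    [3, 3, 9, 4],
    [0, 1, 1, 2],
    [7, 6, 8, 0],
]
flat = [v for row in window for v in row]

print(apply_pool("MAXPOOL", flat))  # 9
print(apply_pool("MINPOOL", flat))  # 0
print(apply_pool("ADDPOOL", flat))  # 52
```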
Regarding Claim 7,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
Additionally, Henry et al. teaches wherein the pipeline is configured to cause output data from said one or more output ports of the last-stage processing element to be looped back to said one or more input ports of the first-stage processing element (Fig. 3; [0067]: “mux-reg 0 receives on its other input 211 the output 209 of mux-reg 511. Each of the mux-regs 208 receives the control input 213 that controls whether to select the data word 207 or the rotated input 211” teaches that each NPU has a 2-input multiplexed register (mux-reg 208) to select between input data from memory or output from a previous NPU, including a first NPU with mux-reg 0 receiving output from a last NPU with mux-reg 511. As shown in Fig. 3 and described above, the output 209 of the mux-reg 511 (of the last-stage processing element) is looped back to the input of mux-reg 0 (of the first-stage processing element)). 
Regarding Claim 10,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
Additionally, Henry et al. teaches wherein each of the N command memories is partitioned memory entries and each entry is divided into fields (Fig. 16, [0126]: “Referring now to FIG. 16, a block diagram illustrating an embodiment of the data RAM 122 of FIG. 1 is shown. The data RAM 122 includes a memory array 1606, a read port 1602 and a write port 1604. The memory array 1606 holds the data words and is preferably arranged as D rows of N words, as described above” teaches that the data RAM 122 is partitioned into a memory array that is D rows by N words for N NPUs (processing elements)).
Regarding Claim 13,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Additionally, Henry et al. teaches further comprising a multiplexer with two or more sets of multiplexer input ports and a set of multiplexer output ports, wherein a feeder interface corresponding to full sum data is coupled to one set of the multiplexer input ports, said one or more output ports of the last-stage processing element are coupled to another set of the multiplexer input ports and the set of multiplexer output ports are coupled to said two or more input ports of the first-stage processing element, wherein the set of multiplexer output ports are selectively coupled to a target set of said two or more sets of multiplexer input ports (Fig. 3; [0067]: “mux-reg 0 receives on its other input 211 the output 209 of mux-reg 511. Each of the mux-regs 208 receives the control input 213 that controls whether to select the data word 207 or the rotated input 211” teaches that the first NPU uses the multiplexed register mux-reg 0 to select between data word 207 (full sum data) and the output data of the last NPU, which contains mux-reg 511. This further teaches that each mux-reg 208 (multiplexer) has two input ports (207 and 211) and an output port (209) coupled to a target set of input ports of the next mux-reg (multiplexer)), and 
Regarding Claim 15,
	Henry et al. teaches a scalar computing subsystem for computing a selected activation function selected from two or more different activation functions (Fig. 1; [0042]: “Referring now to FIG. 1, a block diagram illustrating a processor 100 that includes a neural network unit (NNU) 121 is shown” teaches a processor 100 (scalar computing subsystem) that includes an NNU 121. [0055]: “The sequencer 128 also generates control signals to the NPUs 126 to instruct them to perform various operations or functions, such as initialization, arithmetic/logical operations, rotate and shift operations, activation functions and write back operations” teaches the neural network unit (NNU) for computing an activation function from various activation functions), the scalar computing subsystem comprising:
	an interface module to receive input data for applying a selected activation function (Fig. 1; [0045] - [0046]: “The instruction cache 102 caches the architectural instructions 103 fetched from a system memory that is coupled to the processor 100. … The instruction cache 102 provides the architectural instructions 103 to the instruction translator 104, which translates the architectural instructions 103 into microinstructions 105. The microinstructions 105 are provided to the rename unit 106 and eventually executed by the execution units 112/121” teaches an instruction translator 104 (interface module) that receives input instructions/data from the instruction cache (which fetched them from memory) and then provides translated instructions to the NNU 121); and
	M scalar elements coupled to the interface module to receive data to be processed (Fig. 1 teaches that the NNU 121 (scalar element) is coupled to the instruction translator 104 (interface module) to receive data to be processed),
	wherein M is an integer equal to or greater than 1 (Fig. 1 teaches a singular NNU 121 (scalar element) coupled to the instruction translator 104 (M = 1)); and
	wherein each scalar element comprises: N processing elements (Fig. 1 – N Neural Processing Units (NPU) 126; [0052]: “The NNU 121 includes a weight random access memory (RAM) 124, a data RAM 122, N neural processing units (NPUs) 126” teaches N NPUs (processing elements)),
	wherein each processing element comprises one or more local input ports and one or more local output ports (Fig. 2; [0059]: “Referring now to FIG. 2, a block diagram illustrating a NPU 126 of FIG. 1 is shown” teaches each NPU (processing element) having input ports 206, 207, 211, and 213 as well as output ports 209 and 133),
	the N processing elements are arranged into a pipeline from a first-stage processing element to a last-stage processing element with a next-stage processing element for each of the N processing elements except for the last-stage processing element, and one or more local output ports of each of the N processing element are coupled to one or more local input ports of a corresponding next-stage processing element except for the last-stage processing element (Fig. 2, Fig. 3; [0273], [0056]: “The N NPUs 126 generate N result words 133 that may be written back to a row of the weight RAM 124 or to the data RAM 122. Preferably, the weight RAM 124 and the data RAM 122 are directly coupled to the N NPUs 126. More specifically, the weight RAM 124 and data RAM 122 are dedicated to the NPUs 126 and are not shared by the other execution units 112 of the processor 100, and the NPUs 126 are capable of consuming a row from one or both of the weight RAM 124 and data RAM 122 each clock cycle in a sustained manner, preferably in a pipelined fashion.” teaches that the NPUs 126 are arranged in a pipeline that receives data from the weight RAM 124 and the data RAM 122; Fig. 3 further shows the output of each NPU coupled to the input of the following NPU),
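For illustration only, the claimed arrangement in which the local output ports of each processing element feed the local input ports of the next-stage element can be modeled in a few lines of Python. The class and function names (`ProcessingElement`, `run_pipeline`) and the two example operators are hypothetical; the sketch is not taken from Henry et al., Fig. 3, or any other cited reference, and it ignores clocking and per-stage latency.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessingElement:
    """Hypothetical pipeline stage that applies one operator to its input."""
    name: str
    op: Callable[[float], float]

    def step(self, value: float) -> float:
        # The returned value models this stage's local output port.
        return self.op(value)


def run_pipeline(stages: List[ProcessingElement], x: float) -> float:
    """Drive x into the first-stage element; each stage's output feeds the next stage."""
    value = x
    for stage in stages:
        value = stage.step(value)
    return value  # value leaving the last-stage element


pipeline = [
    ProcessingElement("double", lambda v: 2.0 * v),
    ProcessingElement("increment", lambda v: v + 1.0),
]
print(run_pipeline(pipeline, 3.0))  # 7.0
```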
	wherein N is an integer greater than 1 (Abstract: “A neural network unit includes a random bit source that generates random bits and a plurality of neural processing units (NPU).” teaches that the NNU includes a plurality of (i.e., more than one) NPUs);
	an operator pool coupled to the N processing elements (Figs. 1 and 2 teach an Activation Function Unit (AFU) 212 (operator pool) coupled to all of the NPUs),
	wherein the operator pool comprises a set of electronic circuits or processors associated with a set of operators for implementing two or more different activation functions (Fig. 1, Fig. 2; [0064]: “Preferably, the AFU 212 is configured to perform multiple activation functions, and an input, e.g., from the control register 127, selects one of the activation functions to perform on the accumulator 202 output 217. The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function.” teaches that the AFU 212 comprises a set of activation function operators for implementing two or more different activation functions. For example, softplus is one of the target activation functions; as the equation above shows, calculating it requires several operations, including addition and exponentiation. Fig. 30; [0103]: “Preferably, the mux 802 also has one or more inputs that receive the output of activation function circuits (e.g., elements 3022, 3024, 3026, 3018, 3014, and 3016 of FIG. 30)” teaches that the activation function operators are a set of circuits); and 
	wherein the N processing elements are configured according to command information stored in the N command memories to calculate, by using the set of electronic circuits or processors, a target activation function among said two or more different activation functions for an activation-function input data provided to said one or more input ports of the fist-stage processing element and to provide the calculated target activation function from said one or more output ports of the last-stage processing element, wherein the target activation function is selected from said two or more different activation functions (Fig. 2; [0055]: “The sequencer 128 fetches instructions from the program memory 129 and executes them, which includes, among other things, generating address and control signals for provision to the data RAM 122, weight RAM 124 and NPUs 126. The sequencer 128 generates a memory address 123 and a read command for provision to the data RAM 122 to select one of the D rows of N data words for provision to the N NPUs 126. The sequencer 128 also generates a memory address 125 and a read command for provision to the weight RAM 124 to select one of the W rows of N weight words for provision to the N NPUs 126.” teaches that the sequencer 128, data RAM 122, and weight RAM 124 supply command instructions and input data to the N NPUs. [0060], [0064]: “The AFU 212 receives the output 217 of the accumulator 202. The AFU 212 performs an activation function on the accumulator 202 output 217 to generate a result 133 of FIG. 1 … Preferably, the AFU 212 is configured to perform multiple activation functions, and an input, e.g., from the control register 127, selects one of the activation functions to perform on the accumulator 202 output 217. The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function.” teaches that command information from the control register 127, the sequencer 128, the data RAM 122, and the weight RAM 124 is used to select the target activation function and to determine how it is calculated. For example, softplus is one of the target activation functions; as the equation above shows, calculating it requires several operations, including addition and exponentiation).
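To illustrate the point that softplus requires several operators, the following minimal Python sketch computes f(x)=ln(1+e^x) using only an exponential, an addition, and a natural-logarithm operator. The `OPERATOR_POOL` dictionary is a hypothetical software stand-in for hardware operators and does not reproduce the AFU 212 of Henry et al.

```python
import math

# Hypothetical operator pool; each entry stands in for one hardware operator.
OPERATOR_POOL = {
    "exp": math.exp,              # exponential operator
    "add": lambda a, b: a + b,    # addition operator
    "ln": math.log,               # natural-logarithm operator
}


def softplus(x: float) -> float:
    """f(x) = ln(1 + e^x), composed only from operators in the pool.

    Note: math.exp overflows for x above roughly 709; no range handling is attempted here.
    """
    e_x = OPERATOR_POOL["exp"](x)
    total = OPERATOR_POOL["add"](1.0, e_x)
    return OPERATOR_POOL["ln"](total)


print(round(softplus(0.0), 4))  # 0.6931, i.e. ln 2
```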
	Henry et al. does not appear to explicitly teach N command memories, wherein the N command memories are coupled to the N processing elements individually; and at least one of the set of electronic circuits or processors associated with one of the set of operators is commonly used for implementing at least two different activation functions.
	However, Koren et al. teaches N command memories, wherein the N command memories are coupled to the N processing elements individually (Fig. 2; Col. 5 lines 7-13: “Specifically, a row multiplexer 240 and a column multiplexer 245 are coupled to a processing element 205 to receive the row data sets 210 and column data sets 220, respectfully. The row multiplexer 240 is coupled to and in communication with a row memory device 250 while the column multiplexer 245 is coupled to and in communication with a column memory device 255.” teaches that each processing element 205 is coupled to at least one row memory device 250 or column memory device 255 (these memory devices can be considered command memories)).
	Henry et al. and Koren et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to incorporate N command memories, wherein the N command memories are coupled to the N processing elements individually as taught by Koren et al. to the disclosed invention of Henry et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to “reduce power consumption while improving processing speed and performance” (Koren et al. Col. 16 lines 31-32).
	Henry et al. in view of Koren et al. does not appear to explicitly teach at least one of the set of electronic circuits or processors associated with one of the set of operators is commonly used for implementing at least two different activation functions.
	However, Pillai et al. teaches at least one of the set of electronic circuits or processors associated with one of the set of operators is commonly used for implementing at least two different activation functions (Figs. 2A-2B; [0072]: “In particular, AF circuit 200 is a unified solution that implements multiple DNN activation functions on a single hardware component (e.g., rather than using separate hardware components for each activation function) without depending on lookup tables. For example, in the illustrated embodiment, AF circuit 200 is implemented using log, antilog, and exponent circuits 210, 220, 230 that perform log_2, antilog_2, and exponent calculations using piecewise linear approximation, which eliminates the need for lookup tables in the hardware design and reduces the required multiplier circuitry” teaches that the circuits associated with operators (i.e., the log and exponent circuits in the activation function circuit 200) are used for implementing multiple activation functions (i.e., at least two) for a neural network).
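The idea of one operator circuit serving more than one activation function can be sketched as follows. This is an illustration under stated assumptions, not the design of Pillai et al.: it does not model the log, antilog, and exponent circuits 210, 220, 230 or their piecewise linear approximation, and `SharedExpUnit` is a hypothetical name. It simply routes sigmoid, tanh, and softplus through one shared exponential operator and counts the uses.

```python
import math


class SharedExpUnit:
    """Hypothetical single exponential operator shared by several activation functions."""

    def __init__(self) -> None:
        self.uses = 0  # counts how often the shared operator is invoked

    def __call__(self, x: float) -> float:
        self.uses += 1
        return math.exp(x)


EXP = SharedExpUnit()


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + EXP(-x))


def tanh(x: float) -> float:
    # Identity tanh(x) = 2 * sigmoid(2x) - 1 lets tanh reuse the same exponential operator.
    return 2.0 * sigmoid(2.0 * x) - 1.0


def softplus(x: float) -> float:
    return math.log(1.0 + EXP(x))


for f in (sigmoid, tanh, softplus):
    f(0.5)
print(EXP.uses)  # 3: one shared operator served three activation functions
```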
	Henry et al., Koren et al., and Pillai et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to incorporate at least one of the set of electronic circuits or processors associated with one of the set of operators is commonly used for implementing at least two different activation functions as taught by Pillai et al. to the disclosed invention of Henry et al. in view of Koren et al.
	One of ordinary skill in the art would have been motivated to make this modification because it “provides numerous advantages, including low latency, high precision, and reduced power consumption using a flexible, low-area hardware design that supports multiple activation functions and is highly scalable and portable” (Pillai et al. [0072]).
Regarding Claim 20,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar computing subsystem of claim 15.
Additionally, Henry et al. teaches wherein the input data corresponds to full sum data or memory data from a unified memory (Fig. 1; [0045] - [0046]: “The instruction cache 102 caches the architectural instructions 103 fetched from a system memory that is coupled to the processor 100. … The instruction cache 102 provides the architectural instructions 103 to the instruction translator 104, which translates the architectural instructions 103 into microinstructions 105. The microinstructions 105 are provided to the rename unit 106 and eventually executed by the execution units 112/121” teaches an instruction translator 104 (interface module) that receives input instructions/data from the instruction cache, which fetched the architectural instructions (input data) from system memory).

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Sze et al. ("Efficient Processing of Deep Neural Networks").
Regarding Claim 2,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Henry et al. further teaches wherein said two or more different activation functions comprise at least two functions selected from a group consisting of Sigmoid, Hyperbolic Tangent (Tanh), … (Fig. 30; [0064]: “The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function” teaches that the group of activation functions may include sigmoid and hyperbolic tangent (tanh). Fig. 30 further teaches that the two or more activation functions comprise at least the sigmoid and tanh functions).
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein said two or more different activation functions comprise at least two functions selected from a group consisting of … Rectified Linear Unit (ReLU) and leaky ReLU activation functions.
	However, Sze et al. teaches wherein said two or more different activation functions comprise … a group consisting of … Rectified Linear Unit (ReLU) and leaky ReLU activation functions (Page 2302, 1) Nonlinearity: “A nonlinear activation function is typically applied after each CONV or FC layer. Various nonlinear functions are used to introduce nonlinearity into the DNN as shown in Fig. 11. These include historically conventional nonlinear functions such as sigmoid or hyperbolic tangent as well as rectified linear unit (ReLU) … Variations of ReLU, such as leaky ReLU, parametric ReLU, and exponential LU have also been explored” teaches the use of ReLU and leaky ReLU activation functions for implementing a Deep Neural Network (DNN) model).
	Henry et al., Koren et al., Pillai et al., and Sze et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein said two or more different activation functions comprise … a group consisting of Rectified Linear Unit (ReLU) and leaky ReLU activation functions as taught by Sze et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use the ReLU activation function “[for] its simplicity and its ability to enable fast training” (Sze et al. Page 2302, 1) Nonlinearity) and the leaky ReLU activation function “for improved accuracy” (Sze et al. Page 2302, 1) Nonlinearity).
Regarding Claim 3,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Henry et al. further teaches wherein the set of operators comprises addition and exponential operator ([0064]: “The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function” teaches that the pool of operators includes addition and exponential operators in order to calculate the softplus function).
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein the set of operators comprises multiplication, division, and maximum.
	However, Sze et al. teaches wherein the set of operators comprises multiplication, division, and maximum (Page 2302, 1) Nonlinearity: “A nonlinear activation function is typically applied after each CONV or FC layer. Various nonlinear functions are used to introduce nonlinearity into the DNN as shown in Fig. 11. These include historically conventional nonlinear functions such as sigmoid or hyperbolic tangent as well as rectified linear unit (ReLU) … Variations of ReLU, such as leaky ReLU, parametric ReLU, and exponential LU have also been explored” teaches the use of ReLU and leaky ReLU activation functions for implementing a Deep Neural Network (DNN) model. Fig. 11 teaches the equations necessary to calculate the sigmoid, hyperbolic tangent (tanh), ReLU, and leaky ReLU functions. The sigmoid equation teaches that the pool of operators includes division, the ReLU equation teaches that the pool of operators includes maximum, and the leaky ReLU equation teaches that the pool of operators includes multiplication).
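The operator mapping described above (division for sigmoid, maximum for ReLU, multiplication for leaky ReLU) can be seen in the following minimal Python sketch. The leaky ReLU slope `alpha = 0.1` is an assumed example value; the sketch is illustrative and does not reproduce Fig. 11 of Sze et al.

```python
import math


def sigmoid(x: float) -> float:
    # Division (plus exponential and addition) operators.
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    # Maximum operator.
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.1) -> float:
    # Multiplication and maximum operators; max(alpha*x, x) is leaky ReLU for 0 < alpha < 1.
    return max(alpha * x, x)


print(sigmoid(0.0), relu(-2.0), leaky_relu(-2.0), leaky_relu(3.0))
# 0.5 0.0 -0.2 3.0
```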
Henry et al., Koren et al., Pillai et al., and Sze et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the set of operators comprises multiplication, division, and maximum operator as taught by Sze et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use the ReLU activation function “[for] its simplicity and its ability to enable fast training” (Sze et al. Page 2302, 1) Nonlinearity) and the leaky ReLU activation function “for improved accuracy” (Sze et al. Page 2302, 1) Nonlinearity).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), in view of Sze et al. ("Efficient Processing of Deep Neural Networks"), and further in view of Falcon et al. (US 2016/0026912 A1).
Regarding Claim 4,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Henry et al. further teaches wherein the set of operators comprises addition, exponential operator, and logarithmic operator ([0064]: “The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify). The softplus function is the analytic function f(x)=ln(1+e^x), that is, the natural logarithm of the sum of one and e^x, where “e” is Euler's number and x is the input 217 to the function” teaches that the pool of operators includes addition, exponential operator, and logarithmic operator (natural logarithm) in order to calculate the softplus function).
Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein the set of operators comprises multiplication, division, maximum, minimum, and square root operator.
However, Sze et al. teaches wherein the set of operators comprises multiplication, division, and maximum (Page 2302, 1) Nonlinearity: “A nonlinear activation function is typically applied after each CONV or FC layer. Various nonlinear functions are used to introduce nonlinearity into the DNN as shown in Fig. 11. These include historically conventional nonlinear functions such as sigmoid or hyperbolic tangent as well as rectified linear unit (ReLU) … Variations of ReLU, such as leaky ReLU, parametric ReLU, and exponential LU have also been explored” teaches the use of ReLU and leaky ReLU activation functions for implementing a Deep Neural Network (DNN) model. Fig. 11 teaches the equations necessary to calculate the sigmoid, hyperbolic tangent (tanh), ReLU, and leaky ReLU functions. The sigmoid equation teaches that the pool of operators includes division, the ReLU equation teaches that the pool of operators includes maximum, and the leaky ReLU equation teaches that the pool of operators includes multiplication).
Henry et al., Koren et al., Pillai et al., and Sze et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the set of operators comprises multiplication, division, and maximum operator as taught by Sze et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use the ReLU activation function “[for] its simplicity and its ability to enable fast training” (Sze et al. Page 2302, 1) Nonlinearity) and the leaky ReLU activation function “for improved accuracy” (Sze et al. Page 2302, 1) Nonlinearity).
Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Sze et al. does not appear to explicitly teach wherein the set of operators comprises minimum and square root operator.
However, Falcon et al. teaches wherein the set of operators comprises minimum and square root operator (Fig. 10; [0084]: “Processing device 1000 may be implemented in part by, for example, the elements illustrated in FIGS. 1-8. In the example of FIG. 10, processing device 1000 may include a processor block 1002, a calculation accelerator 1004, and a bus/fabric/interconnect system 1006. Processor block 1002 may further include one or more cores (e.g., P1-P4) to perform general purpose calculations and issue control signals through bus 1006 to the calculation accelerator 1004. Calculation accelerator 1004 may further include a number of calculation circuits (e.g., A1-A4) each of which may be reconfigured to perform a specific type of calculations for a CNN system” teaches cores for performing general purpose computations and calculation circuits for performing neural network calculations. Fig. 2; [0057]: “In yet another embodiment, floating point ALU 222 may include a 64-bit by 64-bit floating point divider to execute divide, square root, and remainder micro-ops” teaches that the processor can include an Arithmetic Logic Unit (ALU) 222 capable of performing a square root operation. Fig. 12; [0089] – [0094]: “In one embodiment, for a given layer, the maximum and minimum values of weights 1204 may be determined” teaches that the calculation circuits can be used to perform maximum and minimum operations on input data).
Henry et al., Koren et al., Pillai et al., Sze et al., and Falcon et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the set of operators comprises minimum and square root operator as taught by Falcon et al. to the disclosed invention of Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Sze et al.
	One of ordinary skill in the art would have been motivated to make this modification because “Each of the calculation circuits may include same or similarly arranged components that may be optimally adapted to different requirements of different layers of CNN systems. Thus, embodiments of the disclosure may perform filter/convolution operations for the convolution layer, average operations for the pooling layer, and dot product operations for the fully-connected layer by reusing the same calculation circuits whose precisions may be adapted for the requirements of different types of computation” (Falcon et al. [0082]).

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Xu et al. ("Empirical Evaluation of Rectified Activations in Convolution Network").
Regarding Claim 8,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein the set of operators comprises a range operator to indicate a range result of a first operand compared with ranges specified by one other second operand or two other operands.
	However, Xu et al. teaches wherein the set of operators comprises a range operator to indicate a range result of a first operand compared with ranges specified by one other second operand or two other operands (Page 2, 2.2. Leaky Rectified Linear Unit: “Leaky Rectified Linear activation is first introduced in acoustic model (Maas et al., 2013). Mathematically, we have 
$$y_i = \begin{cases} x_i & \text{if } x_i \ge 0 \\ \dfrac{x_i}{a_i} & \text{if } x_i < 0 \end{cases}$$
where a_i is a fixed parameter in range (1, +∞). In original paper, the authors suggest to set a_i to a large number like 100. In additional to this setting, we also experiment smaller a_i = 5.5 in our paper” teaches that the leaky ReLU activation function can be implemented using ≥ (a range operator) to compare a value x_i (first operand) to 0 (second operand). In this example, the range operator would indicate whether x_i falls within the range of values greater than or equal to zero or within the range of values less than zero). 
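The comparison used in the leaky ReLU equation (x_i ≥ 0 versus x_i < 0) can be modeled as a range operator taking either one or two bound operands. In the following minimal Python sketch, the name `range_op` and the integer encoding of the range result are assumptions made for illustration; they are not drawn from the application or from Xu et al.

```python
from typing import Optional


def range_op(x: float, lo: float, hi: Optional[float] = None) -> int:
    """Hypothetical range operator returning which range x falls in.

    One bound operand:  0 if x < lo, else 1.
    Two bound operands: 0 if x < lo, 1 if lo <= x < hi, else 2 (assumes lo <= hi).
    """
    if x < lo:
        return 0
    if hi is None or x < hi:
        return 1
    return 2


# Leaky ReLU test from Xu et al.: is x_i >= 0 (result 1) or x_i < 0 (result 0)?
print(range_op(-3.0, 0.0), range_op(0.0, 0.0))  # 0 1
print(range_op(2.0, 0.0, 4.0), range_op(5.0, 0.0, 4.0))  # 1 2
```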
Henry et al., Koren et al., Pillai et al., and Xu et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the set of operators comprises a range operator to indicate a range result of a first operand compared with ranges specified by one other second operand or two other operands as taught by Xu et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use the disclosed invention of Henry et al. in view of Koren et al. to implement the leaky ReLU activation function because “three types of (modified) leaky ReLU [activation functions] all consistently outperform the original ReLU [activation function]” (Xu et al. Page 4, Conclusion).
Regarding Claim 9,
Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Xu et al. teaches the scalar element computing device of claim 8.
	Xu et al. further teaches wherein one processing element is configured to use a target operator conditionally depending on the range result of the first operand in a previous-stage processing element (Page 2, 2.2. Leaky Rectified Linear Unit: “Leaky Rectified Linear activation is first introduced in acoustic model (Maas et al., 2013). Mathematically, we have 
$$y_i = \begin{cases} x_i & \text{if } x_i \ge 0 \\ \dfrac{x_i}{a_i} & \text{if } x_i < 0 \end{cases}$$
where a_i is a fixed parameter in range (1, +∞). In original paper, the authors suggest to set a_i to a large number like 100. In additional to this setting, we also experiment smaller a_i = 5.5 in our paper” teaches that the leaky ReLU activation function uses a multiplication or division operator conditionally depending on the range result. Because a_i is a constant, the x_i < 0 branch can be implemented either as multiplication by 1/a_i or as division by a_i. For example, if the range result indicates that x_i ≥ 0, no multiplication or division is used, but if the range result indicates that x_i < 0, the multiplication or division operator is used).
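The conditional use of the division operator can be sketched as two pipeline stages: a previous stage that produces the range result and a next stage that divides only when that result indicates x_i < 0. The stage functions below are hypothetical, and the default a = 5.5 is simply the smaller setting reported by Xu et al.; this is an illustration, not an implementation from any cited reference.

```python
from typing import Tuple


def range_stage(x: float) -> Tuple[float, bool]:
    # Previous-stage element: forward x along with its range result (True when x >= 0).
    return x, x >= 0.0


def conditional_stage(x: float, non_negative: bool, a: float) -> float:
    # Next-stage element: use the division operator only when the range result says x < 0.
    return x if non_negative else x / a


def leaky_relu_xu(x: float, a: float = 5.5) -> float:
    """y = x for x >= 0 and y = x / a for x < 0, with a in (1, +inf) as in Xu et al."""
    value, non_negative = range_stage(x)
    return conditional_stage(value, non_negative, a)


print(leaky_relu_xu(2.0), leaky_relu_xu(-11.0))  # 2.0 -2.0
```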
Henry et al., Koren et al., Pillai et al., and Xu et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein one processing element is configured to use a target operator conditionally depending on the range result of the first operand in a previous-stage processing element as taught by Xu et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use the disclosed invention of Henry et al. in view of Koren et al. to implement the leaky ReLU activation function because “three types of (modified) leaky ReLU [activation functions] all consistently outperform the original ReLU [activation function]” (Xu et al. Page 4, Conclusion).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Narayanaswami et al. (US 9,836,961 B1).
Regarding Claim 11,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 10.
	Henry et al. further teaches wherein each entry comprises a command field to identify one or more register fields to indicate values of one or more operands for a selected operator (Fig. 16; [0127]: “In one embodiment, the memory array 1606 is configured in banks. When the NPUs 126 access the data RAM 122, all of the banks are activated to access an entire row of the memory array 1606; whereas, when the media registers 118 access the data RAM 122, only the specified banks are activated. In one embodiment, each bank is 128 bits wide and the media registers 118 are 256 bits wide, hence two banks are activated per media register 118 access” teaches that the memory array 1606 of the data RAM 122 is divided into banks (fields) from which the media registers 118 and the NPUs 126 access data).
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein each entry comprises a command field to identify a selected command and related control information, and one or more constant fields to indicate values of one or more operands for the selected operator.
	However, Narayanaswami et al. teaches wherein each entry comprises a command field to identify a selected command and related control information, and one or more constant fields to indicate values of one or more operands for the selected operator (Fig. 1; Col. 5 lines 11-13: “Storage medium 104 can include one or more memory banks or units, including first bank 112 for storing activations and second bank 114 for storing weights” teaches the memory divided into banks (fields) including a first bank 112 for activations (command/control information) and a second bank 114 for weights (constants)).
Henry et al., Koren et al., Pillai et al., and Narayanaswami et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each entry comprises a command field to identify a selected command and related control information, and one or more constant fields to indicate values of one or more operands for the selected operator as taught by Narayanaswami et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification because “Distribution of the encoded instructions to the various compute systems allows for increased computation bandwidth within a single system. Instruction quantity in a compute system is reduced because a single system is responsible only for a subset of the total computations needed” (Narayanaswami et al. Col. 3 lines 10-15).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Castelaz et al. (US 5,422,983 A).
Regarding Claim 12,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar element computing device of claim 1.
	Koren et al. further teaches wherein one processing element only fetches one or more commands only when a first full sum is set (Fig. 2; Col. 5 line 59 – Col. 6 line 3: “The multiplexers then read from the output of the respective memory device 250 or 255 and retrieve the data set row (Val, Index) or col (Val, Index) at the outlet of the memory device 250 or 255. The indexes of row and column data sets are then compared. When the indexes match, the row data and column data are multiplied by the multiplier 235 and accumulated by the accumulator 240 and removed from the processing element. When the indexes do not match, the data set with the lowest index is discarded while the data set with the highest index is stored back in the memory device 250 or 255 from which it was read for use in future cycles” teaches that the processing elements only fetch data from the memory devices when the memory indexes match (i.e. when the first full sum is set). If the indexes do not match, the data is stored back in the memory devices for use in future cycles).
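The index-matching behavior quoted from Koren et al. resembles a merge-style sparse dot product. The following minimal Python sketch assumes both data sets are sorted by index and models each memory device as a Python list; the function name `index_matched_mac` is hypothetical, and cycle-level timing of processing element 205 is not modeled.

```python
from typing import List, Tuple

# (value, index) pairs, mirroring the "(Val, Index)" data sets quoted from Koren et al.
Entry = Tuple[float, int]


def index_matched_mac(row: List[Entry], col: List[Entry]) -> float:
    """Multiply-accumulate only where row and column indexes match.

    Assumes each list is sorted by ascending index.
    """
    acc = 0.0
    i = j = 0
    while i < len(row) and j < len(col):
        (row_val, row_idx), (col_val, col_idx) = row[i], col[j]
        if row_idx == col_idx:
            acc += row_val * col_val  # indexes match: multiply and accumulate
            i += 1
            j += 1
        elif row_idx < col_idx:
            i += 1  # discard the lower-index row entry; keep the column entry for a later cycle
        else:
            j += 1  # discard the lower-index column entry; keep the row entry for a later cycle
    return acc


row = [(2.0, 0), (3.0, 4), (5.0, 7)]
col = [(4.0, 4), (1.0, 5), (2.0, 7)]
print(index_matched_mac(row, col))  # 22.0 (3*4 + 5*2)
```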
Henry et al., Koren et al., and Pillai et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein one processing element only fetches one or more commands only when a first full sum is set as taught by Koren et al. to the disclosed invention of Henry et al. in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to “reduce power consumption while improving processing speed and performance” (Koren et al. Col. 16 lines 31-32).
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach wherein an indication in a command field of each of the N command memories is used to instruct whether the following stages of one processing element fetch command or not.
	However, Castelaz et al. teaches wherein an indication in a command field of each of the N command memories is used to instruct whether the following stages of one processing element fetch command or not (Col. 13 line 42 – Col. 14 line 23: “Neural network instructions. Neural network instructions cause the neuroengine to process the neural network specified in the working registers, configuration RAM stack and the weight RAM stacks … Continue neural network. The continue neural network instruction, instruction 7, causes the neuroengine to process the next sequential layer. The number of the currently processed layer(s) is pushed onto the stack. Once the layer is processed, the neuroengine fetches another instruction from the configuration RAM stack … Stop. The stop instruction, instruction 9, causes the neural engine to reset the RUN enable bit in the control register. This effectively stops execution of the neuro instruction stream. The value of all registers including the NPC and NSP are preserved. Execution of neuro instructions can be reinitiated by the host processor by setting the RUN enable bit in the control register. When the RUN enable bit is set, execution of neuro instruction(s) continues with the instruction immediately following the stop instruction” teaches that the neural network instructions (command field) include a continue instruction, which causes the instruction for the next processing layer to be fetched, and a stop instruction, which causes no further instructions to be fetched).
Henry et al., Koren et al., Pillai et al., and Castelaz et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein an indication in a command field of each of the N command memories is used to instruct whether the following stages of one processing element fetch command or not as taught by Castelaz et al. to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al., by using the neural network instructions of Castelaz et al. as the command instructions for the pipelined NPUs of that combination.
	One of ordinary skill in the art would have been motivated to make this modification because “the neuro programming language in accordance with the present invention, allows for flexibility in data processing” (Castelaz et al. Col. 15 lines 40 - 43). 

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Balasubramanian (US 2019/0102640 A1).
Regarding Claim 16,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar computing subsystem of claim 15.
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach further comprising a reduced operator pool coupled to all M scalar elements, wherein when a reduced operator is selected, each of the N processing elements in the M scalar elements provides a value for the reduced operator and uses a result of the reduced operator.
	However, Balasubramanian teaches further comprising a reduced operator pool coupled to all M scalar elements, wherein when a reduced operator is selected, each of the N processing elements in the M scalar elements provides a value for the reduced operator and uses a result of the reduced operator (Fig. 2; [0039]: “Referring to FIG. 2 is a further example in greater detail of the various computation layers making up the CNN 104, in particular. The CNN 104 can comprise a number of computational layers, including a convolution layer 202, a rectified linear unit (RELU) layer 204, a pooling layer 206, a fully connected (FC) layer 208 (artificial neural network layer), and an output layer 210. Although five computational layers are demonstrated, more or less computational layers can be envisioned as one of ordinary skill in the art could appreciate. A layer, or computation(al) layer, as used herein, can refer to one or more components that operate with similar function by mathematical or other functional means to process received inputs to generate/derive outputs for a next layer with one or more other components for further processing within a convolutional neural network system” teaches that the pooling layer 206 (reduced operator pool) is coupled to the convolution components 212 and ReLU components 214 (processing elements); [0049]: “The pooling components 216 are configured to generate pooling outputs via a pipelining process in parallel with the convolution combined layer 230. In an embodiment, pooling components 216 can initiate processing for scaled invariants and perform statistical operations on the first set/subset of convolutional data or the nonlinear convolutional output data. As such, the two different computational layers can be pipelined and operated in parallel with the functions of one another, or process concurrently as the sliding convolution window outputs a portion or some of the entirety of convolution data across an image 232, for example. 
The convolution/RELU operations of the convolution layer 230, the convolution components 212 or RELU components 214 operate as pipelined processes with the pooling components 216. These pooling components 216 perform statistical operations on the non-linear convolution output data based on a pooling window for a subset of the non-linear convolution output data” teaches that the pooling components 216 perform a pooling operation (reduce operation) on the values provided from the convolution components 212 and ReLU components 214 (processing elements)).
Henry et al., Koren et al., Pillai et al., and Balasubramanian are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising a reduced operator pool coupled to all M scalar elements, wherein when a reduced operator is selected, each of the N processing elements in the M scalar elements provides a value for the reduced operator and uses a result of the reduced operator as taught by Balasubramanian to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification because “execution of convolution and the pooling layers can be pipelined with respect to one another, leading to significant speedup gains by enabling data flow processors to begin processing sectors of Convolution/RELU layer outputs immediately upon availability” (Balasubramanian [0029]).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), in view of Balasubramanian (US 2019/0102640 A1), and further in view of Wang et al. (“DLAU: A Scalable Deep Learning Accelerator Unit on FPGA”).
Regarding Claim 17,
Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Balasubramanian teaches the scalar computing subsystem of claim 16.
	Balasubramanian further teaches wherein the reduced operator pool comprises a minimum operator and a maximum operator ([0073]: “The statistical operation performed by the pooling components 216 can comprise a minimum, a maximum, an average, a median, or other statistical operation (e.g., of A, B, I, J) of a window of the sliding convolutions” teaches that the pooling operation (reduction operation) can comprise minimum and maximum operations). 
Henry et al., Koren et al., Pillai et al., and Balasubramanian are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the reduced operator pool comprises a minimum operator and a maximum operator as taught by Balasubramanian to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification because “execution of convolution and the pooling layers can be pipelined with respect to one another, leading to significant speedup gains by enabling data flow processors to begin processing sectors of Convolution/RELU layer outputs immediately upon availability” (Balasubramanian [0029]).
	Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Balasubramanian does not appear to explicitly teach wherein the reduced operator pool comprises an addition operator.
	However, Wang et al. teaches wherein the reduced operator pool comprises an addition operator (Fig. 2 teaches a schematic for a Tiled Matrix Multiplication Unit (TMMU) capable of performing a reduced addition operation on input values; Page 3, Col. 2 Paragraph 2: “For the calculation, we use pipelined binary adder tree structure to optimize the performance” teaches that the TMMU uses a binary adder tree to perform the addition operation).
Henry et al., Koren et al., Pillai et al., Balasubramanian, and Wang et al. are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the reduced operator pool comprises an addition operator as taught by Wang et al. to the disclosed invention of Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Balasubramanian.
	One of ordinary skill in the art would have been motivated to make this modification in order to implement the addition operation for the pooling components of Balasubramanian since “the pooling components 216 could perform pooling operations of any statistical operation” (Balasubramanian [0073]) as well as to take advantage of “the DLAU architecture (the TMMU is part of the DLAU architecture) [which] can be configured to operate different sizes of tile data to leverage the trade-offs between speedup and hardware costs. Consequently, the FPGA based accelerator is more scalable to accommodate different machine learning applications” (Wang et al. Page 1 last paragraph – Page 2 first paragraph).

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Mills (US 2019/0340489 A1).
Regarding Claim 18,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar computing subsystem of claim 15.
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not appear to explicitly teach further comprising an aligner coupled to all M scalar elements to align first data output from all M scalar elements.
	However, Mills teaches further comprising an aligner coupled to all M scalar elements to align first data output from all M scalar elements (Fig. 9A; Fig. 9B; Fig. 10; [0081]: “FIGS. 9A and 9B illustrate an example multiply-accumulator (MAC) 404 that supports neural networking operations using fixed-point and floating-point operands operating in a fixed-point mode of operation, according to one embodiment” teach multiply-accumulate circuits (MACs) that can be used to convert data of multiple data types (e.g., fixed-point and floating-point) from neural network operations to the same data type for use in subsequent operations. Furthermore, [0100]: “The binary sum 928 value is used to drive the shift register 910 in order to align the multiplied value 912 for addition with the (e.g., 32-bit) fixed-point accumulated value 930 in the accumulator 414. As illustrated in FIG. 10, the 23-bit multiplied value 912 is shifted according to the amount of precision and/or range needed to support addition and accumulation operations. The shift register 910 performs arithmetic shifts on the 23-bit multiplied value 912 to maintain its sign while aligning the 23-bit multiplied value 912 using the binary point 924” teaches that the data input (23-bit floating point in this example) into the MAC is first shifted (aligned) in order to be used in calculations with a 32-bit fixed-point data value).
Henry et al., Koren et al., Pillai et al., and Mills are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising an aligner coupled to all M scalar elements to align first data output from all M scalar elements as taught by Mills to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use multiple data types for neural network operations (e.g. fixed-point and floating-point) in the neural network unit (Mills [0001]). 
Regarding Claim 19,
Henry et al. in view of Koren et al., in view of Pillai et al., and further in view of Mills teaches the scalar computing subsystem of claim 18.
	Mills further teaches further comprising a padder coupled to the aligner to pad second data output from the aligner (Mills Fig. 9B; Fig. 10; [0100]: “The shift register 910 performs arithmetic shifts on the 23-bit multiplied value 912 to maintain its sign while aligning the 23-bit multiplied value 912 using the binary point 924. The most-significant bits (MSB) are sign extended on the left, and remaining bits are padded with zeros on the right” teaches that padding occurs on the data after aligning).
Henry et al., Koren et al., Pillai et al., and Mills are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising a padder coupled to the aligner to pad second data output from the aligner as taught by Mills to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification in order to be able to use multiple data types for neural network operations (e.g. fixed-point and floating-point) in the neural network unit (Mills [0001]). 

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 2017/0102920 A1) in view of Koren et al. (US 10,509,846 B2), in view of Pillai et al. (US 2019/0042922 A1), and further in view of Donohoe (US 6,883,084 B1).
Regarding Claim 21,
Henry et al. in view of Koren et al. and further in view of Pillai et al. teaches the scalar computing subsystem of claim 15.
	Henry et al. in view of Koren et al. and further in view of Pillai et al. does not teach wherein the interface module comprises a multiplexer to select the input data from output data of a full sum calculation unit or looped-back outputs from last-stage processing elements in each scalar element.
	However, Donohoe teaches wherein the interface module comprises a multiplexer to select the input data from output data of a full sum calculation unit or looped-back outputs from last-stage processing elements in each scalar element (Fig. 5; Col. 7 lines 29-48: “FIG. 5 illustrates an overview of an array of configurable processing elements within an RDPP 220 according to the preferred embodiment of the present invention. The processing elements 225-231 are coupled linked to each other, and to input and output buffers 221, 237 through a flexible switching network. The input data 219 is coupled to the input data buffer 221, and is directed by the input select logic 223 into an input of a selected processing element or multiple select processing elements 225-231, depending upon operational requirements. After processing, the output of the selected operating processing element or multiple processing elements 225-231 is directed variously to the output select logic 233 and the input select logic 223. If processing is completed, the processing element output is routed to the output data buffer 237. If the output of a processing element 225-231 is to be further processed, the output data from the processing element 225-231 is routed back to the input select logic 223, from which it is directed to the input of another one or multiple processing element 225-233” teaches input select logic 223 (interface module) that receives input data from memory via input data buffer 221. Furthermore, input select logic 223 selects either input data from memory or output data routed back from the pipelined processing elements to provide to the processing elements for calculations).
Henry et al., Koren et al., Pillai et al., and Donohoe are analogous to the claimed invention because they are directed to implementing a neural network.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the interface module comprises a multiplexer to select the input data from output data of a full sum calculation unit or looped-back outputs from last-stage processing elements in each scalar element as taught by Donohoe to the disclosed invention of Henry et al. in view of Koren et al. and further in view of Pillai et al.
	One of ordinary skill in the art would have been motivated to make this modification because “the present invention speeds up processing time by reducing the looping and retracing of branches which commonly attends programming features in the prior art” (Donohoe Col. 7 lines 21-24).
Response to Arguments
Applicant’s arguments filed 05/06/2022 with respect to the drawing objection regarding reference character 220, the drawing objection regarding reference character 240 in paragraph [0013] and reference character 420-(M-1) in paragraph [0069], and the drawing objection regarding reference character 420-(N-1) in Fig. 4 have been fully considered and are persuasive.  Therefore, these objections have been withdrawn.  However, applicant's arguments filed on 05/06/2022 with respect to the drawing objection regarding reference character 195 in Fig. 1C have been fully considered but they are not persuasive. Applicant asserts “In Fig. 1C, reference 195 refers to ‘yj’, where ‘yj’ is the output at the hidden layer as described in [0003] and equation (2). Therefore, reference 195 should be mentioned in paragraph [0005] along with the output of each neuron, as ‘The output 195 of each neuron may become multiple inputs for the next-stage neural’. Accordingly, the specification for paragraph [0005] is also amended to correct the missing reference sign.”
Examiner’s Response:
	The examiner respectfully disagrees. The amended specification for paragraph [0005] does not recite the amendment of “The output 195 of each neuron may become multiple inputs for the next-stage neural” as indicated in applicant’s arguments. Therefore, the specification does not mention reference character 195 that is used in Fig. 1C.

Applicant's arguments filed on 05/06/2022 with respect to the claim interpretation of claim 7 have been fully considered but they are not persuasive. Applicant asserts “The examiner states ‘This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: Claim 7’. Claim 7 has been cancelled.”
Examiner’s Response: 
	The examiner respectfully disagrees. The amended claims filed on 05/06/2022 include an amended claim 7, meaning that claim 7 is not cancelled as indicated in applicant’s arguments. Therefore, the claim interpretation for claim 7 as indicated in this office action is maintained.

Applicant’s arguments filed 05/06/2022, with respect to the objection(s) of claim(s) 1-13 and 15-21 have been fully considered and are persuasive.  Therefore, the objection has been withdrawn.  However, upon further consideration, a new ground of objection is made for claims 1-13 and 15-21 in view of the claim amendments filed 05/06/2022. Please see current objection for more information.

Applicant’s arguments filed 05/06/2022, with respect to the rejection(s) of claim(s) 1-13 and 15-21 under 35 U.S.C. 112(b) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection for claims 1-13 and 15-21 is made in view of the claim amendments filed 05/06/2022. Please see current rejection for more information.

Applicant’s arguments with respect to the 35 U.S.C. 103 obviousness rejection to claims 1-13 and 15-21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN J HALES whose telephone number is (571)272-0878. The examiner can normally be reached M-Th 8:00am - 5:00pm and F 8:00am - 2:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRIAN J HALES/Examiner, Art Unit 2125
/ALAN CHEN/Primary Examiner, Art Unit 2125