DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-21 are presented for examination.

Response to Amendment
Applicant’s amendment has obviated most, but not all, of the objections to the specification and drawings given in the last Office Action.  To the extent that an objection or rejection appears in a previous Office Action but not in this Office Action, that objection or rejection is withdrawn.  To the extent that it appears in both a previous Office Action and this Office Action, the objection or rejection is maintained.
Applicant has also convinced examiner that there is sufficient structure in the specification to perform the entire claimed functions of claims 9 and 20.  Therefore, the rejection of those claims under 35 USC § 112(b) is withdrawn.  However, the underlying interpretation of the claims under 35 USC § 112(f) is not withdrawn, because both claims continue to use the nonce term “unit”; modify the nonce term with functional language (identifying a weight based on an origin address or tag); and fail to modify the function with a structure for performing it.  For purposes of examination, any circuitry that determines which weight to associate with which output based on an address specifying the node from which the output originates will be deemed to read on the claims.  As Examiner has repeatedly stated, to the extent that a means-plus-function interpretation of the claims is not intended, Applicant should amend the claims to eliminate that interpretation.  In that regard, note that the Federal Circuit has found that the term “circuit” does not invoke 35 USC § 112(f).  MPEP § 2181(I)(A).  Thus, to the extent consistent with the specification, changing all recitations of a “unit” or “element” to “circuit” or some other physical hardware would likely eliminate the interpretation.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on December 25, 2021 and March 29, 2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings are objected to because in Fig. 8B, “back propagation to” should be “backpropagation on”.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities: in paragraph 91, “subset … are” should be “subset … is” (two instances).  Appropriate correction is required.

Claim Interpretation
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
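For illustration only, and not as a statement of law, the three-prong test and the two rebuttable presumptions described above can be sketched as a simple predicate.  The function name and parameter names below are hypothetical and do not appear in the MPEP; the sketch assumes each prong has already been resolved to a yes/no answer.

```python
def interpreted_under_112f(uses_means_or_step: bool,
                           uses_generic_placeholder: bool,
                           modified_by_functional_language: bool,
                           recites_sufficient_structure: bool) -> bool:
    """Hypothetical sketch of the three-prong test of MPEP 2181(I).

    All names here are illustrative assumptions, not official terminology.
    """
    # Prong (A): "means"/"step" or a generic placeholder (nonce term) is used.
    prong_a = uses_means_or_step or uses_generic_placeholder
    # Prong (B): the term is modified by functional language.
    prong_b = modified_by_functional_language
    # Prong (C): the term is NOT modified by sufficient structure,
    # material, or acts for performing the claimed function.
    prong_c = not recites_sufficient_structure
    return prong_a and prong_b and prong_c


# A "unit" (nonce term) recited with a function but no structure.
print(interpreted_under_112f(False, True, True, False))
# The same limitation once sufficient structure is recited.
print(interpreted_under_112f(False, True, True, True))
```

Under this sketch, reciting sufficient structure for performing the function is enough to take a limitation out of the interpretation, which mirrors how either presumption is rebutted.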
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
"first processing unit", "first output systolic element", "second processing unit", "third processing unit", and "first output systolic element" in claim 1;
"first output systolic element" in claim 2;
"second output systolic element" in claim 3;
"third processing unit" and "second output systolic element" in claim 4;
"subset of the first plurality of processing units" and "subset of the second plurality of processing units" in claim 5;
"the second processing unit" in claim 9;
"first processing unit" and "second processing unit" in claim 18;
"second processing unit" in claim 20;
"subset of the first processing units" and "subset of the second processing units" in claim 21.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
Regarding "first processing unit" and "second processing unit" in claim 1; "third processing unit" in claim 2; "third processing unit" in claim 4; "subset of the first plurality of processing units" and "subset of the second plurality of processing units" in claim 5; "first processing unit" and "second processing unit" in claim 18; "subset of the first processing units" and "subset of the second processing units" in claim 21, Examiner is interpreting the limitations as including a hardware processor (para. 0045) programmed to perform computation of a corresponding node of a corresponding layer by receiving inputs, using a weight summation function and calculating an activation function (linear or nonlinear), as described in Fig. 2 and para. [00196].
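As a minimal sketch of the node computation referred to in this interpretation (receiving inputs, forming a weighted sum, and applying a linear or nonlinear activation function), the following may be helpful.  The names `node_output` and `identity` are hypothetical, and the code is an illustration under those assumptions rather than the structure disclosed in the specification.

```python
import math
from typing import Callable, Sequence


def identity(x: float) -> float:
    """A linear activation function (assumed here for illustration)."""
    return x


def node_output(inputs: Sequence[float],
                weights: Sequence[float],
                activation: Callable[[float], float] = math.tanh) -> float:
    """Compute one node: weighted sum of the inputs, then an activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


# Linear activation: 1.0*0.5 + 2.0*0.25 = 1.0
print(node_output([1.0, 2.0], [0.5, 0.25], activation=identity))
# Nonlinear activation (tanh) applied to the same weighted sum.
print(node_output([1.0, 2.0], [0.5, 0.25]))
```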
Regarding “first output systolic element” in claim 1; “second output systolic element” in claim 2; “second output systolic element” in claim 3; “first output systolic element” in claim 4; and “the first arrangement of first processing units” in claim 21, Examiner is interpreting each element as a hardware processor (para. 0045) programmed to transfer data packets through nodes/layers in intervals, as described in paras. [0009, 0205] and Figs. 4A – 4C.
For an analysis of claims 9 and 20, see Response to Amendment section supra.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
Claims 1, 4-6, 10-11, 13-16, 18 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over US 5799134 A by Chiueh et al. (hereinafter, “Chiueh”), in view of US 9710748 B2 by Ross et al. (hereinafter, “Ross”), and further in view of US 20180157465 A1 by Bittner et al. (hereinafter, “Bittner”) and US 5138695 A by Means et al. (hereinafter, “Means”).
Regarding claim 1, Chiueh teaches a device for performing computations of a neural network comprising at least a first layer and a second layer, the device comprising: (Chiueh Fig. 1 and Col 1 (line 15-20) disclose a neural network with multiple layers (i.e. a first layer and a second layer))
	a first processing unit configured to perform computations of a first node of the first layer of the neural network, the first processing unit including: (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 1st processing element (i.e. a first processing unit of the first layer of the neural network) can generate an activation output (i.e. perform computation) of the 1st node of the 1st layer using an input, weighted sum and activation function.)
a first input systolic element; (Chiueh Fig. 3 and Col 3 (line 5-10) disclose that each processing element also includes a processor (i.e. first input systolic element) for receiving a sequence of inputs for outputting an accumulated value) ...
a first output systolic element configured to receive the first … output from the first processing circuitry; (Chiueh Fig. 3 and Col 4 (line 40-45) disclose that each processing element PE-i contains a storage element 106-i. Each storage element 106-i (i.e. first output systolic element) stores a value g_i (i.e. first output) received from the corresponding processor 104-i (i.e. the first processing circuitry))
	a second processing unit configured to perform computations of a second node of the second layer of the neural network, wherein the second processing unit includes a second input systolic element, (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. a second processing unit of the second layer of the neural network) can generate an output (i.e. perform computation) of the 2nd node of the 2nd layer using input.  See also Fig. 3 and Col 3 (line 5-10), which disclose that each processing element also includes a processor (i.e. second input systolic element) for receiving a sequence of inputs for outputting (i.e. performing computation of) an accumulated value)
	Chiueh fails to explicitly teach the remaining limitations of the claim.  However, Ross teaches first processing circuitry configured to receive data from the first input systolic element and perform processing according to the first node to generate a first activation output …. (Ross fig. 1 (106, 108), fig 2 (212, 214) and col 1 (line 31-46), Col 7 (line 17-27) disclose that a matrix computation unit receives a weight input and activation inputs (i.e. data) from a unified buffer (i.e. first input systolic element) and generates (i.e. perform processing according to the first node) an accumulated value. A vector computation unit receives the accumulated values and applies an activation function to generate (i.e. perform processing according to the first node) activation values (i.e. first activation output). Here, the matrix computation unit is configured to receive inputs from a unified buffer (i.e. first input systolic element) and then the matrix computation unit and vector computation unit (i.e. first processing circuitry) together process the inputs to generate an activation output for the first node.)  
	wherein the first output systolic element is further configured to systolically pulse the first activation output … to the second input systolic element. (Ross fig. 2 and Col 3 (line 36-44), col 4 (line 34-50), col 5 (line 65) – col 6 (line 4) disclose that the output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided as input to another layer (i.e. the second input systolic element) in intervals using the clock signal (i.e. systolically pulse))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ross col 4 (line 34-50)).
	Neither Chiueh nor Ross appears to disclose explicitly the further limitations of the claim.  However, Bittner discloses systolically puls[ing] the first activation output directly to the second input systolic element. (Bittner paragraph 68 and Fig. 2 disclose a system with a plurality of neural processor cores connected to other cores by an interconnect; Fig. 1 and paragraph 44 disclose that each core may contain an additional function element such as an activation function to produce a function f(R + B), where R is a result vector mantissa and B is a bias, and the result is passed through an output mantissa shifter that produces a final result vector [activation output] by aligning the elements of f(R + B) with an output exponent; see also Fig. 1, reference character 120 (showing a vector input 120 in each core; in other words, the vector result 155 of each core may be fed directly into the vector input 120 of another core connected by the interconnect))
Bittner and the instant application both relate to physical implementations of neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh and Ross to pulse the activation output directly to a second processing element, as disclosed by Bittner, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase efficiency by ensuring that the outputs do not need to be passed on to an intermediate module before being sent to the next element.  See Bittner, paragraph 44 and Fig. 1.
Means discloses a third processing unit configured to perform computations of a third node of the first layer of the neural network, the third processing unit including a second output systolic element (in a feedforward, fully connected neural network mode of a systolic array, each processing element in the top row of the array can represent a single neuron in a fully connected layer of the feedforward network – Means, col. 5, l. 53-col. 6, l. 17; see also Fig. 1 [note that the second PE 9 of the top row is the third processing unit, from the same layer as the first PE 9 of the row, and that it contains an accumulator 17 and a bus 19, collectively comprising a second output systolic element]),
wherein the first output systolic element is further configured … systolically [to] pulse the first activation output directly to the second output systolic element (each processing element performs, inter alia, the accumulation of the 16-bit result of a multiplier in an accumulator and the placement of the result of the accumulator on a 28-pin output bus – Means, col. 2, l. 57-col. 3, l. 4; see also Fig. 1 [showing that the output 19 of accumulator 17 is connected to the output of the next processing element via a bus; since each PE’s accumulator plus the bus can be regarded as an “output systolic element,” the system pulses the output of one PE directly to the output of another PE]).
Means and the instant application both relate to systolic arrays for neural network processing and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh, Ross, and Bittner such that the system pulses an activation output directly to a second output element corresponding to a node of the same layer, as disclosed by Means, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the other processing unit to become aware of the output of the first processing unit for easy accumulation of the outputs.  See Means, col. 2, l. 57-col. 3, l. 4 and Fig. 1.
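Purely as an illustrative aid, the clocked transfers mapped above (an output element pulsing its value directly to an input element of a next-layer unit and to an output element of a same-layer unit) can be sketched as follows.  The class `SystolicElement`, the function `systolic_pulse`, and the wiring shown are hypothetical; the sketch assumes one-value registers that all latch on a common pulse, and it is not the circuitry of Chiueh, Ross, Bittner, Means, or the instant specification.

```python
class SystolicElement:
    """Hypothetical one-value register that holds data between pulses."""

    def __init__(self, value=None):
        self.value = value


def systolic_pulse(connections):
    """Advance every (source, target) connection by exactly one hop.

    All sources are sampled before any target is updated, so a value
    moves only one element per pulse, as on a shared clock edge.
    """
    sampled = [(target, source.value) for source, target in connections]
    for target, value in sampled:
        target.value = value


# Assumed wiring: the first unit's output element feeds a second-layer
# input element and, directly, a same-layer unit's output element.
first_out = SystolicElement(0.75)   # holds the first activation output
second_in = SystolicElement()       # input element of a second-layer unit
third_out = SystolicElement()       # output element of a same-layer unit
downstream = SystolicElement()      # one further hop, for illustration

wiring = [(first_out, second_in), (first_out, third_out),
          (third_out, downstream)]

systolic_pulse(wiring)   # first pulse: value reaches second_in and third_out
systolic_pulse(wiring)   # second pulse: value reaches downstream
```

The two-phase update (sample, then latch) is a design choice of this sketch; it keeps the result independent of the order in which connections are listed.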

Regarding claim 4, the combination of Chiueh, Means, Ross and Bittner as shown above teaches the device of Claim 1. 
Chiueh teaches a fourth processing unit configured to perform computations of a fourth node of the second layer of the neural network (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the fourth processing element (i.e. a fourth processing unit of the second layer of the neural network) can generate an activation output (i.e. perform computation) of the fourth node of the 2nd layer using the input, weighted sum and activation function.)
	the fourth processing unit including a third input systolic element (Chiueh Fig. 3 and Col 3 (line 5-10) disclose that each processing element also includes a processor (i.e. third input systolic element)  for receiving a sequence of inputs for outputting an accumulated value).
	Ross teaches that the first output systolic element is further configured to systolically pulse the first activation output to the third input systolic element (Ross fig. 2 and Col 3 (line 36-44), col 4 (line 34-50), col 5 (line 65) – col 6 (line 4) disclose that the output (i.e. first activation output) from one layer (i.e. first output systolic element) can be transferred as input to another layer (i.e. the third input systolic element) in intervals using the clock signal. (i.e. systolically pulse)).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Ross’s Neural Network Processor, and Bittner/Means.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ross col 4(line 34-50)).  	

Regarding claim 5, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the device of Claim 1.
	Chiueh teaches a first arrangement of a first plurality of processing units including the first processing unit, wherein at least a subset of the first plurality of processing units is configured to perform computations of a corresponding number of nodes of the first layer of the neural network (Chiueh fig. 1, 3 and col 1 (line 15-25), col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. The 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a first arrangement of a first plurality of processing units including the first processing unit of first layer of the neural network) can generate activation outputs (i.e. perform computation) for corresponding neurons (i.e. corresponding number of nodes) of the 1st layer using the input, weighted sum and activation function.)
	a second arrangement of a second plurality of processing units including the second processing unit, wherein at least a subset of the second plurality of processing units is configured to perform computations of a corresponding number of nodes of the second layer of the neural network (Chiueh fig. 1, 3 and col 1 (line 15-25), col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. The 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a second arrangement of a second plurality of processing units including the second processing unit of the second layer of the neural network) can generate activation outputs (i.e. perform computation) for corresponding neurons (i.e. corresponding number of nodes) of the 2nd layer using the input, weighted sum and activation function.)
	Ross teaches a crossover connection between an output systolic element of one of the first plurality of processing units and an input systolic element of one of the second plurality of processing units (Ross col 3 (line 36-44) discloses that a neural network with multiple layers can be connected to compute inferences. Output from one layer (i.e. output systolic element of one of the first plurality of processing units) is provided as input (i.e. crossover connection) to a next layer (i.e. input systolic element of one of the second plurality of processing units))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ross col 4 (line 34-50)).    

	Regarding claim 6, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the device of Claim 1.
	Ross teaches that the device further includes a systolic processor chip, and … the first and second processing units comprise circuitry embedded in the systolic processor chip (Ross col 1(line 15-16), col 2(line 51-64) disclose integrating components (i.e. wherein the first and second processing units comprise circuitry) of the neural network processor into one circuit as hardware implementation (i.e. the systolic processor chip) to avoid off-chip communication.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor.  Doing so would improve efficiency (e.g., increase speed and throughput and reduce power and cost, over implementations in software) of the neural network. (Ross col 2 (line 51-64)).

Regarding claim 10, Chiueh teaches a method for performing computations of a neural network comprising at least a first layer and a second layer, the method comprising: (Chiueh Fig. 1 and Col 1 (line 15-20) disclose a neural network with multiple layers (i.e. a first layer and a second layer))
	assigning a first data processing unit (DPU) to perform computations of a first node of the first layer of the neural network (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 1st processing element (i.e. a first data processing unit (DPU) of the first layer of the neural network) can generate output (i.e. perform computation) of the 1st node of the 1st layer.)
	assigning a second DPU to perform computations of a second node of the second layer of the neural network (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 2nd processing element (i.e. a second DPU of the second layer of the neural network) can generate an output (i.e. perform computation) of the 2nd node of the 2nd layer); … [and]
	transmitting the first … output to a first output systolic element of the first DPU. (Chiueh Fig. 3 and Col 4 (line 40-45) disclose that each processing element PE-i contains a storage element 106-i. Each storage element 106-i (i.e. first output systolic element) receives (and transmits) a value g_i (i.e. first … output) from the corresponding processor 104-i.)
	Chiueh fails to explicitly teach the remaining limitations of the claim.  However, Ross teaches performing computations of the first node of the first layer using [a] DPU to generate a first activation output (Ross Fig 1 (106, 108), fig. 2 and col 1 (line 31-46), Col 7 (line 17-27) disclose a circuit (i.e. DPU) that comprises a matrix computation unit, a vector computation unit and unified buffer. The matrix computation unit receives a weight input and activation inputs from the unified buffer and generates (i.e. performs calculation of) an accumulated value and the vector computation unit receives accumulated values and applies an activation function to generate (i.e. calculate) activation values (i.e. first activation output)).
	systolically pulsing the first activation output from the first output systolic element … to a first input systolic element of the …DPU during a first systolic pulse (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50) disclose that an output (i.e. first activation output) from one layer (i.e. first output systolic element) can be transferred (i.e. systolically pulse) as input to another layer (i.e. first input systolic element of the DPU) in intervals using the clock signal (i.e. first systolic pulse)).
	and performing computations of the second node ... by using the …DPU to process at least the first activation output, wherein the method is performed by at least one processor. (Ross Col 3 (line 36-44) discloses that a neural network with multiple layers can be connected to compute inferences. Output from one layer (i.e. first activation output) is provided as input to next layer. See also Ross Fig 1 (106, 108), fig. 2 and col 1 (line 31-46), col 7 (line 17-27), which disclose that a circuit (i.e. DPU) comprises a matrix computation unit, a vector computation unit and unified buffer. The matrix computation unit receives a weight input and activation input (i.e. first activation output) from the unified buffer and generates (i.e. performing calculation) an accumulated value and the vector computation unit receives accumulated values and applies an activation function to generate (i.e. perform calculation of) activation values. See also Ross col 2 (line 51-64), which discloses that implementing a neural network processor (i.e. processor) in hardware improves efficiency, e.g., increases speed and throughput and reduces power and cost, over implementations in software)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture For Neural Network and Ross’s Neural Network Processor.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ross col 4(line 34-50)).  
	Neither Chiueh nor Ross appears to disclose explicitly the further limitations of the claim.  However, Bittner discloses systolically puls[ing] the first activation output … directly to a first input systolic element of [a] second DPU…. (Bittner paragraph 68 and Fig. 2 disclose a system with a plurality of neural processor cores connected to other cores by an interconnect; Fig. 1 and paragraph 44 disclose that each core may contain an additional function element such as an activation function to produce a function f(R + B), where R is a result vector mantissa and B is a bias, and the result is passed through an output mantissa shifter that produces a final result vector [activation output] by aligning the elements of f(R + B) with an output exponent; see also Fig. 1, reference character 120 (showing a vector input 120 in each core; in other words, the vector result 155 of each core may be fed directly into the vector input 120 of another core connected by the interconnect))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh and Ross to pulse the activation output directly to a second processing element, as disclosed by Bittner, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase efficiency by ensuring that the outputs do not need to be passed on to an intermediate module before being sent to the next element.  See Bittner, paragraph 44 and Fig. 1.
Means discloses “assigning a third DPU to perform computations of a third node of the first layer of the neural network (in a feedforward, fully connected neural network mode of a systolic array, each processing element in the top row of the array can represent a single neuron in a fully connected layer of the feedforward network – Means, col. 5, l. 53-col. 6, l. 17; see also Fig. 1 [note that the second PE [DPU] 9 of the top row corresponds to the third node, from the same layer as the first PE 9 of the row]); … [and]
systolically pulsing the first activation output from the first output systolic element directly to a second output systolic element of the third DPU during the first systolic pulse (each processing element performs, inter alia, the accumulation of the 16-bit result of a multiplier in an accumulator and the placement of the result of the accumulator on a 28-pin output bus – Means, col. 2, l. 57-col. 3, l. 4; see also Fig. 1 [showing that the output 19 of accumulator 17 is connected to the output of the next processing element via a bus; since each PE’s [DPU’s] accumulator plus the bus can be regarded as an “output systolic element,” the system pulses the output of one PE directly to the output of another PE; first output systolic element = accumulator + bus of first PE of top row; second output systolic element = accumulator + bus of second PE of top row])….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh, Bittner, and Ross to pulse data directly from the output of one unit to the output of another unit representing a node of the same layer, as disclosed by Means, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the other processing unit to become aware of the output of the first processing unit for easy accumulation of the outputs.  See Means, col. 2, l. 57-col. 3, l. 4 and Fig. 1.

Regarding claim 11, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 10.
	Chiueh further teaches that … the first … output [is output] through a plurality of input systolic elements of a corresponding plurality of DPUs assigned to perform computations of the … layer (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements (i.e. plurality of DPUs) to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights, and a processor (i.e. plurality of input systolic elements) for receiving input (i.e. first output) for outputting an accumulated value)
	Ross further teaches systolically pulsing the first activation output … to perform computations of the second layer (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that a neural network with multiple layers can be connected to compute inferences. Output (i.e. the first activation output) from one layer is provided in intervals using the clock signal (i.e. systolically pulsed) as input to the next layer.  See also Ross fig. 1 (106, 108), fig. 2 (212, 214) and col 1 (line 31-46), which disclose that the matrix computation unit receives a weight input and activation inputs of a previous layer (i.e. first activation output) from a unified buffer and generates (i.e. performs computations of the second layer) an accumulated value. A vector computation unit receives accumulated values and applies an activation function to generate (i.e. perform computations of the second layer) activation values).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).

	Regarding claim 13, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 11.
	Ross further teaches that the computations of the second node include a multiplication of the first activation output pulsed to the first input systolic element with a weight. (Ross fig. 2 (212) and col 3 (line 36-44), col 3 (line 66) – col 4 (line 9), col 4 (line 36-50), col 5 (line 65) – col 6 (line 4) disclose that a matrix multiplication unit can multiply a weight input (i.e. weight) with an activation input (i.e. the first activation output pulsed to the first input systolic element) and sum the products together to form an accumulated value)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).

	Regarding claim 14, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 13. 
	Chiueh further teaches that the weight is stored locally at the second DPU (Chiueh fig. 3 (102-1) and col 3 (line 5-10) disclose that the one-dimensional systolic array comprises M processing elements (PEs). The ith processing element, i = 1, 2, …, M, comprises a weight storage circuit for storing (i.e. weight is stored locally at the second node) a sequence of synaptic weights).

	Regarding claim 15, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 13.
	Ross further teaches that the weight is retrieved from a memory external to the second DPU. (Ross col. 4, ll. 51-58 disclose that a direct memory access stores sets of weights in a dynamic memory; col. 4, l. 66-col. 5, l. 8 disclose that the dynamic memory sends the sets of weight inputs to the matrix computation unit; see also Fig. 2 (showing that the dynamic memory 210 is external to the matrix computation unit 212 containing the processing nodes))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh and Bittner to retrieve the weights from an external memory, as disclosed by Ross, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase efficiency by decreasing the number of functions the nodes themselves must perform.  See Ross, col. 4, ll. 34-50 and Fig. 2.

Regarding claim 16, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 13.
Ross further teaches that the multiplication is performed by a feedback convolution engine, the method further comprising feeding the multiplied first activation output back into the feedback convolution engine during processing of another activation output. (Ross Fig 3, 4 and col 3 (line 66) – col 4 (line 9) disclose that a matrix multiplication unit (i.e. feedback convolution engine) can multiply a weight input with an activation input and sum the products together to form an accumulated value. See also col 6 (line 5-25), which discloses that the matrix multiplication unit includes accumulator units that store accumulated output (i.e. the multiplied first activation output) from each column when performing calculations. The accumulator units can accumulate each accumulated output to generate a final accumulated value. The final accumulated value can be transferred to a vector computation unit. See also fig. 3 and col 6 (line 5-25), which disclose that while a cell can process an activation input and send the processed value to the accumulator unit, another cell can process another activation input (i.e. processing of another activation output).  See also Fig. 2, which shows that the results of the vector computation unit are sent to a unified buffer, whose contents are then fed back into the matrix computation unit, whose results are in turn sent to the vector computation unit.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would ensure efficiency of the system. (Ross col 6 (line 5-25))

Regarding claim 18, Chiueh teaches performing, using a first processing unit, computations of a first node of a neural network to generate a first … output, the first node included in a first layer of the neural network (Chiueh Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 1st processing element (i.e. a first processing unit of the first node of the first layer of the neural network) can generate (i.e. perform computations of a first node) output (i.e. first … output) for the 1st node of the 1st layer.)
	a second processing unit assigned to perform computations of a second node of the neural network, the second node included in a second layer of the neural network (Chiueh Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 2nd processing element (i.e. second processing unit of the second layer of the neural network) can generate an activation output (i.e. computations of a second node of the neural network) for the 2nd node of the 2nd layer.)
and performing computations of the second node by using the second processing unit to process at least the first … output to generate a second … output (Chiueh fig. 1, 3 and col 1 (line 15-20), col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 2nd processing element (i.e. second processing unit of the second layer of the neural network) can generate (i.e. computations of a second node of the neural network) output (i.e. second … output) for the 2nd node of the 2nd layer using input (the first … output), weighted sum and activation function.)
	Chiueh/Means fails to explicitly teach the remaining limitations of the claim.  However, Ross teaches a non-transitory computer-readable medium storing computer-executable instructions that, when executed by a processor, cause the processor to perform operations comprising: (Ross col 9 (line 4-24) discloses a tangible non transitory program carrier (i.e. non-transitory computer-readable medium))
	systolically pulsing the first activation output from the first processing unit to a second processing unit. (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that the output (i.e. first activation output) from one layer (i.e. first processing unit) can be provided in intervals using the clock signal (i.e. systolically pulsed) as input to another layer (i.e. a second processing unit)) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).
	Neither Chiueh nor Ross appears to disclose explicitly the further limitations of the claim.  However, Bittner discloses systolically pulsing the first activation output … directly to a second processing unit…. (Bittner paragraph 68 and Fig. 2 disclose a system with a plurality of neural processor cores connected to other cores by an interconnect; Fig. 1 and paragraph 44 disclose that each core may contain an additional function element such as an activation function to produce a function f(R + B), where R is a result vector mantissa and B is a bias, and the result is passed through an output mantissa shifter that produces a final result vector [activation output] by aligning the elements of f(R + B) with an output exponent; see also Fig. 1, reference character 120 (showing a vector input 120 in each core; in other words, the vector result 155 of each core may be fed directly into the vector input 120 of another core connected by the interconnect))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh, Means, and Ross to pulse the activation output directly to a second processing unit, as disclosed by Bittner, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase efficiency by ensuring that the outputs do not need to be passed on to an intermediate module before being sent to the next element.  See Bittner, paragraph 44 and Fig. 1.

	Regarding claim 21, Chiueh teaches a device for performing computations of a neural network comprising at least first, second, and third layers, the device comprising: (Chiueh Fig. 1 and Col 1 (line 15-20) disclose a neural network with multiple layers (i.e. first, second, and third layers))
a first arrangement of first processing units, wherein at least a subset of the first processing units is assigned to perform computations of corresponding nodes of the first layer of the neural network output (Chiueh fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. The 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a first arrangement of a first plurality of processing units including the first processing unit of first layer of the neural network) can generate activation outputs (i.e. compute them) for corresponding neurons (i.e. corresponding number of nodes) of the 1st layer using the input, weighted sum and activation function.)
a second arrangement of second processing units, wherein at least a subset of the second processing units is assigned to perform computations of corresponding nodes of the second layer of the neural network…. (Chiueh Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. The 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a second arrangement of a second plurality of processing units including the second processing unit of the second layer of the neural network) can generate activation outputs (i.e. computation) for corresponding neurons (i.e. corresponding number of nodes) of the 2nd layer using input, weighted sum and activation function.)
	Chiueh fails to explicitly teach the remaining limitations of the claim.  However, Ross teaches a first systolic processing chip (Ross col 1 (line 15-16) and col 2 (line 51-64) discloses integrating components (i.e. a first arrangement of first processing units and a second arrangement of second processing units) of the neural network processor into one circuit as a hardware implementation (i.e. first systolic processing chip) to avoid off-chip communication.)  …:
	wherein the first arrangement of first processing units is configured to systolically pulse data … to the second arrangement of second processing units…. (Ross Fig 2 and Col 3 (line 36-44), Col 4 (line 34-50) disclose that the output (i.e. first activation output) from one layer (i.e. the first arrangement of first processing units) can be provided in intervals using the clock signal (i.e. systolically pulsed) as input to another layer (i.e. the second arrangement of second processing units))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output efficiently (hardware implementations increase speed and throughput and reduce power and cost relative to software implementations) from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 2 (line 51-56), col 4 (line 34-50)).
	Neither Chiueh nor Ross appears to disclose explicitly the further limitations of the claim.  However, Bittner discloses systolically puls[ing] data directly to [a] second arrangement of … processing units. (Bittner paragraph 68 and Fig. 2 disclose a system with a plurality of neural processor cores connected to other cores by an interconnect; Fig. 1 and paragraph 44 disclose that each core may contain an additional function element such as an activation function to produce a function f(R + B), where R is a result vector mantissa and B is a bias, and the result is passed through an output mantissa shifter that produces a final result vector [activation output] by aligning the elements of f(R + B) with an output exponent; see also Fig. 1, reference character 120 (showing a vector input 120 in each core; in other words, the vector result 155 of each core may be fed directly into the vector input 120 of another core connected by the interconnect))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh and Ross to pulse data directly to a second processing unit, as disclosed by Bittner, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase efficiency by ensuring that the outputs do not need to be passed on to an intermediate module before being sent to the next element.  See Bittner, paragraph 44 and Fig. 1.
Means discloses that “a processing unit of the at least a subset of the first processing units is configured … systolically [to] pulse data from a first output systolic element of the processing unit directly to a second systolic output element of another processing unit of the at least a subset of the first processing units (each processing element [processing unit] performs, inter alia, the accumulation of the 16-bit result of a multiplier in an accumulator and the placement of the result of the accumulator on a 28-pin output bus – Means, col. 2, l. 57-col. 3, l. 4; see also Fig. 1 [showing that the output 19 of accumulator 17 is connected to the output of the next processing element via a bus; since each PE’s accumulator plus the bus can be regarded as an “output systolic element,” the system pulses the output of one PE directly to the output of another PE]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Chiueh, Bittner, and Ross to pulse data directly from the output of one processing unit to the output of another processing unit, as disclosed by Means, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the other processing unit to become aware of the output of the first processing unit for easy accumulation of the outputs.  See Means, col. 2, l. 57-col. 3, l. 4 and Fig. 1.

Claims 2-3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Chiueh, in view of Ross, Means, and Bittner as shown above, and further in view of “Nonlinear Systems Identification Using Deep Dynamic Neural Networks” by Ogunmolu et al. (hereinafter, “Ogunmolu”).
Regarding claim 2, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the device of Claim 1.
Ross further teaches that the first output systolic element is further configured systolically … [to] pulse the first activation output to the second input systolic element during a first systolic pulse, (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that the output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided (i.e. systolically pulsed) as input to another layer (i.e. the second input systolic element) in intervals using the clock signal (i.e. first systolic pulse))
and … the first output systolic element is further configured to systolically pulse the first activation output … during the first systolic pulse (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that the output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided (i.e. systolically pulsed) as input to another layer in intervals using the clock signal (i.e. first systolic pulse))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).
	The combination of Chiueh, Bittner, and Ross fails to explicitly teach the remaining limitations of the claim.  However, Ogunmolu teaches … pulsing the first activation output to the second output systolic element (Ogunmolu Fig 2 discloses transferring (i.e. pulsing) output (i.e. first activation output) from one node (i.e. first output systolic element) to another node (i.e. second output systolic element) of the same layer. “The first output systolic element” (part of the 1st node’s processing unit of 1st layer) and “the second output systolic element” (part of the 3rd node’s processing unit of 1st layer) are parts of two different nodes of the same layer. Fig 2 indicates inter-transferring outputs from each of the nodes to another node of the same layer.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (as well as to different layers) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)

Regarding claim 3, the combination of Chiueh, Ross, Bittner, Means, and Ogunmolu as shown above teaches the device of Claim 2.
Ross further teaches that the second output systolic element is further configured to systolically pulse a second activation output … during the first systolic pulse (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that the output (i.e. second activation output) from one layer (i.e. the second output systolic element) can be provided (i.e. systolically pulsed) as input to another layer of the neural network in intervals using the clock signal (i.e. first systolic pulse))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).
Ogunmolu further teaches that the second output systolic element is further configured to [transfer] a second activation output to the first output systolic element (Ogunmolu Fig 2 discloses transferring output (i.e. second activation output) from one node (i.e. second output systolic element) to another node (i.e. first output systolic element) of the same layer. “The first output systolic element” (part of the 1st node’s processing unit of 1st layer) and “the second output systolic element” (part of the 3rd node’s processing unit of 1st layer) are parts of two different nodes of the same layer. Fig 2 indicates inter-transferring outputs from each of the nodes to another node of the same layer.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Means, and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (as well as to different layers) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)

Regarding claim 12, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 10.
	Chiueh further teaches an input systolic element of a second additional DPU assigned to perform computations of the second layer. (Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights.  Each processing element is known as a neuron. So the 2nd processing element (i.e. a second additional DPU of the second layer of the neural network) can generate an activation output (i.e. computations of the second layer) for the 2nd node of the 2nd layer using input, weighted sum and activation function.)
Ross further teaches systolically pulsing the first activation output (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that a neural network with multiple layers can be connected to compute inferences. The output (i.e. first activation output) from one node is transferred as input to a next layer. This transfer can happen in intervals using the clock signals (i.e. be systolically pulsed)) from the … output systolic element of the … DPU over a crossover connection to an input systolic element of … [a] DPU (Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that a neural network with multiple layers can be connected to compute inferences. Output from one layer (i.e. output systolic element of the DPU) is transferred as input (i.e. crossover connection) to the next layer (i.e. input systolic element of the DPU). This transfer can happen in intervals using the clock signals (i.e. systolically pulsing)).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay (Ross col 4 (line 34-50)).
	The combination of Chiueh, Bittner, and Ross fails to explicitly teach the remaining limitations of the claim.  However, Ogunmolu teaches … pulsing the first activation output from [a] second output systolic element of [a] third DPU (Ogunmolu Fig 2 discloses transferring (i.e. pulsing) output (i.e. first activation output) from one node to another node (i.e. an output systolic element) of the same layer.  “The first activation output” was generated from the output [second output systolic element] of an element of a summation node [third DPU] and that output is transferred to another node of the same layer. A similar configuration can be found in Ogunmolu fig. 2, which indicates inter-transferring outputs from each of the nodes to another node of the same layer.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Means, and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (as well as to different layers) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)

Claims 7-8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chiueh, in view of Ross, Means, and Bittner as shown above, and further in view of US 20150112911 A1 by Jackson et al. (hereinafter, “Jackson”).
Regarding claim 7, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the device of Claim 1.
	The combination of Chiueh, Bittner, Means, and Ross fails to explicitly teach the remaining limitations of the claim.  However, Jackson teaches that the first output systolic element is further configured to tag the first activation output with an identifier, wherein the identifier identifies that the first activation output was computed by the first processing unit. (Jackson fig. 9 and para [0049] disclose a serialize/de-serialize unit  (i.e. first output systolic element) of a funnel device configured to tag (i.e. tag with an identifier) each outgoing data packet (i.e. the first activation output) from the funnel device with tag information identifying the location (i.e. the identifier identifies that the first activation output was computed by the first processing unit) of a source core circuit that generated the outgoing packet)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Means, and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would ensure efficiency of the system by keeping track of the source of each output (Jackson para. [0050]).

Regarding claim 8, the combination of Chiueh, Ross, Means, Bittner, and Jackson as shown above teaches the device of Claim 7.
Ross further teaches that the first activation output [is] systolically pulsed to the second input systolic element (Ross fig.2 and col 3 (line 36-44), col 4 (lines 34-50), col 5 (line 65) – col 6 (line 4) disclose that the output (i.e. first activation output including tag) from one layer (i.e. first output systolic element) can be provided as input to another layer (i.e. the second input systolic element) in intervals using the clock signal. (i.e. systolically pulsed))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay. (Ross col 4 (lines 34-50).)
Jackson further teaches that the first activation output … pulsed … includes the tag. (Jackson paras. [0050, 0063] disclose tagging an outgoing data packet with location information (i.e. the first activation output includes the tag) and then delivering it to a serial processing unit.  A corresponding funnel device uses that tagged information for further processing of that data packet.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Means, and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would ensure efficiency of the system by keeping track of the source of each output. (Jackson para. [0050].)

Regarding claim 19, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 18.
	The combination of Chiueh, Bittner, Means, and Ross fails to explicitly teach the remaining limitations of the claim.  However, Jackson teaches that the operations further compris[e], by the first processing unit, tagging the first activation output with an origin address identifying its origin as the first processing unit. (Jackson fig. 9 and para. [0063] disclose that a neurosynaptic processing unit (i.e. the first processing unit) tags each outgoing data packet (i.e. the first activation output) with address event representation information identifying a location (i.e. an origin address identifying its origin as the first processing unit) of a core circuit of the neurosynaptic processing unit that generated said outgoing data packet.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation and Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Means, and Bittner, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would ensure efficiency of the system by keeping track of the source of each output. (Jackson para. [0050].)

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chiueh in view of Ross, Bittner, Means, and Jackson as shown above, and further in view of US 20180189648 A1 by Sengupta et al. (hereinafter, Sengupta).
Regarding claim 9, the combination of Chiueh, Ross, Bittner, Means, and Jackson as shown above teaches the device of Claim 8.
Ross teaches that the second processing unit includes second processing circuitry configured to receive the first activation output and perform processing according to the second node to generate a second activation output. (Ross fig. 1 (106, 108), fig. 5 (502), col 1 (lines 31-46), and col 7 (lines 17-27) disclose that the matrix computation unit receives a weight input and activation inputs (i.e. the first activation output) from a unified buffer and generates (i.e. performs processing according to the second node) an accumulated value.  A vector computation unit receives accumulated values and applies an activation function to generate (i.e. perform processing according to the second node) activation values (i.e. second activation output).  Here, the matrix computation unit and vector computation unit (i.e. second processing circuitry) together process the inputs to generate an activation output.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chiueh’s One Dimensional Systolic Array Architecture for Neural Network, Bittner, Means, and Ross’s Neural Network Processor, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay. (Ross col 4 (lines 34-50).)
	The combination of Chiueh, Ross, Bittner, Means, and Jackson fails to explicitly teach the remaining limitations of the claim.  However, Sengupta teaches that the second processing unit uses the tag to identify a weight to use for processing the first activation output. (Sengupta fig. 3 and paras. [0042, 0089] disclose a spike (i.e. first activation output) from X1 to X5 (i.e. the second processing unit) that uses the synapse weight W15 (i.e. weight) from a synapse weight array.  The synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. the tag (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sengupta’s Event Driven And Time Hopping Neural Network into Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Bittner, Means, and Jackson’s Coupling Parallel Event-Driven Computation, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would identify an appropriate weight from a plurality of weights (based on the address of the node), thereby ensuring that the weight indicates the appropriate strength of the relationship between the nodes and increasing the efficiency of training. (Sengupta paras. [0042, 0089].)

Regarding claim 20, the combination of Chiueh, Ross, Bittner, Means, and Jackson as shown above teaches the non-transitory computer-readable medium of Claim 19.
The combination of Chiueh, Ross, Bittner, Means, and Jackson fails to explicitly teach the remaining limitations of the claim.  However, Sengupta teaches that the operations further compris[e], by the second processing unit, identifying a weight with which to multiply the first activation output based on the origin address. (Sengupta fig. 3 and paras. [0042, 0089] disclose that a spike (i.e. first activation output) from X1 to X5 (i.e. the second processing unit) uses the synapse weight W15 (i.e. weight with which to multiply) from a synapse weight array. The synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. the origin address (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sengupta’s Event Driven And Time Hopping Neural Network into Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Bittner, Means, and Jackson’s Coupling Parallel Event-Driven Computation, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would identify an appropriate weight from a plurality of weights (based on the address of the node), thereby ensuring that the weight indicates the appropriate strength of the relationship between the nodes and increasing the efficiency of training. (Sengupta paras. [0042, 0089].)

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Chiueh, in view of Ross, Means, and Bittner as shown above, and further in view of Sengupta.
Regarding claim 17, the combination of Chiueh, Bittner, Means, and Ross as shown above teaches the method of Claim 16.
The combination of Chiueh, Bittner, Means, and Ross fails to explicitly teach the remaining limitations of the claim.  However, Sengupta teaches identifying the weight from among a plurality of weights based on information indicative of an origin address of the first activation output. (Sengupta fig. 3 and paras. [0042, 0089] disclose that a spike (i.e. first activation output) from X1 to X5 uses the synapse weight W15 (i.e. identifying the weight) from a synapse weight array (i.e. a plurality of weights).  The synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. information indicative of an origin address of the first activation output (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sengupta’s Event Driven and Time Hopping Neural Network into Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ross’s Neural Network Processor, Bittner, and Means, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would identify an appropriate weight from a plurality of weights (based on the address of the node), thereby ensuring that the weight indicates the appropriate strength of the relationship between the nodes and increasing the efficiency of training. (Sengupta paras. [0042, 0089].)

Response to Arguments
Applicant’s arguments with respect to the art rejection of the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  In particular, Applicant’s argument that the previously cited art does not disclose systolically pulsing a first activation output from an output systolic element of a processing unit directly to a second output systolic element of another processing unit in the same layer, Remarks at 14-16, is rendered moot by the addition of Means to the rejection.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849.  The examiner can normally be reached on M-R 7:50a-5:50p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/R.C.V./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125