Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Claims 1-21 are presented for examination.
In response to the amendments and remarks filed on 03/12/2021, the objection to the specification and the objection to the claims made in the previous Office action have been withdrawn.
In response to the amendments and remarks filed on 03/12/2021, the 35 U.S.C. 112(b) rejection of claims 4 and 13-17 made in the previous Office action has been withdrawn.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/29/2021, 03/31/2021 and 04/30/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

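The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.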


The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
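Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.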
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:


"first processing unit" and "second processing unit" in claim 1;
"third processing unit" and "first output systolic element" in claim 2;
"second output systolic element" in claim 3;
"third processing unit" and "second output systolic element" (should be read as "first output systolic element"; see the 35 U.S.C. § 112(b) rejection for details) in claim 4;
"subset of the first plurality of processing units" and "subset of the second plurality of processing units" in claim 5;
"the second processing unit" in claim 9;
"first data processing unit (DPU)" and "second DPU" in claim 10;
"first additional DPU" and "second additional DPU" in claim 12;
"first processing unit" and "second processing unit" in claim 18;
"second processing unit" in claim 20;
"the first arrangement of first processing units", "subset of the first processing units" and "subset of the second processing units" in claim 21.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
Regarding "first processing unit" and "second processing unit" in claim 1; "third processing unit" in claim 2; "third processing unit" in claim 4; "subset of the first plurality of processing units" and "subset of the second plurality of processing units" in claim 5; "first data processing unit (DPU)" and "second DPU" in claim 10; "first additional DPU" and "second additional DPU" in claim 12; "first processing unit" and "second processing unit" in claim 18; and "subset of the first processing units" and "subset of the second processing units" in claim 21, the examiner interprets each limitation as a hardware processor (para. 0045) programmed to perform the computation of a corresponding node of a corresponding layer by receiving inputs, applying a weighted summation function, and calculating an activation function (linear or nonlinear), as described in Fig. 2 and para. [00196].
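
For illustration only, the node computation relied on in this interpretation (receive inputs, form a weighted sum, apply a linear or nonlinear activation function) can be sketched as follows. The names node_output and identity, the bias term, and the choice of tanh as the nonlinear activation are assumptions made for this sketch; they are not taken from the specification or from the cited references.

```python
import math


def identity(z):
    """Linear activation: returns the weighted sum unchanged."""
    return z


def node_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Compute one node: weighted sum of the inputs, then an activation.

    The bias term and the tanh default are assumptions of this sketch.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Same inputs and weights, once with a linear and once with a nonlinear activation.
linear = node_output([1.0, 2.0], [0.5, 0.25], activation=identity)
nonlinear = node_output([1.0, 2.0], [0.5, 0.25])
```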

Regarding “first output systolic element” in claim 1; “second output systolic element” in claim 2; “second output systolic element” in claim 3; “first output systolic element” in claim 4; and “the first arrangement of first processing units” in claim 21, the examiner interprets systolic pulses from a processing unit (the input and output systolic elements are included in the processing unit) as a hardware processor (para. 0045) programmed to transfer data packets through nodes/layers in intervals, as described in para. [0009, 0205] and Figs. 4A-4C.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 9 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

	“The second processing unit” in claim 9 and “second processing unit” in claim 20 invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed functions and to clearly link the structure, material, or acts to the functions.


Regarding “second processing unit” in claim 20, para. [0057] recites “identifying the weight from among a plurality of weights based on information indicative of an origin address of the first activation output”. However, the specification fails to disclose any algorithm by which the identification is carried out.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 10-11, 13-16, 18 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over US 5799134 A by Chiueh et al. (hereinafter, “Ref. Chiueh”), in view of US 9710748 B2 by Ross et al. (hereinafter, “Ref. Ross”).

As per claim 1, Ref. Chiueh teaches a device for performing computations of a neural network comprising at least a first layer and a second layer, the device comprising: (Ref. Chiueh Fig. 1 and Col 1 (line 15-20) disclose a neural network with multiple layers (i.e. a first layer and a second layer))
	a first processing unit configured to perform computations of a first node of the first layer of the neural network, the first processing unit including: (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, …, M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 1st processing element (i.e. a first processing unit of the first layer of the neural network) can generate an activation output (i.e. perform computation) of the 1st node of the 1st layer using input, weighted sum and activation function.)
a first input systolic element (Ref. Chiueh Fig. 3 and Col 3 (line 5-10) disclose Each processing element also includes a processor (i.e. first input systolic element)  for receiving a sequence of inputs for outputting an accumulated value)

Ref. Chiueh Fig. 3 and Col 4 (line 40-45) disclose each processing element PE-i contains a storage element 106-i. Each storage element 106-i (i.e. first output systolic element) stores a value g_i (i.e. first … output) received from the corresponding processor 104-i (i.e. the first processing circuitry))
	a second processing unit configured to perform computations of a second node of the second layer of the neural network, wherein the second processing unit includes a second input systolic element, (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, …, M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. a second processing unit of the second layer of the neural network) can generate an output (i.e. perform computation) of the 2nd node of the 2nd layer using input, weighted sum and activation function. See also Fig. 3 and Col 3 (line 5-10), which disclose that each processing element also includes a processor (i.e. second input systolic element) for receiving a sequence of inputs for outputting (i.e. perform computation) an accumulated value)
	Ref. Chiueh fails to explicitly teach: 
first processing circuitry configured to receive data from the first input systolic element and perform processing according to the first node to generate a first activation output;
	wherein the first output systolic element is further configured to systolically pulse the first activation output to the second input systolic element.
	However Ref. Ross teaches:
	first processing circuitry configured to receive data from the first input systolic element and perform processing according to the first node to generate a first activation output (Ref. Ross fig. 1(106,108), fig 2(212,214) and col 1 (line 31-46), Col 7(line 17-27) disclose matrix computation unit receives weight input and activation inputs (i.e. data) from unified buffer (i.e. first input systolic element) and generate (i.e. perform processing according to the first node) accumulated value. Vector computation unit receives accumulated values and apply activation function to generate (i.e. perform processing according to the first node) activation values (i.e. first activation output). Here, matrix computation unit is configured to receive inputs from unified buffer (i.e. first input systolic element) and then matrix computation unit and vector computation unit (i.e. first processing circuitry) together process the inputs to generate activation output for the first node.)  
	wherein the first output systolic element is further configured to systolically pulse the first activation output to the second input systolic element (Ref. Ross fig. 2 and Col 3(line 36-44), col 4(line34-50), col 5 (line 65) – col 6(line 4) disclose output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided as input to another layer (i.e. the second input systolic element) in intervals using the clock signal. (i.e. systolically pulse))
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).

As per claim 4, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the device of Claim 1. 
Ref. Chiueh teaches further comprising a third processing unit configured to perform computations of a third node of the second layer of the neural network (Ref. Chiueh Fig. 1,3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weight.  Each processing element is known as Neurons. So 3rd processing element (i.e. a third processing unit of the second layer of the neural network) can generate activation output (i.e. perform computation) of the 3rd node of the 2nd layer using input, weighted sum and activation function.)
	the third processing unit including a third input systolic element (Ref. Chiueh Fig. 3 and Col 3 (line 5-10) disclose each processing element also includes a processor (i.e. third input systolic element)  for receiving a sequence of inputs for outputting an accumulated value).
	Ref. Ross teaches wherein the first output systolic element is configured to systolically pulse the first activation output to the third input systolic element (Ref. Ross fig. 2 and Col 3(line 36-44), col 4(line34-50), col 5 (line 65) – col 6(line 4) disclose output (i.e. first activation output) from one layer (i.e. first output systolic element) can be transferred as input to another layer (i.e. the third input systolic element) in intervals using the clock signal. (i.e. systolically pulse)).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  	
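
For illustration only, the clocked transfer relied on above (an activation output held at an output element of one layer and provided, one interval at a time, to input elements of the next layer) can be sketched as follows. The Register class, the pulse function, and the sample values are hypothetical and are not taken from Ref. Chiueh, Ref. Ross, or the specification; the sketch models timing only, not hardware.

```python
class Register:
    """Holds one value; a stand-in for an input or output systolic element."""

    def __init__(self):
        self.value = None


def pulse(source, destinations):
    """On one clock tick, copy the source value to every destination register."""
    for dest in destinations:
        dest.value = source.value


first_output = Register()   # output element of a first-layer unit
second_input = Register()   # input element of one second-layer unit
third_input = Register()    # input element of another second-layer unit

activations = [0.2, 0.7, 0.1]  # made-up activation outputs, one per tick
received = []
for tick, activation in enumerate(activations):
    first_output.value = activation
    pulse(first_output, [second_input, third_input])
    received.append((tick, second_input.value, third_input.value))
```

Each tick delivers exactly one value to both downstream registers, which is the sense in which the transfer occurs "in intervals using the clock signal."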

As per claim 5, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the device of Claim 1.
	Ref. Chiueh teaches, further comprising: a first arrangement of a first plurality of processing units including the first processing unit, wherein at least a subset of the first plurality of processing units is configured to perform computations of a corresponding number of nodes of the first layer of the neural network (Ref. Chiueh fig. 1, 3 and col 1 (line 15-25), col 3 (line 5-12) disclose system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weight. Each processing element is known as Neurons. 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a first arrangement of a first plurality of processing units including the first processing unit of first layer of the neural network) can generate activation outputs (i.e. perform computation) for corresponding neurons (i.e. corresponding number of nodes) of the 1st layer using input, weighted sum and activation function.)
	a second arrangement of a second plurality of processing units including the second processing unit, wherein at least a subset of the second plurality of processing units is configured to perform computations of a corresponding number of nodes of the second layer of the neural network (Ref. Chiueh fig. 1, 3 and col 1 (line 15-25), col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weight.  Each processing element is known as Neurons. 1st processing element / PE1, 2nd processing element/PE2, ……..Mth processing elements/PEM (i.e. a second arrangement of a second plurality of processing units including the second processing unit of the second layer of the neural network) can generate activation outputs (i.e. perform computation) for corresponding neurons (i.e. corresponding number of nodes) of the 2nd layer using input, weighted sum and activation function.)
	Ref. Ross teaches a crossover connection between an output systolic element of one of the first plurality of processing units and an input systolic element of one of the second plurality processing units (Ref. Ross col 3(line 36-44) discloses that a neural network with multiple layers can be connected to compute inferences. Output from one layer (i.e. output systolic element of one of the first plurality of processing units) is provided as input (i.e. crossover connection) to the next layer (i.e. input systolic element of one of the second plurality processing units)).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).    

	As per claim 6, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the device of Claim 1.
	Ref. Ross teaches, wherein the device further includes a systolic processor chip, and wherein the first and second processing units comprise circuitry embedded in the systolic processor chip (Ref. Ross col 1(line 15-16), col 2(line 51-64) disclose integrating components (i.e. wherein the first and second processing units comprise circuitry) of the neural network processor into one circuit as hardware implementation (i.e. the systolic processor chip) to avoid off-chip communication).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture for Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to improve efficiency (e.g., increase speed and throughput and reduce power and cost, over implementations in software) of the neural network. (Ref. Ross col 2 (line 51-64)).
As per claim 10, Ref. Chiueh teaches a method for performing computations of a neural network comprising at least a first layer and a second layer, the method comprising: (Ref. Chiueh Fig. 1 and Col 1 (line 15-20) disclose neural network with multiple layers (i.e. a first layer and a second layer))
	assigning a first data processing unit (DPU) to perform computations of a first node of the first layer of the neural network (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, …, M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 1st processing element (i.e. a first data processing unit (DPU) of the first layer of the neural network) can generate an output (i.e. perform computation) of the 1st node of the 1st layer.)
	assigning a second DPU to perform computations of a second node of the second layer of the neural network (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, …, M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. a second DPU of the second layer of the neural network) can generate an output (i.e. perform computation) of the 2nd node of the 2nd layer)
	transmitting the first … output to a first output systolic element of the first DPU (Ref. Chiueh Fig. 3 and Col 4 (line 40-45) disclose each processing element PE-i contains a storage element 106-i. Each storage element 106-i (i.e. first output systolic element) receives (i.e. transmitting) a value g_i (i.e. first … output) from the corresponding processor 104-i.)

	Ref. Chiueh fails to explicitly teach:
	performing computations of the first node of the first layer using the first DPU to generate a first activation output;
	systolically pulsing the first activation output from the first output systolic element to a first input systolic element of the second DPU during a first systolic pulse;
	and performing computations of the second node of the second layer by using the second DPU to process at least the first activation output, wherein the method is performed by at least one processor.
	Ref. Ross teaches:
	performing computations of the first node of the first layer using [a] DPU to generate a first activation output (Ref. Ross Fig 1(106,108), fig. 2 and col 1 (line 31-46), Col 7 (line 17-27) disclose circuit (i.e. DPU) comprises matrix computation unit, a vector computation unit and unified buffer. Matrix computation unit receives weight input and activation inputs from unified buffer and generates (i.e. performing calculation) accumulated value and vector computation unit receives accumulated values and apply activation function to generate (i.e. calculation) activation values (i.e. first activation output)).
	systolically pulsing the first activation output from the first output systolic element to a first input systolic element of the …DPU during a first systolic pulse (Ref. Ross fig. 2 and col 3(line 36-44), col 4 (line34-50) disclose output (i.e. first activation output) from one layer (i.e. first output systolic element) can be transferred (i.e. systolically pulse) as input to another layer (i.e. second input systolic element of the DPU) in intervals using the clock signal (i.e. first systolic pulse)).  
	and performing computations of the second node ... by using the …DPU to process at least the first activation output, wherein the method is performed by at least one processor. (Ref. Ross Col 3(line36-44) discloses neural network with multiple layers can be connected to compute interferences. Output from one layer (i.e. first activation output) being provided as input to next layer. See also Ref. Ross Fig 1(106,108), fig. 2 and col 1 (line 31-46), col 7 (line 17-27) disclose circuit (i.e. DPU) comprises matrix computation unit, a vector computation unit and unified buffer. Matrix computation unit receives weight input and activation input (i.e. first activation output) from unified buffer and generates (i.e. performing calculation) accumulated value and vector computation unit receives accumulated values and apply activation function to generate (i.e. performing calculation) activation values. See also Ref. Ross col 2(line 51-64) discloses implementing a neural network processor (i.e. processor) in hardware improves efficiency, e.g., increase speed and throughput and reduce power and cost, over implementations in software)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  
As per claim 11, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 10.
	Ref. Chiueh further teaches further comprising … the first … output through a plurality of input systolic elements of a corresponding plurality of DPUs assigned to perform computations of the … layer (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose system comprising M processing elements (i.e. plurality of DPUs) to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weight, processor (i.e. plurality of input systolic elements) for receiving input (i.e. first … output) for outputting an accumulated value)
	Ref. Ross further teaches systolically pulsing the first activation output …to perform computations of the second layer (Ref. Ross fig. 2 and col 3(line 36-44), col 4(line34-50), col 5 (line 66) – col6 (line 4) disclose neural network with multiple layers can be connected to compute interferences. Output (i.e. the first activation output) from one layer being provided in intervals using the clock signal (i.e. systolically pulsing) as input to next layer.  See also (Ref. Ross fig. 1(106,108), fig 2(212,214) and col 1 (line 31-46) disclose matrix computation unit receives weight input and activation inputs of previous layer (i.e. first activation output) from unified buffer and generate (i.e. perform computations of the second layer) accumulated value. Vector computation unit receives accumulated values and apply activation function to generate (i.e. perform computations of the second layer) activation values).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One 
	As per claim 13, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 11.
	Ref. Ross further teaches:
	wherein the computations of the second node include a multiplication of the first activation output pulsed to the first input systolic element with a weight (Ref. Ross fig. 2(212) and col 3 (line 36-44), col 3(line 66) – col 4(line 9), col 4(line 36-50), col 5(line 65) – col 6(line 4) disclose that the matrix multiplication unit can multiply (i.e. multiplication) weight input (i.e. weight) with activation input (i.e. the first activation output pulsed to the first input systolic element) and sum the products together to form an accumulated value).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  

	As per claim 14, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 13. 
Ref. Chiueh further teaches:
wherein the weight is stored locally at the second node (Ref. Chiueh fig 3 (102-1) and Col 3 (line 5-10) disclose that the one dimensional systolic array comprises M processing elements (PEs). The ith processing element, i = 1, 2, …, M, comprises a weight storage circuit for storing (i.e. weight is stored locally at the second node) a sequence of synaptic weights).
	As per claim 15, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 13.
	Ref. Chiueh further teaches: 
wherein the weight is retrieved from a memory external to the second node (Ref. Chiueh Col 3 (line 25-35) discloses that the weight storage circuits in the processing elements are connected to the microprocessor via a weight (W) bus and an address (A) bus. The sequences of synaptic weights w_ij are transmitted via the W-bus (i.e. retrieved from a memory external to the second node) to the particular weight storage circuit indicated by the A-bus).
As per claim 16, the combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 13.
Ref. Ross further teaches:
wherein the multiplication is performed by a feedback convolution engine, the method further comprising feeding the multiplied first activation output back into the feedback convolution engine during processing of another activation output (Ref. Ross Fig 3, 4 and col 3 (line 66) – col 4(line 9) disclose that the matrix multiplication unit (i.e. feedback convolution engine) can multiply (i.e. multiplication) weight input with activation input and sum the products together to form an accumulated value. See also col 6 (line 5-25), which discloses that the matrix multiplication unit includes accumulator units that store (i.e. feeding output back into the feedback convolution engine) accumulated output (i.e. the multiplied first activation output) from each column when performing calculations. The accumulator units can accumulate each accumulated output to generate a final accumulated value. The final accumulated value can be transferred to a vector computation unit. See also fig. 3 and col 6(line 5-25), which disclose that while a cell can process an activation input and send the processed value to an accumulator unit, another cell can process another activation input (i.e. processing of another activation output)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture for Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to ensure efficiency of the system. (Ref. Ross col 6 (line5-25))
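
For illustration only, the accumulation relied on above (each product of a weight input and an activation input is added to a stored running total, which becomes the final accumulated value) can be sketched as follows. The function name column_accumulate and the sample values are hypothetical; the sketch is not Ref. Ross's circuit and is not the claimed feedback convolution engine.

```python
def column_accumulate(activations, weights):
    """Multiply each activation by its weight and keep a running sum.

    Returns the final accumulated value and the list of intermediate totals,
    loosely modelling an accumulator unit serving one column.
    """
    accumulator = 0.0
    partials = []
    for activation, weight in zip(activations, weights):
        accumulator += activation * weight  # product added to the stored total
        partials.append(accumulator)
    return accumulator, partials


final_value, partial_sums = column_accumulate([1.0, 2.0, 3.0], [0.5, 0.5, 1.0])
```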

As per claim 18, Ref. Chiueh teaches 
performing, using a first processing unit, computations of a first node of a neural
network to generate a first … output, the first node included in a first layer of the neural network (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 1st processing element (i.e. a first processing unit of the first node of the first layer of the neural network) can generate (i.e. computations of a first node) output (i.e. first … output) for the 1st node of the 1st layer.)
	a second processing unit assigned to perform computations of a second node of the neural network, the second node included in a second layer of the neural network (Ref. Chiueh
Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. second processing unit of the second layer of the neural network) can generate activation output (i.e. computations of a second node of the neural network) for the 2nd node of the 2nd layer.)
and performing computations of the second node by using the second processing unit to process at least the first … output to generate a second … output (Ref. Chiueh
fig. 1, 3 and col 1 (line 15-20), col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. second processing unit of the second layer of the neural network) can generate (i.e. computations of a second node of the neural network) output (i.e. second … output) for the 2nd node of the 2nd layer using input (the first … output), weighted sum and activation function.)
	Ref. Chiueh fails to explicitly teach:
a non-transitory computer-readable medium storing computer-executable instructions
that, when executed by a processor, cause the processor to perform operations comprising:

	However Ref. Ross teaches:
a non-transitory computer-readable medium storing computer-executable
instructions that, when executed by a processor, cause the processor to perform operations comprising: (Ref. Ross col 9 (line 4-24) discloses tangible non transitory program carrier (i.e. non-transitory computer-readable medium))

	systolically pulsing the first activation output from the first processing unit to a second processing unit (Ref. Ross fig. 2 and col 3(line 36-44), col 4(line34-50), col 5 (line 66) – col6 (line 4) disclose output (i.e. first activation output) from one layer (i.e. first processing unit) can be provided in intervals using the clock signal. (i.e. systolically pulsing) as input to another layer (i.e. a second processing unit)) 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  

	As per claim 21, Ref. Chiueh teaches a device for performing computations of a neural network comprising at least first, second, and third layers, the device comprising: (Ref. Chiueh Fig. 1 and Col 1 (line 15-20) disclose neural network with multiple layers (i.e. first, second, and third layers))
a first arrangement of first processing units, wherein at least a subset of the first processing units are assigned to perform computations of corresponding nodes of the first layer of the neural network output (Ref. Chiueh fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. 1st processing element / PE1, 2nd processing element / PE2, …… Mth processing element / PEM (i.e. a first arrangement of a first plurality of processing units including the first processing unit of first layer of the neural network) can generate activation outputs (i.e. computation) for corresponding neurons (i.e. corresponding number of nodes) of the 1st layer using input, weighted sum and activation function.)
a second arrangement of second processing units, wherein at least a subset of the second processing units are assigned to perform computations of corresponding nodes of the second layer of the neural network (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-20), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. 1st processing element / PE1, 2nd processing element / PE2, …… Mth processing element / PEM (i.e. a second arrangement of a second plurality of processing units including the second processing unit of the second layer of the neural network) can generate activation outputs (i.e. computation) for corresponding neurons (i.e. corresponding number of nodes) of the 2nd layer using input, weighted sum and activation function.)
	Ref. Chiueh fails to explicitly teach:
	a first systolic processing chip including at least:
	and wherein the first arrangement of first processing units is configured to systolically pulse data to the second arrangement of second processing units.
	However Ref. Ross teaches:
	a first systolic processing chip including at least: ( Ref. Ross col 1 (line 15-16) and col 2 (line 51-64) disclose integrating components (i.e. a first arrangement of first processing units and a second arrangement of second processing units) of the neural network processor into one circuit as hardware implementation (i.e. first systolic processing chip) to avoid off-chip communication.)  
	and wherein the first arrangement of first processing units is configured to systolically pulse data to the second arrangement of second processing units. (Ref. Ross Fig 2 and 
Col 3(line 36-44), Col 4 (line 34-50) disclose output (i.e. first activation output) from one layer (i.e. wherein the first arrangement of first processing units) can be provided in intervals using the clock signal (i.e. systolically pulse data) as input to another layer (i.e. data to the second arrangement of second processing units))

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to increase speed and throughput and reduce power and cost over implementations in software, and to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay. (Ref. Ross col 2 (line 51-56), col 4 (line 34-50)).

Claims 2-3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Ref. Chiueh, in view of Ref. Ross as shown above, further in view of “Nonlinear Systems Identification Using Deep Dynamic Neural Networks” by Ogunmolu et al. (hereinafter, Ogunmolu).
As per claim 2, combination of Ref. Chiueh and Ref. Ross as shown above teaches
the device of Claim 1.
Ref. Chiueh further teaches:
further comprising a third processing unit configured to perform computations of a third
node of the first layer of the neural network, (Ref. Chiueh Fig. 1,3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose system comprising M processing elements to implement a multi-layer neural network. The ith processing element where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weight. Each processing element is known as Neurons. So 1st processing element (i.e. a first processing unit of the first layer of the 1st layer of the neural network) can generate activation output (i.e. perform computation) of the 1st node of the 1st layer using input, weighted sum and activation function.)
the third processing unit including a second output systolic element, (Ref. Chiueh Fig. 3 and Col 4 (line 40-45) disclose each processing element PE-i contains a storage element 106-i. Each storage element 106-i (i.e.  first output systolic element) stores a value g.sub.i (i.e. first activation output) received from the corresponding processor 104-i. (i.e. the first processing circuitry))
Ref. Ross further teaches:
wherein the first output systolic element systolically pulses the first activation output to the second input systolic element during a first systolic pulse, (Ref. Ross fig. 2 and col 3(line 36-44), col 4(line34-50), col 5 (line 66) – col6(line 4) disclose output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided (i.e. systolically pulse)as input to another layer (i.e. the second input systolic element) in intervals using the clock signal. (i.e. first systolic pulse))
and wherein the first output systolic element is further configured to systolically pulse the first activation output…… during the first systolic pulse (Ref. Ross fig. 2 and col 3(line 36-44), col 4(line34-50), col 5 (line 66) – col6 (line 4) disclose output (i.e. first activation output) from one layer (i.e. first output systolic element) can be provided (i.e. systolically pulse) as input to another layer in intervals using the clock signal. (i.e. first systolic pulse))
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  
	Combination of Ref. Chiueh and Ref. Ross fails to explicitly teach:
	… pulsing the first activation output to the second output systolic element	
However Ogunmolu teaches:
	…pulsing the first activation output to the second output systolic element (Ogunmolu Fig. 2 discloses transferring (i.e. pulsing) output (i.e. first activation output) from one node (i.e. first output systolic element) to another node (i.e. second output systolic element) of the same layer. “The first output systolic element” (part of the 1st node’s processing unit of the 1st layer) and “the second output systolic element” (part of the 3rd node’s processing unit of the 1st layer) are part of two different nodes of the same layer. Fig. 2 indicates inter-transferring of outputs from each of the nodes to another node of the same layer.)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems into Ref. Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (along with different layer) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)   

As per claim 3, combination of Ref. Chiueh, Ref. Ross and Ogunmolu as shown above teaches the device of Claim 2.
Ref. Ross further teaches wherein the second output systolic element is further configured to systolically pulse a second activation output … during the first systolic pulse (Ref. Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose output (i.e. second activation output) from one layer (i.e. the second output systolic element) can be provided (i.e. systolically pulse) as input to another layer of the neural network in intervals using the clock signal (i.e. first systolic pulse).)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculation without unnecessary delay. (Ref. Ross col 4 (line 34-50)).

Ogunmolu further teaches:
wherein the second output systolic element is further configured to … a second activation output to the first output systolic element (Ogunmolu Fig. 2 discloses transferring output (i.e. second activation output) from one node (i.e. second output systolic element) to another node (i.e. first output systolic element) of the same layer. “The first output systolic element” (part of the 1st node’s processing unit of the 1st layer) and “the second output systolic element” (part of the 3rd node’s processing unit of the 1st layer) are part of two different nodes of the same layer. Fig. 2 indicates inter-transferring of outputs from each of the nodes to another node of the same layer.)
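Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems into Ref. Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (along with different layer) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)
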
As per claim 12, combination of Ref. Chiueh and Ref. Ross as shown above teaches
the device of Claim 11.
	Ref. Chiueh further teaches:
	an output systolic element of a first additional DPU assigned to perform computations of the first layer (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 1st processing element (i.e. a first additional DPU of the first layer of the neural network) can generate activation output (i.e. computations of the first layer) of the first layer using input, weighted sum and activation function.)
	an input systolic element of a second additional DPU assigned to perform computations of the second layer. (Ref. Chiueh Fig. 1, 3 and Col 1 (line 15-25), Col 3 (line 5-12) disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i=1, 2, 3……M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So the 2nd processing element (i.e. a second additional DPU of the second layer of the neural network) can generate activation output (i.e. computations of the second layer) for the 2nd node of the 2nd layer using input, weighted sum and activation function.)
Ref. Ross further teaches:
systolically pulsing the first activation output (Ref. Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that a neural network with multiple layers can be connected to compute inferences. Output (i.e. first activation output) from one node is transferred as input to the next layer. This transfer can happen in intervals using the clock signals (i.e. systolically pulsing)).
	and systolically pulsing the first activation output from the output systolic element of the … DPU over a crossover connection to an input systolic element of … DPU (Ref. Ross fig. 2 and col 3 (line 36-44), col 4 (line 34-50), col 5 (line 66) – col 6 (line 4) disclose that a neural network with multiple layers can be connected to compute inferences. Output from one layer (i.e. output systolic element of the … DPU) is transferred as input (i.e. crossover connection) to the next layer (i.e. input systolic element of the DPU). This transfer can happen in intervals using the clock signals (i.e. systolically pulsing)).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  
	Combination of Ref. Chiueh and Ref. Ross fails to explicitly teach:
	…pulsing the first activation output to an output systolic element of a first additional DPU
	However Ogunmolu teaches:
	… pulsing the first activation output to an output systolic element of a first additional DPU (Ogunmolu Fig. 2 discloses transferring (i.e. pulsing) output (i.e. first activation output) from one node to another node (i.e. an output systolic element) of the same layer.
“The first activation output” is generated in the 1st node of the 1st layer and that output is transferred to another node of the same layer. A similar configuration can be seen in Ogunmolu fig. 2, which indicates inter-transferring of outputs from each of the nodes to another node of the same layer.)
Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ogunmolu’s Nonlinear Systems into Ref. Chiueh’s One Dimensional Systolic Array Architecture for Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to create flexible connections between hidden layer nodes, thereby allowing the nodes to send output to other nodes asynchronously within the same layer (along with different layer) without unnecessary delay. (Ogunmolu: page 3, col 2 para. 2)

Claims 7-8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ref. Chiueh, in view of Ref. Ross as shown above, further in view of  US 20150112911 A1  by Jackson et al. (hereinafter, Ref. Jackson).
As per claim 7, combination of Ref. Chiueh and Ref. Ross as shown above teaches

	Combination of Ref. Chiueh and Ref. Ross fails to explicitly teach:
	wherein the first output systolic element is further configured to tag the first activation output with an identifier, wherein the identifier identifies that the first activation output was computed by the first processing unit.
However Ref. Jackson teaches:
wherein the first output systolic element is further configured to tag the first activation output with an identifier, wherein the identifier identifies that the first activation output was computed by the first processing unit (Ref. Jackson fig. 9 and para [0049] disclose a serialize/de-serialize unit (i.e. first output systolic element) of a funnel device configured to tag (i.e. tag with an identifier) each outgoing data packet (i.e. the first activation output) from the funnel device with tag information identifying the location (i.e. identifier identifies that the first activation output was computed by the first processing unit) of a source core circuit that generated the outgoing packet)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to ensure efficiency of the system by keeping track of the source of each output (Para. [0050]).  
As per claim 8, combination of Ref. Chiueh, Ref. Ross and Ref. Jackson as shown above teaches the device of Claim 7.
Ref. Ross further teaches:
wherein the first activation output systolically pulsed to the second input systolic element (Ref. Ross fig.2 and col 3(line 36-44), col 4(line34-50), col 5 (line 65) – col 6 (line 4) disclose output (i.e. first activation output including tag) from one layer (i.e. first output systolic element) can be provided as input to another layer (i.e. the second input systolic element) in intervals using the clock signal. (i.e. systolically pulsed))
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  
Ref. Jackson further teaches:
first activation output … pulsed … includes the tag (Ref. Jackson Para [0050, 0063] disclose tagging location information to an outgoing data packet (i.e. the first activation output includes tag) and then delivering it to a serial processing unit. A corresponding funnel device uses that tagged information for further processing of that data packet.)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to ensure efficiency of the system by keeping track of the source of each output (Para. [0050]).
As per claim 19, combination of Ref. Chiueh and Ref. Ross as shown above teaches
the method of Claim 18.
	Combination of Ref. Chiueh and Ref. Ross fails to explicitly teach:
	the operations further comprising, by the first processing unit, tagging the first activation output with an origin address identifying its origin as the first processing unit.
	However Ref. Jackson teaches:
	the operations further comprising, by the first processing unit, tagging the first activation output with an origin address identifying its origin as the first processing unit. (Ref. Jackson fig. 9 and para. [0063] disclose that a neurosynaptic processing unit (i.e. by the first processing unit) tags each outgoing data packet (i.e. the first activation output) with address event representation information identifying a location (i.e. an origin address identifying its origin as the first processing unit) of a core circuit of the neurosynaptic processing unit that generated said outgoing data packet)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Jackson’s Coupling Parallel Event-Driven Computation into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor, with a motivation to ensure efficiency of the system by keeping track of the source of each output (Para. [0050]).
Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ref.
Chiueh, in view of Ref. Ross and Ref. Jackson as shown above, further in view of  US 20180189648 A1 by Sengupta et al. (hereinafter, Ref. Sengupta).
As per claim 9, combination of Ref. Chiueh, Ref. Ross and Ref. Jackson as shown above teaches the device of Claim 8.
Ref. Ross teaches wherein the second processing unit includes second processing
circuitry configured to receive the first activation output and perform processing according to the second node to generate a second activation output (Ref. Ross Fig 1(106,108), Fig 5(502) + col 1(line31-46), col 7(line 17 -27) disclose matrix computation unit receives weight input and activation inputs (i.e. the first activation output) from unified buffer and generate (i.e. perform processing according to the second node) accumulated value. Vector computation unit receives accumulated values and apply activation function to generate (i.e. perform processing according to the second node) activation values (i.e. second activation output). Here, matrix computation unit and vector computation unit (i.e. second processing circuitry) together process the inputs to generate activation output)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network into Ref. Ross’s Neural Network Processor, with a motivation to provide the output from one layer to another layer at appropriate times, thereby ensuring accurate calculating without unnecessary delay. (Ref. Ross col 4(line 34-50)).  
	Combination of Ref. Chiueh, Ref. Ross and Ref. Jackson fails to explicitly teach:
	and wherein the second processing unit uses the tag to identify a weight to use for processing the first activation output.
	However Ref. Sengupta teaches:
	and wherein the second processing unit uses the tag to identify a weight to use for processing the first activation output (Ref. Sengupta fig 3 and para. [0042, 0089] disclose that a spike (i.e. first activation output) from X1 to X5 (i.e. the second processing unit) uses the synapse weight W15 (i.e. weight) from a synapse weight array. The synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. the tag (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ref. Sengupta’s Event Driven And Time Hopping Neural Network into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor and Jackson’s Coupling Parallel Event-Driven Computation, with a motivation to identify an appropriate weight from a plurality of weights (based on the address of the node), thereby ensuring that the weight indicates the appropriate strength of the relationship between the nodes and increasing the efficiency of training. (Para. [0042, 0089])

As per claim 20, combination of Ref. Chiueh, Ref. Ross and Ref. Jackson as shown above teaches the non-transitory computer-readable medium of Claim 19.
Combination of Ref. Chiueh, Ref. Ross and Ref. Jackson fails to explicitly teach:
	the operations further comprising, by the second processing unit, identifying a weight with which to multiply the first activation output based on the origin address.
	However Ref. Sengupta teaches:
	the operations further comprising, by the second processing unit, identifying a weight with which to multiply the first activation output based on the origin address. (Ref. Sengupta fig 3 and para. [0042, 0089] disclose a spike (i.e. first activation output) from X1 to X5 (i.e. the second processing unit) uses the synapse weight W15 (i.e. weight with which to multiply) from a synapse weight array. Synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. the origin address (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
	Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ref. Sengupta’s Event Driven And Time Hopping Neural Network into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor and Jackson’s Coupling Parallel Event-Driven Computation, with a motivation to identify an appropriate weight from a plurality of weights (based on the address of the node), thereby ensuring that the weight indicates the appropriate strength of the relationship between the nodes and increasing the efficiency of training. (Para. [0042, 0089])
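
For illustration only, the tagging and weight-identification mapping relied on above (an output tagged with the address of its source, and a weight array indexed by the source address and the destination address) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Ref. Jackson, Ref. Sengupta, or the claims; the names `TaggedOutput`, `tag_output`, and `weighted_input`, and the use of a dictionary keyed by an address pair, are hypothetical.

```python
# Illustrative sketch only. All names are hypothetical and are not taken
# from Ref. Jackson, Ref. Sengupta, or the claims.
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedOutput:
    value: float  # the activation output itself
    origin: int   # address of the unit that computed the value


def tag_output(value, origin):
    """Attach the origin address to an outgoing activation output."""
    return TaggedOutput(value=value, origin=origin)


def weighted_input(packet, destination, weight_array):
    """Select the weight indexed by (origin, destination) and apply it."""
    weight = weight_array[(packet.origin, destination)]
    return weight * packet.value


# Example: unit 1 sends an output of 2.0 to unit 5; the weight for that
# pair is found by indexing the array with the two addresses.
weights = {(1, 5): 0.5, (2, 5): 0.25}
result = weighted_input(tag_output(2.0, origin=1), destination=5, weight_array=weights)
```

In this sketch the receiving unit needs only the tag carried with the output and its own address to select the weight.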

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Chiueh, in view of Ref. Ross as shown above, further in view of US 20180189648 A1 by Sengupta et al. (hereinafter, Ref. Sengupta).
As per claim 17, combination of Ref. Chiueh and Ref. Ross as shown above teaches the method of Claim 16.
Combination of Ref. Chiueh, Ref. Ross and Ref. Jackson fails to explicitly teach:
identifying the weight from among a plurality of weights based on information indicative of an origin address of the first activation output
However Ref. Sengupta teaches:
identifying the weight from among a plurality of weights based on information indicative of an origin address of the first activation output (Ref. Sengupta fig 3 and para. [0042, 0089] disclose a spike (i.e. first activation output) from X1 to X5 uses the synapse weight W15 (i.e. identifying the weight) from a synapse weight array (i.e. a plurality of weights). Synapse weight array can be indexed based on the address of the pre-synapse neural unit (i.e. information indicative of an origin address of the first activation output (which identifies that the first activation output was computed by the first processing unit)) and the address of the post-synapse neural unit.)
Therefore, it would have been obvious to one of ordinary skill in the art before
the effective filing date of the claimed invention to combine the teachings of Ref. Sengupta’s Event Driven And Time Hopping Neural Network into Ref. Chiueh’s One Dimensional Systolic Array Architecture For Neural Network as modified by Ref. Ross’s Neural Network Processor and Jackson’s Coupling Parallel Event-Driven Computation, with a motivation to identify an 


Response to Amendment
Applicant’s arguments filed on 03/12/2021, (“Remarks”) have been fully considered but they are not persuasive. 

Regarding rejections under 35 USC 112(b), 
Applicant remarks that “"the second processing unit" of Claims 9 and 20 does not
invoke interpretation under § 112(f), as discussed in more detail below. Even if the second processing unit of Claims 9 and 20 were seen to invoke § 112(f), which Applicant does not concede, at least paragraph [0217] of the specification with reference to Figure 5 provides an example algorithm for the functions performed by the second processing unit in Claims 9 and 20. Accordingly, the second processing unit of Claims 9 and 20 would be sufficiently definite under § 112(b) if interpreted as a computer-implemented means-plus-function limitation under § 112(f).”

Examiner has read at least paragraph [0217] of the specification with reference to Figure 5. But the examiner respectfully disagrees that paragraph [0217] with reference to Figure 5 provides any algorithm for how the second processing unit uses the tag to identify weight 

	Regarding claim interpretation under 35 USC § 112(f), 
Applicant remarks that “"processing unit," "output systolic element," and "data processing unit (DPU)," as recited in the claims, should not be interpreted under § 112(f) because these terms denote structure to a person having ordinary skill in the art when reading the specification.” 

As explained in MPEP § 2181, subsection II “In cases involving a special purpose computer-implemented means-plus-function limitation, the Federal Circuit has consistently required that the structure be more than simply a general purpose computer or microprocessor and that the specification must disclose an algorithm for performing the claimed function. See, e.g., Noah Systems Inc. v. Intuit Inc., 675 F.3d 1302, 1312, 102 USPQ2d 1410, 1417 (Fed. Cir. 2012); Aristocrat, 521 F.3d at 1333, 86 USPQ2d at 1239.”  

In the present claims, the terms "processing unit," "output systolic element," and "data


Regarding rejections under 35 USC § 103, 
Applicant remarks that the combination of Chiueh and Ross does not meet claims 1, 10 and 18 because the applied art is not seen to disclose or suggest systolically pulsing a first activation output from a first processing unit configured to perform computations of a first layer of a neural network to a second processing unit configured to perform computations of a second layer of the neural network.
Regarding this argument, “first output systolic element” in claim 1 invokes 35 USC 112(f) (see the analysis above), and examiner is interpreting systolic pulses from a processing unit (the input and output systolic elements are included in the processing unit) as a hardware processor (para. [0045]) programmed to transfer data packets through nodes/layers in intervals, as described in paras. [0009] and [0205] and Figs. 4A-4C.
Examiner is using Ross to teach “systolically pulsing”, which is transferring data packets through layers/nodes using a clock cycle. Ross teaches that the output from one layer can be 
Examiner is using Chiueh to teach “first processing unit configured to perform computations of a first layer of a neural network” and “a second processing unit configured to perform computations of a second layer of the neural network”. Regarding “first processing unit configured to perform computations of a first layer of a neural network”, see Chiueh Figs. 1 and 3, col. 1 (lines 15-25) and col. 3 (lines 5-12), which disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, ..., M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So a 1st processing element (i.e. a first processing unit of the 1st layer of the neural network) can generate an activation output (i.e. perform computation) of the 1st node of the 1st layer using an input, a weighted sum and an activation function. Regarding “a second processing unit configured to perform computations of a second layer of the neural network”, see Chiueh Figs. 1 and 3, col. 1 (lines 15-25) and col. 3 (lines 5-12), which disclose a system comprising M processing elements to implement a multi-layer neural network. The ith processing element, where i = 1, 2, 3, ..., M, comprises a weight storage circuit for storing a sequence of synaptic weights. Each processing element is known as a neuron. So a 2nd processing element (i.e. a second processing unit of the second 

Applicant remarks that “Chiueh provides an example where a neural network includes M=8 neurons arranged in one layer (i.e., in one row of Figure 1) with M=8 PEs for N=8 weights. See id at col. 5, lines 23-42.….In other words, the PEs of systolic array 100 in Chiueh perform computations for one layer at a time or for one node at a time, rather than systolically pulsing an activation output from one PE to another PE in systolic array 100.” 
Regarding this argument, Chiueh col. 5, lines 23-42 provides an example where a neural network includes M=8 neurons arranged in one layer. But see Chiueh col. 1 (lines 15-25): “portion of a neural network is shown in FIG. 1. The neural network 10 of FIG. 1 comprises a plurality of neurons 12. The neurons 12 are interconnected by the synapses 14. In general, the output of any one particular neuron 12 is connected by a synapse 14 to the input of one or more other neurons 12 as shown in FIG. 1. Illustratively, the neurons 12 are arranged in layers. Three layers 15, 16 and 17 are shown in FIG. 1. However, an arrangement of neurons into layers is not necessary.” This reasonably teaches that Chiueh can implement a multi-layer neural network. Regarding “systolically pulsing an activation output from one PE to another PE”, examiner is not using Chiueh to teach “systolic pulsing”. Rather, examiner is using Ross to teach “systolically pulsing”, which is transferring data packets through layers/nodes using a clock cycle. Ross teaches that the output from one layer can be provided as input to another layer in 

Applicant remarks that “In discussing the rejection of Claim 1, page 13 of the Office Action relies on Figure 2 and columns 3 to 6 of Ross for teaching a first output systolic element configured to systolically pulse a first activation output to a second input systolic element. However, Ross is not seen to teach that its matrix computation unit 212 in Figure 2 systolically pulses a first activation output from a first processing unit configured to perform computations of a first layer of a neural network to a second processing unit configured to perform computations of a second layer of the neural network”. Applicant also argues that accordingly Ross is not seen to disclose systolically pulsing a first activation output from a first processing unit configured to perform computations of a first layer of a neural network to a second processing unit configured to perform computations of a second layer of the neural network, as recited in independent Claims 1, 10, and 18.
	Regarding this argument, claim 1 recites “first processing circuitry configured to receive data from the first input systolic element and perform processing according to the first node to generate a first activation output”. In Ross, the matrix computation and vector processing units together work as the first processing circuitry. Here, the matrix processing unit is 

See also Ross, Fig. 2 and col. 3 (lines 36-44), col. 4 (lines 34-50), col. 5 (line 65) - col. 6 (line 4), which disclose that an output (i.e. first activation output) from a layer (i.e. first output systolic element) can be provided as input to another layer (i.e. the second input systolic element) in intervals using the clock signal (i.e. systolically pulse).
	
Applicant remarks that “with respect to independent Claim 21, Applicant respectfully submits that the applied art is not seen to disclose or suggest systolically pulsing data from a first arrangement of first processing units to a second arrangement of second processing units, wherein at least a subset of the first processing units is assigned to perform computations of a first layer of a neural network and at least a subset of the second processing units is assigned to perform computations of a second layer of the neural network”.
Regarding this argument, examiner's reply is essentially the same as the reply to the previous argument above.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAIA N M AZAD whose telephone number is (571)272-8232.  The examiner can normally be reached on 8:30 - 5:30 (Mon - Thurs and 2nd Fri of the pay week).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/R.M.A/
Patent Examiner, Art Unit 2125

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125