DETAILED ACTION
This office action is in response to the submission of the application on 5/16/2018.
Claims 1-25 are presented for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claims for the benefit of prior-filed U.S. Provisional Patent application 62/628,076 filed on 2/08/2018 and U.S. Provisional Patent application 62/627,957 filed on 2/08/2018, are acknowledged and admitted.  Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file. 

Information Disclosure Statement
The information disclosure statements submitted on 12/27/2018, 4/25/2019, 4/14/2020, 4/15/2020, 8/06/2020, 2/01/2021, 3/29/2021, 3/31/2021 and 5/03/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are considered by the examiner.

Drawings
The Drawings filed on 5/16/2018 are acceptable for examination purposes.

Specification
The Specification filed on 5/16/2018 is acceptable for examination purposes.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“the first subset of the first processing units are configured to process inputs…” in claim 1 lines 20-22.
“the subset of the second processing units… are configured to process received data…” in claim 1 lines 23-25.

Examiner has interpreted the data processing units (DPUs) in a systolic array, as described in specification [0185], [0201]-[0202] and figure 3, as the structure corresponding to the claimed processing units.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 11-19 are rejected under 35 U.S.C. 103 as being unpatentable over Dasari et al. (US 20190156187 A1, hereinafter Dasari) in view of Ross et al. (US 20160342893 A1, hereinafter Ross).

Regarding claim 1, 
Dasari discloses a device performing computations of a neural network comprising at least first, second, and third layers, the device comprising (Dasari recites “FIG. 1A illustrates a system 100 for processing computational tasks of neural networks, according to an illustrative implementation. The system 100 includes a main processing unit 101 and an artificial intelligence processing unit (AIPU) 102. The system 100 is housed within a host computing device (not shown).” The AIPU is interpreted as a device.): 
an array of processing units including at least (Dasari fig. 1A element 102 (AIPU) & 103a-f (Artificial Intelligence Processing Die (AIPD)) and [0035] recites, in part, “Additional details of AIPDs 103 and the arrangement of multiple identical AIPDs 103 within an AIPU 102 are described below with reference to FIGS. 1B, 2A, 2B.” Array of processing units (i.e. fig. 1A elements 103a-f)): 
a first arrangement of first processing units (Dasari fig. 2A and [0026] recites “FIGS. 2A, 2B, 2C, and 2D illustrate example arrangements of artificial intelligence processing dies of an artificial intelligence processing unit…” Fig 2A elements 103a-c (i.e. first arrangement of first processing units)), and 
a last arrangement of second processing units (Dasari fig. 2B and [0026] recites “FIGS. 2A, 2B, 2C, and 2D illustrate example arrangements of artificial intelligence processing dies of an artificial intelligence processing unit…” Fig 2A elements 103d-f (i.e. last arrangement of second processing units)), 
a controller configured to assign the first and second processing units to perform computations of particular nodes of the at least first, second, and third layers of the neural network (Dasari fig. 1A-B and [0038] & [0045] recites, in part, “If the data transmitted from the main processing unit controller 105 is configuration data, then, based on the configuration data, the AIPD controller 117 is configured to select the inter-die input and output blocks to be used for communications between AIPD 103a and another AIPD 103… the AIPD controller 117 is configured to store the data related to the neural network to the buffers 119 and perform the neural network task using the input data stored in the buffers 119, and the computation unit 121. [0045] Each AIPD 103 is associated with a particular layer of the neural network…” Examiner interprets main processing unit controller 105 sending configuration data to the controller 117 which is configured to select blocks for communication between Artificial Intelligence Processing Dies, AIPDs (fig. 1A element 103a-103f) as a controller (for clarification purposes – controller 105) configured to assign first and second processing units. Additionally, the controller 117 configured to perform neural network tasks is interpreted as computations related to nodes of a multilayer neural network); 
wherein: 
the controller is configured to assign at least a first subset of the first processing units of the first arrangement to perform computations of particular nodes of the first layer of the neural network (Dasari fig. 1B & 2A and [0040] recites, in part, “The configuration data received from the main processing unit controller 105 determines the inter-die communication paths between two AIPDs 103. For example, if the configuration data received at AIPD 103a indicates that inter-die output block 111a … be used for inter-die communications, then the AIPD controller 117 transmits data to another AIPD 103 using the inter-die output block 111a.” Examiner interprets one of the AIPDs 103 and another AIPD communicating with each other as a first subset (e.g. 103a-c communications via 201a-b in fig. 2A)), and 
to assign at least a subset of the second processing units of the last arrangement to perform computations of particular nodes of the second layer of the neural network on the first activation output values (Dasari fig. 1B & 2B and [0040] recites, in part, “…if the configuration data indicates …block 109b … used for inter-die communications, then the AIPD controller 117 selects the input block 109b as the inter-die input block for receiving data from another AIPD 103, and reads and processes the data received at the input block 109b.” Examiner interprets a different one of the AIPDs 103 and another AIPD communicating with each other as a subset (e.g. 103d, 103e, and 103b communications via 202a-c in fig. 2B) Additionally, Dasari [0038] and [0039] recites, in part, “… registers of the buffers 119 are coupled to multiple ALUs of the computation unit 121 such that they establish a systolic array… example arrangement of such a systolic array is shown in FIG. 1C. [0039] The arrangement shown in FIG. 1C also optimizes the AIPDs 103 for computations related to performing artificial intelligence tasks (referred to herein as “AI computations”), such as convolutions, matrix multiplications, pooling, element wise vector operations, and the like.” Examiner interprets matrix multiplication as computations of particular nodes for the 1st and 2nd arrangement); 
the first subset of the first processing units are configured to process inputs into the neural network to generate first activation output values that are systolically pulsed through the array (Dasari fig. 1B elements 121 & 123 and [0039] and [0040] recites, in part, “… registers of the buffers 119 are coupled to multiple ALUs of the computation unit 121 such that they establish a systolic array… example arrangement of such a systolic array is shown in FIG. 1C.… the computation unit 121 performs AI computations using input data and weights selected for the neural network … 121 includes an activation unit 123… 123 can include multiple ALUs and multiple shift registers and can be configured to apply activation functions and non-linear functions to the results of the AI computations.” Examiner interprets the computation units in the AIPDs with ALUs in a systolic array establishment performing AI computations with input data and weights with the activation unit as generating activation output values that are systolically pulsed.); 
the subset of the second processing units of the last arrangement are configured to process received data to generate second activation output values and send the second activation output values to the memory (Dasari [0040] recites, in part, “…the computation unit 121 performs AI computations using input data and weights selected for the neural network … 121 includes an activation unit 123… 123 can include multiple ALUs… The computation unit 121 transmits the data resulting after applying the activation functions and/or other non-linear functions to the buffers 119 to store the data.” Examiner interprets the ALUs as part of the systolic array in the computation unit in the AIPD where each AIPD is associated with a layer of the neural network.);
the controller is further configured to re-assign at least a second subset of the first processing units of the first arrangement to perform computations of particular nodes of the third layer of the neural network (Dasari [0051] recites, in part, “…modifying the configuration data associated with an AIPU and/or the configuration data associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks. For example, in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then … the configuration data associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets the modifying of the configuration data of AIPDs, 103a-f, by the controller to be design choices depending on the network (e.g. fig 2A & 2B where 2nd layer in 2A is 103b compared to 2nd layer in 2B is 103d) and as assigning and re-assigning processing units. Examiner interprets if starting from fig. 2B elements 103a-c as a 1st subset of 1st arrangement of processing units, then changing to fig. 2A elements 103b-c as 2nd subset of 1st arrangement of processing units, allows 103c to be assigned as a 3rd layer as depicted in fig. 2A opposed to a 5th layer as depicted in 2B.); and 
the second subset of the first processing units are configured to receive the second activation output values from the memory and process the second activation output values according to the computations of the particular nodes of the third layer of the neural network (Dasari [0051] recites, in part, “The AIPD 103b transmits the result data to the AIPD 103c using the inter-die output block 223a. The AIPD 103c performs computations related to the third layer of the neural network, including AI computations, based on the result data received from the AIPD 103b at the inter-die input block 225a” Examiner interprets the AIPDs with buffers transmitting result data after performing respective AI computations to other receiving AIPDs for additional AI computations as processing units configured to receive activation output values from memory and processing activation output values.).  
However, Dasari does not explicitly disclose wherein data is systolically pulsed from arrangement to arrangement of the array; a memory configured to store activation output values received from the last arrangement.
Ross teaches:
wherein data is systolically pulsed from arrangement to arrangement of the array (Ross fig. 8 and [0066] recites, in part, “FIG. 8 shows an example illustration 800 of weight inputs inside cells of an example 3×3 systolic array after three clock cycles. Each cell can store a weight input and an activation input... After every clock cycle, weight inputs can be shifted in one dimension, e.g., from top to bottom, while activation inputs can be shifted (not illustrated) in another dimension, e.g., from left to right.” Inputs (i.e. data) shifted top to bottom and left to right after every clock cycle (i.e. systolically pulsed from arrangement to arrangement).); 
a memory configured to store activation output values received from the last arrangement (Ross [0033] and [0038] recites, in part, “The system can generate a layer output … using a vector computation unit of the special-purpose hardware circuit... The output of the layer can be stored in the unified buffer for use as an input to a subsequent layer in the neural network...  [0038] The unified buffer 308 is a memory buffer. It can be used to store the set of activation inputs from the direct memory access engine 304 and outputs of the vector computation unit 314.” Examiner interprets the memory buffer as a memory storing outputs from every layer of the neural network to include the layer(s) in the last arrangement.).
Dasari and Ross are both directed to neural network hardware implementations and architecture. In view of the teachings of Ross, it would have been obvious to one of ordinary skill in the art to apply the teachings of Ross to Dasari before the effective filing date of the claimed invention in order to compute convolution calculations in parallel using a two-dimensional systolic array by rotating activation inputs and weight inputs to a neural network processor thereby improving Dasari (cf. Ross [0010] recites “Rotating activation inputs and weight inputs to a neural network processor can cause the hardware circuit to process inferences for neural networks having convolutional layers more efficiently. In particular, multiple convolution calculations can be performed in parallel. This allows a systolic array within the neural network process to be more fully utilized during each clock cycle. Rotations can also permit better usage of registers in processors, e.g., CPUs and GPUs, which can improve performance. Rotations can also reduce power usage by reducing a number of cache fetches.”).
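The data movement quoted from Ross [0066] (one operand shifting top to bottom and the other shifting left to right on every clock cycle) can be illustrated with a minimal sketch. The sketch assumes an output-stationary array in which each cell keeps its own accumulator; the function name `systolic_matmul`, the skewed input schedule, and the use of plain Python lists are assumptions made for illustration only and are not taken from Ross, Dasari, or the application.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m systolic grid.

    Hypothetical illustration: values of `a` pulse left to right, values of
    `b` pulse top to bottom, and each cell accumulates one output element.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # operand currently held, moving right
    b_reg = [[0] * m for _ in range(n)]  # operand currently held, moving down

    # The last operand pair meets in cell (n-1, m-1) at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        # Shift phase. Cells are visited in reverse order so that each cell
        # reads the value its neighbour held on the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    # Row i enters the left edge delayed by i cycles.
                    a_reg[i][j] = a[i][t - i] if 0 <= t - i < k else 0
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    # Column j enters the top edge delayed by j cycles.
                    b_reg[i][j] = b[t - j][j] if 0 <= t - j < k else 0
        # Compute phase. Every cell multiplies the two operands it now holds.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this sketch no cell reads from a shared memory during the computation; each cell receives operands only from its left and upper neighbours, which is the sense in which data is pulsed from one arrangement of cells to the next.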

Regarding claim 2,
The Dasari/Ross Combination teaches the device of Claim 1, wherein the first and second subsets of the first processing units includes one or more processing units also included in the second subset of the first processing units (Dasari [0051] and [0055] recites “…by modifying the configuration data associated … with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks... in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then the configuration data … associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network. [0055] The number of AIPDs within a single AIPU package is only limited by the size of the AIPU … in a single AIPU package an N×N arrangement of the AIPDs can be included, as shown by the arrangement of AIPD 11 through AIPD NN in FIG. 2D. AIPD 11 through AIPD NN of FIG. 2D are similarly designed and configured as the AIPDs 103 described above.” Examiner interprets the arrangement in fig. 2D elements 11,21,31,..,N1 in view of fig. 2A elements 103a-c as having the 1st subset 103a-c (i.e. fig. 2D elements 11,21,31) and 2nd subset 103b-c (i.e. fig. 2D elements 21,31) seen as 1st row in fig. 2D elements 11,21,31,…,N1).  
Please see motivation for claim 1 above.

Regarding claim 3,
The Dasari/Ross Combination teaches the device of Claim 1, wherein one of the first and second subsets of the first processing units includes one or more processing units not included in the other of the first and second subsets of the first processing units (Dasari [0051] and [0055] recites “…by modifying the configuration data associated … with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks... in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then the configuration data … associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network…” Examiner interprets the arrangement in fig. 2D elements 11,21,31,..,N1 in view of fig. 2A elements 103a-c as having the 1st subset 103a-c (i.e. fig. 2D elements 11,21,31) and 2nd subset 103b-c (i.e. fig. 2D elements 21,31) seen as 1st row in fig. 2D elements 11,21,31,…,N1 excluding processing elements N1 (i.e. other of the first and second subsets of the first processing units)).  
Please see motivation for claim 1 above.

Regarding claim 4,
The Dasari/Ross Combination teaches the device of Claim 1, wherein the neural network comprises a number of layers between the first and second layers, wherein the array comprises a number of arrangements between the first and last arrangements, and wherein the number of layers equals the number of arrangements (Dasari fig. 2D and [0035] & [0055] recites, in part, “… each AIPD 103 is associated with at least one layer of the neural network... [0055] The number of AIPDs … is only limited by the size of the AIPU package... Therefore, in a single AIPU package an N×N arrangement of the AIPDs can be included, as shown by the arrangement … in FIG. 2D.” Examiner interprets the columns of AIPDs from elements 21-2N and 31-3N in fig. 2D as a number of layers between first (i.e. fig. 2D column of elements 11-1N) and second (i.e. fig. 2D column of elements N1-NN) layers. Examiner interprets the rows of AIPDs from elements 12-N2 and 13-N3 as a number of arrangements between the first (i.e. fig. 2D row elements 11-N1) and last (i.e. fig. 2D row elements 1N-NN) arrangements.).  
Please see motivation for claim 1 above.

Regarding claim 5,
The Dasari/Ross Combination teaches the device of Claim 1, wherein the device includes a systolic processor chip, and wherein each of the first and last arrangements of processing units comprise circuitry embedded in the systolic processor chip (Dasari [0040] and [0081] recites, in part, “Each inter-die input and output blocks of an AIPD 103 includes multiple pins. Pins of an inter-die output block of an AIPD 103 can be connected by electrical interconnects to a corresponding pin of an inter-die input block of another AIPD 103. [0081] The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).” Examiner interprets the ASIC as a systolic processor chip and AIPDs with electrical interconnections with pins as embedded circuitry).  
Please see motivation for claim 1 above.

Regarding claim 6,
The Dasari/Ross Combination teaches the device of Claim 1, wherein the computations of a particular processing unit of the first subset of the first processing units include a multiplication of input data with a weight (Dasari [0038]-[0040] recites, in part, “The computation unit 121 includes multiple multiply-accumulator units (MACs) (not shown), multiple Arithmetic Logic Units (ALUs) (not shown) …  FIG. 1C also optimizes the AIPDs 103 for computations related to performing artificial intelligence tasks (referred to herein as “AI computations”), such as convolutions, matrix multiplications … the computation unit 121 performs AI computations using input data and weights selected for the neural network… the computation unit 121 includes an activation unit 123”. Examiner interprets the MACs in the computation unit of the AIPD using input data and weights for AI computations for the neural network as processing units including a multiplication of input data with a weight.).  
Please see motivation for claim 1 above.

Regarding claim 7,
The Dasari/Ross Combination teaches the device of Claim 6, wherein the weight is stored locally at the particular processing unit of the first processing units (Dasari fig. 1B. Examiner interprets the weight data stored in buffers, which include memory such as registers or other types of integrated circuit memory that are a part of each AIPD, as weights stored locally at the processing unit).  
Please see motivation for claim 1 above.

Regarding claim 8,
The Dasari/Ross Combination teaches the device of Claim 6, wherein the weight is retrieved from a memory external to the particular processing unit of the first subset of the first processing units (Dasari fig. 1A element 107 and [0032] & [0057]-[0058] recites, in part, “The memory 107 stores configuration data ... [0057] At each AIPD 103, the method 300 includes receiving configuration data… [0058] Different configuration data may specify different values to configure an AIPD 103 for neural network processing including … neural network related data, such as parameters, parameter weight data, number of parameters. The values specified by the configuration data are based on the layer of the neural network with which the corresponding AIPD 103 is associated.” Examiner interprets memory 107 in fig. 1A as memory external to the processing unit.).  
Please see motivation for claim 1 above.

Regarding claim 11, 	
Dasari discloses a method for performing computations of a neural network comprising at least first, second, and third layers, via an array of processing units including at least a first arrangement of first processing units and a last arrangement of second processing units, the method comprising (Dasari fig. 1A & 4 and [0027] & [0035]. Fig. 1A element 102 (i.e. array of processing units with 1st and last arrangement)): 
assigning at least a first subset of the first processing units of the first arrangement to the first layer of the neural network (Dasari fig. 2A and [0042] recites, in part, “While the connections between an inter-die output block of one AIPD 103 and the inter-die input block of another AIPD 103 are connected by electrical interconnects, the selection of a particular inter-die output block of an AIPD 103 and transmission of a particular signal or data to a particular pin of the inter-die output block may be programmable or modified based on the configuration data received by the AIPD 103 from the main processing controller 105. Through the selection of different output blocks of the AIPDs 103, the AIPU 102 can be configured to implement different requirements of different neural networks including, but not limited to, feedback loops between different layers of a neural network.” Examiner interprets the configuration data used by AIPDs for arrangements such as fig. 2A to assign a 1st subset of the 1st processing units of the 1st arrangement to the 1st layer of a neural network.); 
assigning at least a subset of the second processing units of the last arrangement to the second layer of the neural network (Dasari fig. 1B & 2B and [0040] recites, in part, “…if the configuration data indicates …block 109b … used for inter-die communications, then the AIPD controller 117 selects the input block 109b as the inter-die input block for receiving data from another AIPD 103, and reads and processes the data received at the input block 109b.” Examiner interprets a different one of the AIPDs 103 and another AIPD communicating with each other as a subset (e.g. 103d, 103e, and 103b communications via 202a-c in fig. 2B)); 
receiving input data for the neural network (Dasari fig. 4 element 406 and [0064] recites “At the AIPU, the method 400 includes receiving initial data related to the neural network at a first AIPD 103 associated with the input layer of the neural network (stage 406).”); 
performing computations of particular nodes of the first layer of the neural network values by using the first subset of the first processing units to process the input data to generate first activation output values (Dasari fig. 4 element 408 and [0064] recites “The method 400 includes, at the first AIPD 103, performing computations related to the layer of the neural network associated with the first AIPD 103 using the initial data and any neural network related data received with the configuration data of the first AIPD 103 (stage 408).”); 
performing computations of particular nodes of the second layer of the neural network using the second subset of the second processing units to generate second activation output values (Dasari fig. 4 element 412 and [0064] recites “The method 400 includes, at the second AIPD 103, performing computations related to the layer of the neural network associated with the second AIPD 103 using the result data received from the first AIPD (stage 412).”); 
re-assigning at least a second subset of the first processing units of the first arrangement to the third layer of the neural network (Dasari [0051] recites, in part, “… by modifying the configuration data … associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks… FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then the configuration data associated … with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets modifying the configuration data for the AIPDs to associate different layers of a neural network as re-assigning processing units to include a 2nd subset such as in fig. 2B with 103c in a 1st arrangement re-assigned to a 3rd layer. Examiner interprets the original configuration in fig. 2B with element 103c being associated with the 5th layer); 
accessing, by the second subset of the first processing units, the second activation output values from the memory (Dasari [0040] recites, in part, “The AIPD…” Examiner interprets the transmitting of output data from the computation unit which includes the activation unit 123 and buffers 119 as depicted in fig. 1B as accessing from memory); and 
performing computations of particular nodes of the third layer of the neural network using the second subset of the first processing units (Dasari [0069] recites “In implementations where the neural network model being processed by the AIPU includes a feedback loop between two or more layers of the neural network and the second AIPD 103 and the first AIPD 103 are associated with the layers of the neural network between which the feedback loop is included, then the method 400 includes, at the second AIPD 103, transmitting result data from the computations at the second AIPD 103 as feedback to the first AIPD 103 (stage 414).” Examiner interprets the additional computations performed by the 1st AIPD with feedback data from the 2nd AIPD computation as performing computations of the 3rd layer using a subset of the 1st processing unit. Examiner’s perspective AIPD 1 (1st layer) to AIPD 2 (2nd layer) back to AIPD 1 (3rd layer)).  
However, Dasari does not explicitly disclose sending the second activation output values to a memory.
Ross teaches sending the second activation output values to a memory (Ross [0025] and [0038] recite, in part, “Data inputs to a neural network layer, e.g., either the input to the neural network or the outputs of the layer below the layer in the sequence, can be referred to as activation inputs to the layer. Activation inputs can be represented as a matrix structure of activation values. [0037] The host interface 302 can send the sets of weight inputs and the initial set of activation inputs to the direct memory access engine 304. The direct memory access engine 304 can store the sets of activation inputs at the unified buffer 308.” Examiner interprets the buffer as a memory storing activation output values.); 



Regarding claim 12, 
The Dasari/Ross Combination teaches the method of Claim 11, wherein the re-assigning comprises assigning one or more processing units of the first subset to the third layer of the neural network (Dasari [0051] recites, in part, “…modifying the configuration data associated with an AIPU and/or the configuration data associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks. For example, in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then … the configuration data associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets the modifying of the configuration data of AIPDs, 103a-f, by the controller to be design choices depending on the network (e.g. fig 2A & 2B where 2nd layer in 2A is 103b compared to 2nd layer in 2B is 103d) and as assigning and re-assigning processing units. Examiner interprets if starting from fig. 2B elements 103a-c as a 1st subset of 1st arrangement of processing units, then changing to fig. 2A elements 103b-c as 2nd subset of 1st arrangement of processing units, allows 103c to be assigned as a 3rd layer as depicted in fig. 2A opposed to a 5th layer as depicted in 2B.).
Please see motivation for claim 11 above.
  
Regarding claim 13,
The Dasari/Ross Combination teaches the method of Claim 11, wherein the re-assigning comprises assigning one or more processing units not in the first subset to the third layer of the neural network (Dasari [0051] recites, in part, “…modifying the configuration data associated with an AIPU and/or the configuration data associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks. For example, in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then … the configuration data associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets the modifying of the configuration data of AIPDs, 103a-f, by the controller to be design choices depending on the network (e.g. fig 2A & 2B where 2nd layer in 2A is 103b compared to 2nd layer in 2B is 103d) and as assigning and re-assigning processing units. Examiner interprets if starting from fig. 2A elements 103a-c as a 1st subset of 1st arrangement of processing units, then changing to fig. 2B elements 103d-f allows 103e to be assigned as a 3rd layer as depicted in fig. 2B opposed to a 5th layer as depicted in 2A.).  
Please see motivation for claim 11 above.

Regarding claim 14, 
The Dasari/Ross Combination teaches the method of Claim 11, wherein the neural network comprises a number of layers between the first and second layers, wherein the array comprises a number of arrangements between the first and last arrangements, and wherein the number of layers equals the number of arrangements (Dasari fig. 2D and [0035] & [0055] recites, in part, “… each AIPD 103 is associated with at least one layer of the neural network... [0055] The number of AIPDs … is only limited by the size of the AIPU package... Therefore, in a single AIPU package an N×N arrangement of the AIPDs can be included, as shown by the arrangement … in FIG. 2D.” Examiner interprets the columns of AIPDs from elements 21-2N and 31-3N in fig. 2D as a number of layers between first (i.e. fig. 2D column of elements 11-1N) and second (i.e. fig. 2D column of elements N1-NN) layers. Examiner interprets the rows of AIPDs from elements 12-N2 and 13-N3 as a number of arrangements between the first (i.e. fig. 2D row elements 11-N1) and last (i.e. fig. 2D row elements 1N-NN) arrangements. Examiner interprets the NxN rows (i.e. arrangements) and columns (i.e. layers) as equaling in number), 
the method further comprising assigning the number of arrangements to corresponding ones of the number of layers (Dasari [0043] recites, in part, “… each AIPD 103 … is associated with at least one layer of a neural network that the AIPU 102 is configured to process... The configuration data is associated with a neural network model that is selected to be processed by the AIPU. The configuration data specifies the associations between an AIPD 103 and a layer of the neural network being processed by the AIPU.”).  
Please see motivation for claim 11 above.

Regarding claim 15, 
The Dasari/Ross Combination teaches the method of Claim 14, further comprising: 
systolically pulsing the first activation outputs from the first arrangement to an adjacent arrangement of the number of arrangements (Ross fig. 8 element 802 and [0025] & … Examiner interprets the activation input (i.e. activation output) shifted top to bottom and left to right after every clock cycle as being systolically pulsed from arrangement to arrangement); 
at each of the number of arrangements, generating an additional set of activation outputs and pulsing the additional set of activation outputs to a next adjacent arrangement (Ross fig. 8 element 804 and [0066] recites, in part, “After every clock cycle, weight inputs can be shifted in one dimension, e.g., from top to bottom, while activation inputs can be shifted (not illustrated) in another dimension, e.g., from left to right.” Examiner interprets values occupying previously unfilled adjacent rows and columns (e.g. values 1 and 3) at clock cycle 2 as additional set of activation outputs in a next adjacent arrangement); and 
at the last arrangement, receiving the additional set of activation outputs systolically pulsed from one of the number of arrangements adjacent to the last arrangement (Ross fig. 8 element 806 and [0066] recites, in part, “After every clock cycle, weight inputs can be shifted in one dimension, e.g., from top to bottom, while activation inputs can be shifted (not illustrated) in another dimension, e.g., from left to right.” Examiner interprets values occupying previously unfilled adjacent rows and columns (e.g. value 1) at clock cycle 3 as receiving the additional set of activation outputs in the last adjacent arrangement).  
Please see motivation for claim 11 above.
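
For illustration only, the per-clock-cycle shifting quoted from Ross [0066] (weight inputs shifted top to bottom, activation inputs shifted left to right) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Ross or Dasari: the function names shift_right and shift_down, the 3×3 grid size, and the integer values are hypothetical and chosen only to show values entering previously unfilled adjacent rows and columns on successive clock cycles.

```python
from typing import List, Optional

# A grid of cell registers; None marks a register that holds no value yet.
Grid = List[List[Optional[int]]]


def shift_right(grid: Grid, incoming: List[Optional[int]]) -> Grid:
    """One clock cycle of left-to-right shifting (hypothetical helper).

    incoming[r] enters column 0 of row r, every stored value moves one
    column to the right, and the value in the last column leaves the grid.
    """
    return [[incoming[r]] + row[:-1] for r, row in enumerate(grid)]


def shift_down(grid: Grid, incoming: List[Optional[int]]) -> Grid:
    """One clock cycle of top-to-bottom shifting (hypothetical helper).

    incoming enters as the new top row, every stored row moves one row
    down, and the bottom row leaves the grid.
    """
    return [list(incoming)] + [list(row) for row in grid[:-1]]


# Activation values pulsed left to right over three clock cycles.
activations: Grid = [[None] * 3 for _ in range(3)]
activations = shift_right(activations, [1, 2, 3])   # clock cycle 1
activations = shift_right(activations, [4, 5, 6])   # clock cycle 2
activations = shift_right(activations, [7, 8, 9])   # clock cycle 3

# Weight values pulsed top to bottom over two clock cycles.
weights: Grid = [[None] * 3 for _ in range(3)]
weights = shift_down(weights, [10, 20, 30])          # clock cycle 1
weights = shift_down(weights, [40, 50, 60])          # clock cycle 2
```

In this sketch the first value entered on each row reaches the last column only on the third clock cycle, which mirrors the reading of fig. 8 elements 802, 804 and 806 as successive adjacent arrangements being filled.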

Regarding claim 16, 
	The Dasari/Ross Combination teaches a non-transitory computer-readable medium storing computer-executable instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising (Dasari [0078] and fig. 5 element 550 (processor(s)) recites, in part, “Implementations of… the operations described… can be implemented as… one or more modules of computer program instructions, encoded on… computer storage media for execution by, or to control the operation of, a data processing apparatus... The computer storage medium may be tangible and non-transitory.”): 
identifying a neural network for processing by an array of processing units, the neural network including at least first, second, and third layers, and the array including at least a first arrangement of first processing units and a last arrangement of second processing units (Dasari [0054] & [0058] and fig. 2C & 3 recites, in part, “For example, a neural network with six layers… can be processed by the arrangement of the AIPDs 103 shown in FIG. 2C by associating the AIPD 103a with the first layer, the AIPD 103d with the second layer, the AIPD 103e with the third layer, the AIPD 103b with the fourth layer, the AIPD 103c with the fifth layer, and the AIPD 103f with the sixth layer of the neural network. [0058] The method 300 includes… receiving an input to configure the AIPU (stage 302). In response… selecting the AIPU configuration data for each AIPD 103 within the AIPU (stage 304).” Examiner interprets the received configuration data with neural network configurations (e.g. 6 layers) as identifying a neural network for processing. The AIPDs in the AIPU are interpreted as an array of processing units in arrangement, for example, depicted in fig. 2C with a 1st arrangement of elements 103a-c and 2nd arrangement of 103d-f.); 
assigning at least a first subset of the first processing units of the first arrangement to the first layer of the neural network (Dasari [0045] & [0049] and fig. 2B recites, in part, “Each AIPD 103 is associated with a particular layer of the neural network… [0049] In FIG. 2B… AIPDs 103a, 103d, 103b, 103c are associated with the first, second, fourth, and fifth layers of the neural network, respectively.” Examiner interprets 103a-c in fig. 2B as an example first arrangement depending on configuration data with 103a as a subset assigned as a first layer); 
assigning at least a subset of the second processing units of the last arrangement to the second layer of the neural network (Dasari [0045] & [0049] and fig. 2B recites, in part, “Each AIPD 103 is associated with a particular layer of the neural network… [0049] In FIG. 2B… AIPDs 103a, 103d, 103b, 103c are associated with the first, second, fourth, and fifth layers of the neural network, respectively.” Examiner interprets 103d-f in fig. 2B as an example last arrangement depending on configuration data with 103d as a subset assigned as a second layer); 
providing input data for processing by the first subset of the first processing units (Dasari [0064] and fig. 4 elements 404 & 406 recites “The method 400 includes transmitting initial data or input data related to the neural network to the AIPU (stage 404). At the AIPU, the method 400 includes receiving initial data related to the neural network at a first AIPD 103 associated with the input layer of the neural network (stage 406).”); 
re-assigning at least a second subset of the first processing units of the first arrangement to the third layer of the neural network (Dasari [0051] recites, in part, “…modifying the configuration data… associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks... in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then … the configuration data associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets the modifying of the configuration data of AIPDs, 103a-f, depending on the network (e.g. fig 2A & 2B where 2nd layer in 2A is 103b compared to 2nd layer in 2B is 103d) and as assigning and re-assigning processing units. Examiner interprets if starting from fig. 2B elements 103a-c as a 1st subset of 1st arrangement of processing units, then changing to fig. 2A elements 103b-c as 2nd subset of 1st arrangement of processing units, allows 103c to be assigned as a 3rd layer as depicted in fig. 2A opposed to a 5th layer as depicted in 2B.); and  
providing the activation output values from the memory to the second subset of the first processing units (Dasari [0040] recites, in part, “…the computation unit 121 performs AI computations using input data and weights selected for the neural network … 121 includes an activation unit 123… 123 can include multiple ALUs… The computation unit 121 transmits the data resulting after applying the activation functions and/or other non-linear functions to the buffers 119 to store the data.” Examiner interprets the ALUs as part of the systolic array in the computation unit in the AIPD where each AIPD is associated with a layer of the neural network.).  
However, Dasari does not explicitly disclose storing activation output values received from the last arrangement to a memory.
Ross teaches storing activation output values received from the last arrangement to a memory (Ross [0038] recites, in part, “The unified buffer 308 is a memory buffer. It can be used to store the set of activation inputs from the direct memory access engine 304 and outputs of the vector computation unit 314.” Examiner interprets the buffer as a memory for storing activation output values.).
Dasari and Ross are both directed to neural network hardware implementations and architecture. In view of the teachings of Ross, it would have been obvious to one of ordinary skill in the art to apply the teachings of Ross to Dasari before the effective filing date of the claimed invention in order to compute convolution calculations in parallel using a two-dimensional systolic array by rotating activation inputs and weight inputs to a neural network processor thereby improving Dasari (cf. Ross [0010] recites “Rotating activation inputs and weight inputs to a neural network processor can cause the hardware circuit to process inferences for neural networks having convolutional layers more efficiently. In particular, multiple convolution calculations can be performed in parallel. This allows a systolic array within the neural network …”).

Regarding claims 17-18,
Claims 17-18 are directed to a non-transitory computer-readable medium storing computer-executable instructions executed by at least one processor performing operations substantially identical to those recited in claims 2-3, respectively. Therefore, the rejections to claims 2-3 apply equally here.
In addition, Dasari discloses the additional limitation of non-transitory computer-readable medium, computer-executable instructions, processor and operations (Dasari fig. 5 element 550 (processor(s)) and [0078] recites, in part, “Implementations of the subject matter and the operations described in this specification … can be implemented as one or more computer programs embodied on a tangible medium, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, a data processing apparatus... The computer storage medium may be tangible and non-transitory.”).

Regarding claim 19,
The Dasari/Ross Combination teaches the non-transitory computer-readable medium of Claim 16, wherein the neural network comprises a number of layers between the first and second layers, wherein the array comprises a number of arrangements between the first and last arrangements, the operations further comprising assigning each of the number of arrangements to a corresponding one of the number of layers (Dasari fig. 2D and [0035] & [0055] recites, in part, “… each AIPD 103 is associated with at least one layer of the neural network... [0055] The number of AIPDs … is only limited by the size of the AIPU package... Therefore, in a single AIPU package an N×N arrangement of the AIPDs can be included, as shown by the arrangement … in FIG. 2D.” Examiner interprets the columns of AIPDs from elements 21-2N and 31-3N in fig. 2D as a number of layers between first (i.e. fig. 2D column of elements 11-1N) and second (i.e. fig. 2D column of elements N1-NN) layers. Examiner interprets the rows of AIPDs from elements 12-N2 and 13-N3 as a number of arrangements between the first (i.e. fig. 2D row elements 11-N1) and last (i.e. fig. 2D row elements 1N-NN) arrangements. Examiner interprets the NxN rows (i.e. arrangements) and columns (i.e. layers) as corresponding in number).  
Please see motivation for claim 16 above.


Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Dasari in view of Ross and in further view of Ginosar et al. (US 5812993 A, hereinafter Ginosar).

Regarding claim 9, 
The Dasari/Ross Combination teaches the device of Claim 1, wherein the controller is further configured to maintain assignment of the second subset of the first arrangement of processing units to perform the computations of the particular nodes of the third layer of the neural network (Dasari fig. 4 and [0069] recites, in part, “In implementations where the neural network model being processed by the AIPU includes a feedback loop between two or more layers of the neural network and the second AIPD 103 and the first AIPD 103 are associated with the layers of the neural network between which the feedback loop is included, then the method 400 includes, at the second AIPD 103, transmitting result data from the computations at the second AIPD 103 as feedback to the first AIPD 103 (stage 414).”).  
However, the Dasari/Ross Combination does not explicitly teach during a first stage of back propagation.
Ginosar teaches during a first stage of back propagation (Ginosar Pg. 14, Col. 10, Ln. 2-14 recites, in part, “…depending on whether the chip is processing an image, being programmed, or finalizing an epoch… At chip initialization and programming time, the Input Forward Channel loads the initial weights, biases and the learning rate, and the Input Learning Channel loads the initial weights into the learning section… at the end of each epoch, the old weights are replaced by the new ones which have been learned. During that stage, the Input Learning Channel 33 receives the updated weights from the next layer.” Examiner interprets the stage where weights are updated during learning for an epoch as during a first stage of back propagation).
Dasari and Ginosar are both directed to neural networks, particularly architecture. In view of the teachings of Ginosar, it would have been obvious to one of ordinary skill in the art to apply the teachings of Ginosar to Dasari before the effective filing date of the claimed invention in order to perform forward processing and back propagation learning simultaneously by not limiting the architecture to a single channel with dataflow in one direction thereby improving Dasari (cf. Ginosar Pg. (20) #, Col. # recites, in part, “A number of digital parallel architectures for neural network processing have been described in the literature... Some of them are limited to forward processing…, while others support learning… Some architectures employ systolic data flow… As is well known, "Systolic" processing means: In systolic computation, data flows rhythmically from one processing unit to another each clock cycle. The multiple identical or similar processing units operate on the data simultaneously, each unit processing a different part of the data at the same time… All those architectures employ a single channel for data sharing among the neurons and across the layers, which limits processing to only one input datum at each cycle. The use of a single channel is also twice as slow learning as forward processing since data can flow in the network in only one direction. Image processing handles large amounts of data, and if real-time speed is required (30 images per second), then none of the above architectures provide a suitable solution when implemented e.g. with hitherto known technology”).

Regarding claim 10, 
The Dasari/Ross/Ginosar Combination teaches the device of Claim 9, wherein the controller is configured to re-assign the first subset of the first arrangement of processing units to perform the computations of the particular nodes of the first layer of the neural network (Dasari [0051] recites, in part, “…modifying the… configuration data associated with the AIPDs of that AIPU, a single AIPU can be utilized to process different neural networks. For example, in FIG. 2B, if a neural network with four layers is to be processed by the AIPU 250, then … the configuration data associated with the AIPDs 103 … can be modified to associate the AIPD 103a with the first layer of the neural network, the AIPD 103b with the second layer of the neural network, the AIPD 103c with the third layer of the neural network, and the AIPD 103f with the fourth layer of the neural network.” Examiner interprets the modifying of the configuration data of AIPDs, 103a-f, by the controller to be design choices depending on the network (e.g. fig 2A & 2B where 2nd layer in 2A is 103b compared to 2nd layer in 2B is 103d) and as assigning and re-assigning processing units. Examiner interprets if starting from fig. 2B elements 103a-c as a 1st subset of 1st arrangement of processing units, then changing to fig. 2A elements 103b-c as 2nd subset of 1st arrangement of processing units, allows 103c to be assigned as a 3rd layer as depicted in fig. 2A opposed to a 5th layer as depicted in 2B.)
during a second stage of the back propagation (Ginosar Pg. 14, Col. 10, Ln. 2-14 recites, in part, “…depending on whether the chip is processing an image, being programmed, or finalizing an epoch… At chip initialization and programming time, the Input Forward Channel loads the initial weights, biases and the learning rate, and the Input Learning Channel loads the initial weights into the learning section… at the end of each epoch, the old weights are replaced by the new ones which have been learned. During that stage, the Input Learning Channel 33 receives the updated weights from the next layer.” Examiner interprets the stage where weights are updated during learning for a subsequent epoch as during a second stage of back propagation).
Please see motivation for claim 9 above.


Claims 20-25 are rejected under 35 U.S.C. 103 as being unpatentable over Dasari in view of Ginosar.


Regarding claim 20,
Dasari discloses a computer-implemented method, comprising: 
for a first set of forward propagations through a first portion of the neural network, assigning a first arrangement of the number of arrangements to perform computations according to a first layer of the neural network (Dasari [0057] recites, in part, “The method 300 includes transmitting the configuration data to the AIPDs 103 of the AIPU (stage 306).” Dasari fig. 2A and [0026] recites “FIGS. 2A, 2B, 2C, and 2D illustrate example arrangements of artificial intelligence processing dies of an artificial intelligence processing unit…” Examiner interprets fig. 2A elements 103a-c as the first arrangement of processing units); 
providing an input for the neural network to the first arrangement to initiate the first set of forward propagations (Dasari [0064] recites, in part, “The method 400 includes transmitting initial data or input data related to the neural network to the AIPU (stage 404).”); 
storing an output from the systolic processing chip in a memory (Dasari [0062] and [0081] recite, in part, “The AIPD controller of the AIPD 103 is further configured to store neural network related data, such as parameter weight data, in storage devices such as buffers 119 and utilize the neural network related data during the computations related to the layer of the neural network associated with the AIPD 103. [0081] The processes and logic flows can … be …” Examiner interprets storing neural network related data such as parameter weights in buffers as storing an output in a memory); 
for a second set of forward propagations through a second portion of the neural network, assigning the first arrangement to perform computations according to a different layer of the neural network (Dasari [0057] recites “The method 300 includes configuring the AIPD 103 based on the configuration data (stage 310).” Dasari [0026] recites “FIGS. 2A, 2B, 2C, and 2D illustrate example arrangements of artificial intelligence processing dies of an artificial intelligence processing unit…” Examiner interprets fig. 2A elements 103b-c as the arrangement of processing units and configuring the AIPD based on configuration data as assigning the arrangement to perform computations); and 
providing the output to the first arrangement to initiate the second set of forward propagations (Dasari [0068] recites “The method 400 includes, at the first AIPD 103, transmitting the result from the computations at the first AIPD 103 to a second AIPD 103 (stage 410). The second AIPD 103 is associated with a different layer of the neural network than the first AIPD.”). 
However, Dasari does not explicitly disclose determining that a number of layers of a neural network exceeds a number of arrangements of processing units of a systolic processing chip.
Ginosar teaches determining that a number of layers of a neural network exceeds a number of arrangements of processing units of a systolic processing chip (Ginosar Pg. 12, Col. 6, Ln. 37-47 recites, in part, “…a chip may constitute one layer with forward (and possibly also backward propagation) constituent. By this embodiment a neural network of n layers is realized by n distinct chips. Alternatively a single chip may encompass only portion of a layer, or if desired, more than one layer. Those versed in the art will, therefore, appreciate that the design of the chip in terms of the neural network portion that is realized is determined as …” Examiner interprets determining the configuration of specific neural networks related to number and portioning of layers as architecture design choices and application specific. Additionally, Ginosar fig. 2 and Pg. 13, Col. 8, Ln. 24-25 recites, in part, “…a Digital Systolic Neural Network Chip (DSNC)…” Examiner interprets the Digital Systolic Neural Network Chip, DSNC, as a systolic processing chip).
Dasari and Ginosar are both directed to neural networks, particularly architecture. In view of the teachings of Ginosar, it would have been obvious to one of ordinary skill in the art to apply the teachings of Ginosar to Dasari before the effective filing date of the claimed invention in order to perform forward processing and back propagation learning simultaneously by not limiting the architecture to a single channel with dataflow in one direction thereby improving Dasari (cf. Ginosar Pg. (20) #, Col. # recites, in part, “A number of digital parallel architectures for neural network processing have been described in the literature... Some of them are limited to forward processing…, while others support learning… Some architectures employ systolic data flow… As is well known, "Systolic" processing means: In systolic computation, data flows rhythmically from one processing unit to another each clock cycle. The multiple identical or similar processing units operate on the data simultaneously, each unit processing a different part of the data at the same time… All those architectures employ a single channel for data sharing among the neurons and across the layers, which limits processing to only one input datum at each cycle. The use of a single channel is also twice as slow learning as forward processing since data can flow in the network in only one direction. Image processing handles large amounts of data, and if real-time speed is required (30 images per second), then none of the above architectures provide a suitable solution when implemented e.g. with hitherto known technology”).
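
For illustration only, the flow recited in claim 20 (more layers than arrangements, a first set of forward propagations, storing the output in a memory, then re-using the first arrangement for a different layer) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Dasari, Ginosar, or the application: the names plan_passes and run_network are hypothetical, each layer is modeled as a simple Python callable, and a single variable stands in for the memory.

```python
from typing import Callable, Dict, List, Sequence


def plan_passes(num_layers: int, num_arrangements: int) -> List[Dict[int, int]]:
    """Map arrangement index -> layer index for each pass (hypothetical helper).

    When num_layers exceeds num_arrangements, more than one pass is needed and
    arrangement 0 is assigned to a different layer in each pass.
    """
    if num_layers < 1 or num_arrangements < 1:
        raise ValueError("num_layers and num_arrangements must be positive")
    passes: List[Dict[int, int]] = []
    for start in range(0, num_layers, num_arrangements):
        stop = min(start + num_arrangements, num_layers)
        passes.append({arr: layer for arr, layer in enumerate(range(start, stop))})
    return passes


def run_network(layers: Sequence[Callable[[float], float]],
                num_arrangements: int,
                network_input: float) -> float:
    """Run all layers on a chip with fewer arrangements than layers."""
    memory = network_input  # stands in for the memory holding values between passes
    for assignment in plan_passes(len(layers), num_arrangements):
        value = memory  # provide the stored output to the first arrangement
        for arrangement in sorted(assignment):
            value = layers[assignment[arrangement]](value)
        memory = value  # store the output of this set of forward propagations
    return memory


# Three layers processed by two arrangements: two passes are required.
example_layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
schedule = plan_passes(num_layers=3, num_arrangements=2)
result = run_network(example_layers, num_arrangements=2, network_input=1.0)
```

In the sketch, arrangement 0 handles layer 0 in the first pass and layer 2 in the second pass, with the intermediate value held in memory between passes.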

Regarding claim 21, 
The Dasari/Ginosar Combination teaches the computer-implemented method of Claim 20, further comprising: 
determining that each of the number of layers of the neural network has been processed by the systolic processing chip (Ginosar (11) recites “As is well known, back propagation learning on an data example (several data packets) occurs only after the forward processing of that example has already been completed throughout the network.” Additionally, Ginosar (23) recites “These channels can also be used in a known per se manner by the host to monitor the learning progress of the network.” Examiner interprets the completion of forward processing of a neural network as each layer of the neural network having been processed. The monitoring of the learning progress is interpreted as determining neural network layer processing.); and 
storing an additional output of the systolic processing chip in the memory as an output of the neural network (Ginosar (12) recites “To ensure the availability of these inputs… e.g. a 275 9-bit entry FIFO buffer being an exemplary inter-layer structure is incorporated in the chip... The FIFO buffer is depicted as "past inputs" block 25 in the DSNC of FIG. 2. At any given time this FIFO stores the last 11 data examples (consisting each of 25 data packets) that have entered the chip, and presents them (over the Past Input Channel) at the proper time to the backward stages of the neurons. The last 11 data examples are fed at the correct timing to the appropriate constituent of the back propagation section by means of hard-wired registers... the data example that is fed to … together with the error of the same processed data example after having been processed...” Examiner interprets the errors as additional outputs and its storage as part of the past input values stored in FIFO buffer as storing in memory).
Please see motivation for claim 20 above.

Regarding claim 22, 
The Dasari/Ginosar Combination teaches the computer-implemented method of Claim 20, further comprising: 
for a first set of back propagations, assigning the first arrangement to back propagate first received values according to the different layer of the neural network (Ginosar (22) recites “When programming the chip for execution, the output function and its derivative are fed into the chip through this channel; when processing an image, the outputs of a whole bank (five neurons) are presented on the channel via the output LUTs 26.sup.(1) . . . 26.sup.(5) (of which only 26.sup.(1) and 26.sup.(5) are shown in FIG. 4). The Output Learning Channel carries the errors of a whole bank towards the previous layer, for each input example. Thus, when referring to FIG. 5, the output of 34.sup.(2) carries the errors of the third layer 34 to the input of layer 32.sup.(2) in the second layer 32 and likewise, the output of 32.sup.(2) carries the errors of the second layer 32 to the first layer 30.” Examiner interprets errors carried back for each input example to a previous 2nd layer (i.e. a different layer) from the 3rd layer as a first set of back propagations); and 
for a second set of back propagations, assigning the first arrangement to back propagate second received values according to the first layer of the neural network (Ginosar (22) recites “Thus, when referring to FIG. 5, the output of 34.sup.(2) carries the errors of the third layer 34 to the input of layer 32.sup.(2) in the second layer 32 and likewise, the output of 32.sup.(2) carries the errors of the second layer 32 to the first layer 30.” Examiner interprets errors carried back for each input example to a previous 1st layer from the 2nd layer as a second set of back propagations).
Please see motivation for claim 20 above.

Regarding claims 23-25,
Claims 23-25 are directed to a device comprising a controller configured to perform methods substantially identical to those recited in claims 20-22. Therefore, the rejections to claims 20-22 apply equally here.
In addition, Dasari discloses the additional limitation of device and controller (Dasari fig. 1A and [0032] recites “FIG. 1A illustrates a system 100 for processing computational tasks of neural networks, according to an illustrative implementation. The system 100 includes a main processing unit 101 and an artificial intelligence processing unit (AIPU) 102. The system 100 is housed within a host computing device (not shown).”).



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bruestle et al. (U.S. Patent No. 10817802) teaches architecture and techniques of an apparatus for hardware accelerated machine learning, Tensor Processing Unit (TPU).
Gunter et al. (U.S. Patent No. 10790828) teaches an Application Specific Integrated Circuit (ASIC) chip with systolic array of cells designed for operations of machine learning models.
Ross et al. (U.S. Patent No. 10521488) teaches dynamic partitioning of matrix multiplication units, systolic arrays and convolutional neural networks.
Vantrease et al. (US 20190236049 A1) teaches processing elements of a systolic array to perform operations of a neural network for parallel processing.
Wu et al. (US 20190114548 A1) teaches scheduling a massively parallel programmable hardware system with levels forming a systolic array.
Meyer et al. (US 20190042918 A1) teaches reconfigurable fabric hardware for machine learning.
Phelps et al. (US 20180336164 A1) teaches performing neural network computation in hardware and a matrix multiply unit as a systolic array.
Zhang et al. (US 20180314671 A1) teaches systolic array design and implementation on reconfigurable processing platforms such as a Field-Programmable Gate Array (FPGA).
Barik et al. (US 20180307980 A1) teaches data processing with a Graphics Processing Unit (GPU) for machine learning and hardware acceleration.
Huang et al. (US 20180307438 A1) teaches implementing a systolic array matrix multiplier for matrix multiply operations.
Young et al. (U.S. Patent No. 10083395) teaches computing neural network inferences with a special-purpose hardware circuit.
Thorson et al. (U.S. Patent No. 10074051) teaches a circuit for performing neural network computations for a neural network and systolic array.
Woo (U.S. Patent No. 10019668) teaches memory management processes for performing neural network computations related to batch processing and scheduling.
Boesch et al. (US 20180189642 A1) teaches deep convolutional neural networks (DCNN) and hardware accelerator engine arranged to implement a portion of the DCNN.
Bittner et al. (US 20180157465 A1) teaches block floating-point (BFP) implementations, including use of BFP implementations in artificial neural networks (NNs).
Nowatzyk et al. (U.S. Patent No. 9928460) teaches accelerating neural network computations in hardware, tiles, and stacking.
Gokmen (US 20180075350 A1) teaches configurations of trainable resistive crosspoint devices referred to as resistive processing units (RPUs) relating to NNs formed from crossbar arrays of two-terminal RPUs with local data storage and local data processing for accelerating NN.
Ross et al. (U.S. Patent No. 9697463) teaches computing convolutions using a NN processor and systolic array.
Ross et al. (US 20170103313 A1) teaches a special-purpose hardware circuit that computes neural network inferences.
Ross et al. (US 20170103314 A1) teaches computing neural network inferences in hardware, prefetching weights, and coordinating loading weight inputs to a systolic array.
Lupon et al. (US 20150170021 A1) teaches a modular, reconfigurable, and variable-precision processing unit for CNNs.
Savich (US 20140289445 A1) teaches a systolic compute accelerator architecture for matrix operations and an application specific engine.
Chakradhar (US 20110029471 A1) teaches CNNs and configuring a coprocessor to address accelerating CNNs.
Le et al. (WO-2019075267-A1) teaches a neural network system and processing inputs through the layers of neural networks to generate outputs with an activation layer.
Young et al. (EP-3373210-A1) teaches transposing neural network matrices in hardware and a special-purpose hardware circuit that computes neural network inferences.
Gokmen et al. ("Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices", 2017) teaches arrays of RPU devices forming deep NNs for parallelization and programmability, also incorporating backpropagation to the hardware.
Jones et al. ("Learning in Linear Systolic Neural Network Engines: Analysis and Implementation", 1994) teaches processing elements of a systolic array and digital architectures for NNs.
Smith ("Decoupled Access/Execute Computer Architectures", 1982) teaches an approach of architecture using a separation of access to memory and operation execution. 
Jouppi et al. ("In-Datacenter Performance Analysis of a Tensor Processing Unit", June 24-28, 2017) teaches TPU architecture, implementation and software.
Chen et al. ("Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks", 2016) teaches a row-stationary (RS) dataflow and spatial architecture that can adapt to different CNN shape configurations utilizing the processing engine local storage, direct inter-PE communication and spatial parallelism.




Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/LWC/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126