DETAILED ACTION

Response to Arguments
Applicant’s arguments, filed 06/28/2022, with respect to 35 U.S.C. § 112(f), 35 U.S.C. § 112(a), and 35 U.S.C. § 112(b) have been fully considered and are persuasive. They have been withdrawn.
Furthermore, Applicant’s arguments regarding the rejection of claims 1, 13 and 25 under 35 U.S.C. § 103 have been considered, but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/07/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 18-21 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Yin et al. (“A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications”) (hereinafter “Yin”) in view of Lin et al., “The architectural implications of autonomous driving: Constraints and acceleration,” Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 24-28, 2018 (“Lin”).

Regarding Claim 1:
	Yin teaches: 
“An integrated circuit for performing computations for a neural network comprising a plurality of neural network layers, the integrated circuit comprising:” (Yin teaches a hybrid neural network processor which includes neural network layers. [Yin page 968 sec. I. ¶2: “This paper presents a hybrid-NN processor (called “Thinker”) [20], which is designed to address the three special features of hybrid-NNs.”, page 969 sec. III. A. ¶2: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor.”, Fig. 2, Abstract: “First, each processing element (PE) supports bit-width adaptive computing to meet various bit-widths of neural layers”]). 
“a signal processor configured to provide programming data used to perform the computations;” (Yin teaches a CPU that provides data signals (referred by Yin as ‘input data’) and programming data (referred by Yin as ‘configuration words’). [Yin page 976 sec IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]). 
“an activation memory configured to store sets of layer inputs;” (Yin teaches an activation memory (referred to as the data buffers by Yin) storing input data to the arrays which are used for computations for the neural network layers. [Yin Page 970-971 sec. III A. ¶3: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers […] During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    (image omitted: media_image1.png)
]). 
“a parameter memory configured to store parameters for a first neural network layer;” (Yin teaches a weight buffer for storing weights for the neural network layers. Weight buffer is the parameter memory. [Yin page 971 sec. III. A: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 

    (image omitted: media_image1.png)
]). 
 “a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” (Yin teaches PE arrays that are the main computing units. As seen in Fig. 2, the PE arrays are comprised of PEs. [Yin page 970 sec. III. A. ¶1: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor. Two 16×16 heterogeneous PE arrays are the main computing units. Each array can be partitioned into sub-arrays for different functions, and PEs can be configured to execute bit-width adaptive operations.”, Fig. 2: 

    (image omitted: media_image2.png)
]).
“i) receive, for the first neural network layer, an input of the sets of layer inputs accessed by the rotation circuit,” (Yin teaches the computing cells (i.e. the PEs) receiving input feature points that were accessed in turn by the rotation unit. [Yin page 970 sec. III. A. ¶2: “The input feature points are loaded to PEs at the left/right edge, and are horizontally shifted to the PEs inside the array”, page 970-971 sec. III. A. ¶3: “During the computation of a specific layer, one sub-buffer provides input data to the arrays”, page 970 sec III. A: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers”]). 
“ii) receive a parameter for the first neural network layer,” (Yin teaches the PEs receiving weights which are parameters and the PEs are the computing cells for the neural network layers. [Page 971 sec. III. A. ¶4: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays. In each PE array, a 16-KB local buffer exploits weight reuse, which provides 16 weights for 16 PE columns in parallel.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    (image omitted: media_image1.png)
]).
“and iii) generate at least a portion of an output of the first neural network layer using the input and the parameter;” (As can be seen in Fig. 6(b), Yin teaches an output for a neural network layer generated based on weights (i.e. parameters) and input data. [page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, page 972 sec. III. B. ¶1: “In PE array, the computation of output points are fixed on the respective PEs, reusing weights, and input feature points in the vertical and horizontal directions, respectively (also known as output stationary dataflow).”, Fig. 6(b): 
    (image omitted: media_image3.png)
]). 
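For illustration only, the output stationary dataflow quoted above can be sketched in a few lines of Python. The sketch is one-dimensional, and the function name and the one-accumulator-per-PE model are assumptions made here for clarity; they are not taken from Yin.

```python
def conv1d_output_stationary(inputs, weights):
    """1-D convolution in which each output stays fixed on one (hypothetical) PE."""
    k = len(weights)
    n_out = len(inputs) - k + 1
    acc = [0] * n_out  # one accumulator per PE; the output never moves
    for step, w in enumerate(weights):  # one weight is shared by all PEs per step
        for pe in range(n_out):
            # Each PE consumes the input point shifted to it at this step.
            acc[pe] += w * inputs[pe + step]
    return acc


print(conv1d_output_stationary([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
```

Each accumulator plays the role of an output point fixed on its PE, while weights and input points are reused across PEs.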
“and a crossbar circuit configured to store the output of the first neural network layer, in the activation memory, in accordance with a bank assignment pattern that is based on the programming data.” (Yin teaches the output of a layer being stored onto a data buffer (i.e. the activation memory) in accordance with a rule (i.e. a bank assignment pattern) based on the programming data and attribute values wherein the attribute values taught by Yin are stride values. [Yin page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
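As a minimal sketch of the quoted mapping rule, the Python below assigns a feature point in row r to bank r mod Bm and checks whether a set of rows can be read in parallel without two rows sharing a bank. Reading Bm as the number of banks is an assumption, since equation 1 of Yin is not reproduced in this action; the function names are hypothetical.

```python
def bank_of(r, num_banks):
    """Virtual bank for a feature point in row r, following B(p) = r mod Bm."""
    return r % num_banks


def is_conflict_free(rows, num_banks):
    """True if every row in `rows` maps to a distinct bank."""
    banks = [bank_of(r, num_banks) for r in rows]
    return len(set(banks)) == len(banks)


window = [5, 6, 7]  # three adjacent rows read in parallel
print([bank_of(r, 4) for r in window])  # [1, 2, 3]
print(is_conflict_free(window, 4))      # True
print(is_conflict_free([0, 4], 4))      # False: rows 0 and 4 share bank 0
```

Writing a layer's output with the same rule means the next layer can read adjacent rows from distinct banks.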
Yin does not teach “a core comprising a plurality of units implemented in hardware, the core being configured for data communication with the signal processor to receive the programming data provided by the signal processor; a hardware rotation circuit configured to, over one or more clock cycles, rotate layer inputs accessed at the activation memory, wherein the layer inputs are rotated along a dimension of a tensor by shifting a respective position of each layer input corresponding to an element of the tensor based on control signals received at the rotation circuit”.
However, Lin teaches:
“a core comprising a plurality of units implemented in hardware, the core being configured for data communication with the signal processor to receive the programming data provided by the signal processor” (Lin, pg. 759, right column, see also fig. 8, “An overview of our design is presented in Figure 8. The memory controller first initiates the data transfer between the host device and the FPGA accelerator, and the layer definition (i.e., layer type, weights) is fed to the header decoder unit (HdrDc_unit) to configure the layer. Each buffer stores the corresponding neural network weights, input values, and internal temporary variables until the execution of the layer has been completed. Each PE, primarily consists of multiply-accumulate (MAC) units instantiated by the digital processing processors (DSPs) on the fabric, then performs the necessary computation on the data stored in the WeightBuffer and InputBuffer and writes the output to the OutputBuffer.”). 
“a hardware rotation circuit configured to, over one or more clock cycles, rotate layer inputs accessed at the activation memory, wherein the layer inputs are rotated along a dimension of a tensor by shifting a respective position of each layer input corresponding to an element of the tensor based on control signals received at the rotation circuit” (Lin, pgs. 759-760, right column, see also fig. 9, “For oFAST, we implement an image buffer (ImgBuffer) and a feature point buffer (FtrPntBuffer) using shift register, and the mask window is assigned to the corresponding register. As the input data streaming into the buffer, they are filtered by the mask window so the feature detector only receive the data it needs… [f]or rBRIEF, we store the pattern information in the pattern LUT on-chip. Rotate_unit is implemented to rotate to the corresponding coordinates… [b]y synthesizing on real systems, we demonstrate our FE implementation can execute at a frequency of 250MHz.” Figure 9 (as detailed herein) details the Rotate_unit(s) used in the FPGA: 

    (image omitted: media_image4.png)
). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Yin in view of Lin. The motivation to do so would be to implement algorithms using an accelerated framework to reduce latency for autonomous vehicles (Lin, pg. 752, “For instance, we discover it is critical for the system to be able to finish the end-to-end processing at a latency less than 100 ms and a frame rate higher than 10 frames per second, to react fast enough to the constantly changing traffic conditions…we find GPU-, FPGA-, ASIC-accelerated autonomous driving systems can significantly reduce the processing tail latency by factors of 169×, 10×, 93× respectively.”).

Regarding Claim 2:
Yin in view of Lin teaches “The circuit of claim 1,”  as seen above. 
Yin further teaches:
“wherein the rotation circuit is further configured to rotate accessing elements of an input tensor,” (Examiner notes that “rotate elements” is not well defined. It is interpreted as shifting elements as per page 17 lines 21-23 of the instant specification. It can be seen in figure 10(e) that Yin teaches shifting the elements of input tensor. I is the input tensor; the elements of the input tensor are rotated/shifted to provide inputs. [Yin page 969 sec. II: “The input data include vectors and weighting matrixes, while the output data are vectors.”, page 975 sec. IV. B: “For a kernel of size K, since the output stationary dataflow is adopted, each row of PE array requires a set of K × K input feature points to calculate one output feature point. When different rows of PEs execute concurrently, input points for adjacent rows are overlapped.”, page 975 sec. IV. B: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern  P, each point in P needs to be mapped to a unique bank.”, Fig. 10e: 
    (image omitted: media_image5.png)
]). 
“where each element of the input tensor corresponds to a respective input of a set of inputs stored in the activation memory.” (It can be seen in figure 10(e) that Yin teaches each element of the input tensor corresponds to inputs stored in CONV buffer, which is part of the data buffers (i.e. the activation memory).  [Yin page 974 sec. IV. B ¶1: “Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, data for CONV array and FC array with corresponding access pattern, respectively.”, Fig. 10e: 

    (image omitted: media_image6.png)
]). 

Regarding Claim 3:
Yin in view of Lin teaches “The circuit of claim 2” as seen above.
	Yin further teaches:
“wherein the rotation circuit is further configured to: rotate accessing elements of the input tensor along a first dimension of the input tensor based on a first rotation factor;” (Examiner notes that rotate elements is interpreted as shifting elements as per page 17 lines 21-25 of the instant specifications. Yin teaches shifting the input data over the feature map along the y dimension, as seen in Fig 10(c). [Yin Page 975 sec. IV. B ¶: “An example is given in Fig. 10(b)–(d), where a 3×3 (K = 3) kernel-based convolution is  performed with stride S of 1 in a 16-row CONV array.”, Fig. 10(b)-(d): 
    (image omitted: media_image7.png)
 ]).  
“rotate accessing elements of the input tensor along a different second dimension of the input tensor based on a second rotation factor that is different than the first rotation factor;” (Yin teaches shifting the input data over the feature map along the x dimension, seen in Fig. 10(d). [Yin Fig. 10(b)-(d): 
    (image omitted: media_image8.png)
]). 
“and provide an input that corresponds to a rotated element of the input tensor to a computing cell of the computation unit.” (Yin teaches providing input to computing cells (i.e. PEs) of the computation unit (i.e. the PE array). [Yin page 975 sec. IV. B. ¶6: “Each input feature point for K × K filter is sequentially input into the corresponding row of PE array.”]). 
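The interpretation of “rotate” as shifting elements along a tensor dimension, with a separate rotation factor per dimension, can be sketched as follows. This is a hedged illustration only: treating the shift as cyclic is an assumption, and the `rotate` helper is hypothetical and is not drawn from Yin, Lin, or the instant specification.

```python
def rotate(tensor, shift, axis):
    """Cyclically shift a 2-D list-of-lists by `shift` positions along `axis`.

    axis 0 moves whole rows; axis 1 moves elements within each row.
    """
    if axis == 0:
        n = len(tensor)
        return [tensor[(i - shift) % n] for i in range(n)]
    if axis == 1:
        return [[row[(j - shift) % len(row)] for j in range(len(row))]
                for row in tensor]
    raise ValueError("axis must be 0 or 1")


t = [[1, 2, 3],
     [4, 5, 6]]
print(rotate(t, 1, 0))  # [[4, 5, 6], [1, 2, 3]]  first dimension, factor 1
print(rotate(t, 2, 1))  # [[2, 3, 1], [5, 6, 4]]  second dimension, factor 2
```

Using a different factor for each axis mirrors the first and second rotation factors recited in the claim.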

Regarding Claim 4:
Yin in view of Lin teaches “The circuit of claim 1” as seen above.
Yin further teaches:  
“wherein the crossbar circuit is further configured to: determine a mapping of activations in the output in response to processing the bank assignment pattern,” (Yin teaches mapping the output of each CONV layer in accordance to the bank assignment pattern. [Yin page 975 sec. IV. B: “In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“where the mapping identifies memory banks of the activation memory for storing the activations for the second neural network layer based on an attribute value assigned to the second neural network layer.” (Yin teaches that the mapping identifies the memory banks of the activation memory and that the rule is based on the attribute value (i.e. the stride value; S in equation 1). [Yin page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec IV. B. ¶4 equation 1: 
    (image omitted: media_image9.png)
]). 

Regarding Claim 5:
Yin in view of Lin teaches “The circuit of claim 4” as seen above.
Yin further teaches:  
“wherein the crossbar circuit is further configured to: cause data for the output of the first neural network layer to be stored at particular address locations of the activation memory,” (Yin teaches the output data being stored at particular locations in the CONV sub buffers of the data buffer (i.e. the activation memory) using a specific rule. [Yin page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions. There are 48 banks in each buffer”, Fig. 10(a): 
    (image omitted: media_image10.png)
, page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
“the data for the output being assigned to an address location of the activation memory based on a configurable mapping that changes for different respective layers of the neural network.” (Yin teaches the output map of the CONV layer being stored on a bank on the data buffer with the location dependent on a rule. After each layer, the two sub buffers switch functions and the mapping changes. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, Page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 
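The buffer behavior relied on above, in which one sub-buffer supplies inputs while the other receives outputs and the two exchange functions for the next layer, can be sketched as a simple ping-pong arrangement. The class and function names are hypothetical, and modeling a layer as an element-wise function is a simplification assumed here for brevity.

```python
class PingPongBuffers:
    """Two buffers whose input/output roles swap after every layer."""

    def __init__(self):
        self.buffers = [[], []]
        self.input_index = 0  # which buffer currently feeds the arrays

    @property
    def input_buffer(self):
        return self.buffers[self.input_index]

    @property
    def output_buffer(self):
        return self.buffers[1 - self.input_index]

    def swap(self):
        self.input_index = 1 - self.input_index


def run_layers(inputs, layers):
    bufs = PingPongBuffers()
    bufs.input_buffer.extend(inputs)
    for layer in layers:
        bufs.output_buffer.clear()
        bufs.output_buffer.extend(layer(x) for x in bufs.input_buffer)
        bufs.swap()  # this layer's output becomes the next layer's input
    return list(bufs.input_buffer)


print(run_layers([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))  # [4, 6, 8]
```

Because reads and writes target different buffers within a layer, a layer never overwrites data it is still reading.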

Regarding Claim 6:
Yin in view of Lin teaches “The circuit of claim 4” as seen above.
Yin further teaches:  
“wherein: the rotation circuit is further configured to access output data for the output of the first neural network layer as layer inputs to the second neural network layer for processing at the second neural network layer;” (Yin teaches accessing the output data and the output data being used as input data for the next layer. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and the determined mapping is configured such that a bank conflict does not occur at the memory banks of the activation memory when the rotation circuit accesses layer inputs for the second neural network layer that correspond to the output of the first neural network layer.” (Yin teaches using a specific rule to determine the bank location for storing data. To further support no bank conflict during the accessing of input data, the two sub buffers switch functions of sending input data and storing output data. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 

Regarding Claim 7:
Yin in view of Lin teaches “The circuit of claim 4” as seen above.
Yin further teaches:  
“wherein the attribute value assigned to the second neural network layer is: a stride value for the second neural network layer, or a skip value for the second neural network layer.” (Yin teaches a stride value and this attribute value is used for storage location purposes.  [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec. IV. B equation 1: 
    (image omitted: media_image11.png)
, Table II: 
    (image omitted: media_image12.png)
]). 

Regarding Claim 8:
Yin in view of Lin teaches “The circuit of claim 1” as seen above.
Yin further teaches:  
“wherein the core is configured to: use the rotation circuit to access layer inputs stored in a first set of memory banks of the activation memory without the occurrence of a bank conflict;” (Yin teaches the CPU (which has at least one core) accessing input data stored in the activation memory depending on a specific rule which would prevent bank conflict. [Yin page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU.”, page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and use the crossbar circuit to store layer outputs in a second set of memory banks of the activation memory without the occurrence of a bank conflict.” (Yin teaches the storing of outputs into memory banks of the data buffers (i.e. the activation memory) based on specific rules which would prevent bank conflict. [Yin page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 

Regarding Claim 9:
Yin in view of Lin teaches “The circuit of claim 7” as seen above.
Yin further teaches:  
“wherein the core is configured to: synchronize rotation based data access operations of the rotation circuit with pattern based data storage operations of the crossbar circuit to achieve a utilization rate of the computation unit that exceeds a threshold utilization rate.” (Yin teaches the accessing of data in the activation memory being assigned to make the maximum use of resources. Data access is done in parallel which is equivalent to synchronized data access. Examiner notes that the threshold is not defined in the claims. Threshold is interpreted to be any utilization rate from a different system. The method proposed by Yin is called AP-flow and the array utilization rates that result from using this method are shown in comparison to TM-flow, a method used in existing neural network processors. It can be seen in Table IV that at least one AP-flow utilization rate exceeds the corresponding TM-flow utilization rate, which is interpreted to be the threshold. [Yin Page 974 sec. IV. B. ¶1: “Input feature points for both CONV array and FC array need to be loaded from multi-bank on-chip buffer in parallel. Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, which provide input data for CONV array and FC array with corresponding access pattern, respectively.”, page 973 sec. IV. B. ¶7: “Therefore, an AP-flow is proposed in this paper, in which majority of PEs are allocated to computation-intensive ConvNet and most of the bandwidth are allocated to memory-intensive FCNet/RNN. Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively.”, page 969 sec. I. ¶4: “the time-multiplexing (TM)-based computing flow, which is adopted in the existing NN processors, is inefficient for processing hybrid-NNs.”, page 977 sec. V. A. ¶3: “As shown in Table IV, we can see that AP-flow achieves better array utilization than TM-flow for all benchmarks except Yolo.”, page 977 sec. V. Table IV: 

    (image omitted: media_image13.png)
]).
Regarding Claim 17:
Yin in view of Lin teaches “The method of claim 16,” as seen above. 
Yin further teaches:
“further comprising: assigning, using the crossbar circuit, data for the output of the first neural network layer to an address location of the activation memory based on a configurable mapping that changes for different respective layers of the neural network;” (Yin teaches the output map of the CONV layer being assigned to be stored at a bank location on the data buffer with the location dependent on a rule. After each layer, the two sub buffers switch functions and the mapping changes. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, Page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 
“and storing, using the crossbar circuit, the data for the output of the first neural network layer at particular assigned address locations of the activation memory based on the configurable mapping for the second neural network layer.” (Yin teaches the output data being stored at particular locations in the CONV sub buffers of the data buffer (i.e. the activation memory) using a specific rule. [Yin page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions. There are 48 banks in each buffer”, Fig. 10(a): 
    (image omitted: media_image10.png)
, page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = rmodBm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
Dependent claims 18-21 are rejected on the same basis as dependent claims 6-9, since they are analogous claims.
Independent claim 25 is rejected on the same basis as independent claim 1, since they are analogous claims.


Claims 10-12 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Yin et al. (“A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications”) (hereinafter “Yin”) in view of Lin et al., “The architectural implications of autonomous driving: Constraints and acceleration,” Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 24-28, 2018 (“Lin”) and in view of Young et al. (US9721203) (“Young”).

Regarding Claim 10:
Yin in view of Lin teaches “The circuit of claim 1” as seen above. 
Yin in view of Lin does not teach: “wherein the signal processor is configured to: receive, from an external controller, an instruction comprising data values to be used at the core;” “and provide at least the data values of the instruction to the core for storing at a component of the core.”
However, Young teaches:  
“wherein the signal processor is configured to: receive, from an external controller, an instruction comprising data values to be used at the core;” (Examiner notes that “external” for an external controller is not defined. External controller is interpreted to be a controller that is external to anything. Young teaches receiving control signals from an external processor; this external processor is considered the external controller. The control signals are received by the host interface, which is a part of a neural network processing system. This system is implemented via a computer, which is a processing device and comprises at least one core. The control signals are instructions that comprise data values such as a stride value and are used by the neural network processing system. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”, col 4 lines 50-53: “The neural network processing system 100 is an example of a system implemented as one or more computers in one or more locations in which the systems, components, and techniques described below can be implemented.”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”]). 
“and provide at least the data values of the instruction to the core for storing at a component of the core.” (Young teaches the host interface (which is a part of the processing device) providing instructions that include stride value to another component of the neural processing system (i.e. the sequencer) which is implemented via a computer comprising at least one core. [Young col 12 lines 51- 57: “For example, upon receiving instructions for implementing the neural network layer having a stride greater than one, the host interface 302 can send the instructions to the sequencer 306 of FIG. 3, and the sequencer 306 can convert the instructions into low level control signals that control the special-purpose hardware circuit 300 of FIG. 3 to perform the neural network computation.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin in view of Lin and in view of Young. The motivation to do so would be to avoid processing delays, making neural network computations more efficient (Young col 3 lines 7-13: “This allows for an inference of a neural network that includes a convolutional layer having a stride greater than one to be determined efficiently without modifying the hardware architecture of the special purpose hardware circuit. That is, processing delays resulting from performing part of the processing off-chip, in software, or both, are avoided.”).
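For illustration only, the instruction path cited from Young for claim 10 (a host interface receiving an instruction that carries a stride value and passing it to a sequencer, which converts it into low-level control signals) can be sketched as follows. This is a minimal sketch under stated assumptions: the classes `Instruction`, `ControlSignal`, `Sequencer`, and `HostInterface`, and the signal names `enable` and `stride`, are hypothetical and do not appear in Young. Young describes only that the conversion occurs, not how the control signals are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction carrying a data value (the stride)."""
    op: str
    stride: int


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical low-level control signal addressed to one unit."""
    target: str
    name: str
    value: int


class Sequencer:
    """Converts an instruction into low-level control signals."""

    def convert(self, instr: Instruction) -> List[ControlSignal]:
        # Assumed encoding: one enable signal for the matrix unit and one
        # signal that forwards the stride value to the vector unit.
        return [
            ControlSignal("matrix_computation_unit", "enable", 1),
            ControlSignal("vector_computation_unit", "stride", instr.stride),
        ]


class HostInterface:
    """Receives instructions from outside and forwards them to the sequencer."""

    def __init__(self, sequencer: Sequencer):
        self.sequencer = sequencer

    def receive(self, instr: Instruction) -> List[ControlSignal]:
        return self.sequencer.convert(instr)
```

In this sketch, the stride value supplied in the instruction reaches the downstream unit unchanged as the value of a control signal.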

Regarding Claim 11:
Yin in view of Lin and in view of Young teaches “The circuit of claim 10” as seen above. 
Young further teaches:  
“wherein the signal processor is a digital signal processor (DSP) configured to:” (The processing device of the neural network processing system is a computer which is a digital signal processor. [Young col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
“process an instruction received from the external controller;” (Young teaches the neural network processing system (implemented through a DSP, as stated above) receiving control signals (i.e. instructions) from an external processor which is equivalent to an external controller. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”]). 
“and in response to processing the instruction, configure one or more registers at the core using data values of the instruction.” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are done in the matrix computation unit, which is a part of the neural network processing system (implemented through a DSP, as stated above). The DSP comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin in view of Lin and in view of Young for at least the same reasons as discussed above in claim 10.

Regarding Claim 12:
Yin in view of Lin and in view of Young teaches “The circuit of claim 11” as seen above. 
Young further teaches:  
“wherein the core is configured to access the one or more registers to obtain configuration data that defines the computations for the neural network,” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are done in the matrix computation unit, which is a part of the neural network processing system. This neural network processing system is implemented via a computer which comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
“the computations being performed by the computation unit of the core based on data values derived from the instructions received from the external controller.” (Young teaches control signals being received from an external controller that the matrix computation unit of the neural network processing system uses to perform the computations. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin in view of Lin and in view of Young for at least the same reasons as discussed above in claim 10.
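For illustration only, the control-register behavior cited from Young for claims 11 and 12 (a register in each cell whose stored control signal determines whether the weight or activation input is shifted to adjacent cells and whether the multiplication circuitry operates) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Control` flags, the `Cell` class, and the `configure` and `step` methods are hypothetical names, and the single-step model ignores the clock-cycle timing that Young mentions.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional, Tuple


class Control(Flag):
    """Hypothetical encoding of the control signal held in a cell's register."""
    NONE = 0
    SHIFT_WEIGHT = auto()
    SHIFT_ACTIVATION = auto()
    MULTIPLY = auto()


@dataclass
class Cell:
    weight: float = 0.0
    activation: float = 0.0
    control: Control = Control.NONE  # models the cell's control register

    def configure(self, control: Control) -> None:
        """Write configuration data into the control register."""
        self.control = control

    def step(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
        """Return (product, shifted weight, shifted activation).

        An entry is None when the control register disables that action.
        """
        product = (self.weight * self.activation
                   if Control.MULTIPLY in self.control else None)
        w_out = self.weight if Control.SHIFT_WEIGHT in self.control else None
        a_out = (self.activation
                 if Control.SHIFT_ACTIVATION in self.control else None)
        return product, w_out, a_out
```

In this sketch, the computation a cell performs depends only on the value most recently written to its control register, which is the sense in which the register contents define the computation.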
Referring to dependent claims 22-24, they are rejected on the same basis as dependent claims 10-12 since they are analogous claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
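For illustration only, the reply-period arithmetic stated above can be sketched as follows. This is a minimal sketch under stated assumptions and is not an official USPTO calculation: the function names `add_months` and `shortened_period_expiry` are hypothetical, month-end dates are clamped to the last day of the target month as a simplifying assumption, and weekends, federal holidays, and extensions of time under 37 CFR 1.136(a) are not modeled.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the end of that month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(
    mailed: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Expiry of the shortened statutory period, per the paragraph above."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    expiry = three_months
    # If the first reply is filed within two months and the advisory action
    # is mailed after the three-month period ends, the period instead
    # expires on the advisory action's mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailed, 2)
        and advisory_mailed > three_months
    ):
        expiry = advisory_mailed
    # The period never expires later than six months from the mailing date.
    return min(expiry, six_months)
```

Any dates passed to the function are supplied by the caller; the mailing date of this action is not assumed by the sketch.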
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129