DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The present application is being examined under the claims filed 12/21/2018. 
Claims 1-25 are pending. 

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
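Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.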
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are present in the following limitations of claim 1:
“a processing device configured to process data signals and provide programming data for performing the computations;” (Prong A is met with the usage of “device”. Prong B is met with the functional language, “configured to.” Prong C is met as the modifier “processing” is not sufficient structure. In light of page 7, line 3 of the instant specification, “processing device” will be interpreted as a digital signal processor.)
“a rotation unit configured to rotate accessing the sets of layer inputs from the activation memory based on the programming data;” (Prong A is met with the usage of “unit”. Prong B is met with the functional language, “configured to.” Prong C is met as the modifier “rotation” is not sufficient structure.)
“a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” (Prong A is met with the usage of “unit”. Prong B is met with the functional language, “having.” Prong C is met as the modifier “computation” is not sufficient structure.)
“a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” (Prong A is met with the usage of “cells”. Prong B is met with the functional language, “being configured to.” Prong C is met as the modifier “computing” is not sufficient structure.)
“and a crossbar unit configured to cause the output of the first neural network layer to be stored, in the activation memory, in accordance with a bank assignment pattern that is based on the programming data and an attribute value assigned to a second neural network layer.” (Prong A is met with the usage of “unit”. Prong B is met with the functional language, “configured to.” Prong C is met as the modifier “crossbar” is not sufficient structure.)
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-12 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 

Regarding Claim 1:
The limitations “rotation unit configured to”, “computation unit having”, “computing cells being configured to”, and “crossbar unit configured to” all invoke 35 U.S.C. 112(f), and the specification was further consulted to interpret these elements. However, no corresponding algorithm was found for any of these elements. As per MPEP § 2181(II)(B), computer-implemented means-plus-function limitations require that the specification disclose a corresponding algorithm. Mere reference to a general purpose processor with appropriate programming without providing an explanation of the appropriate programming, or simply reciting "software" without providing detail about the means to accomplish a specific software function, would not be an adequate disclosure. The instant specification describes the rotation unit as a black box that rotates and shifts elements in data, without describing how this function is performed. Similarly, both the “computation unit” and the “computing cells” are described throughout the instant specification as performing computations, but no algorithm is disclosed. The “crossbar unit” is likewise described as using instructions for bank assignment patterns to access input data and store output data, but this is a mere reference to appropriate programming, with no detail about the means to accomplish this.

Regarding dependent claims 2-12:
Claims 2-12, which depend from claim 1, do not resolve the written description issue in claim 1. Thus, claims 2-12 are also rejected under 35 U.S.C. 112(a) by virtue of their dependence on claim 1.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 1, the following limitations are at issue:
Claim limitation “a rotation unit configured to rotate accessing the sets of layer inputs from the activation memory based on the programming data;” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In the disclosure, a rotation unit is described as more of a black box that rotates and shifts elements in data and does not describe how this function is performed.
Claim limitation “a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. “Computation unit” is described throughout the instant specifications as performing computations but no algorithm is disclosed.
Claim limitation “a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. “Computing cells” are described throughout the instant specifications as performing computations but no algorithm is disclosed.
Claim limitation “and a crossbar unit configured to cause the output of the first neural network layer to be stored, in the activation memory, in accordance with a bank assignment pattern that is based on the programming data and an attribute value assigned to a second neural network layer.” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The “crossbar unit” is described throughout the specifications as using instructions for bank assignment patterns to access input data and store output data but this is mere reference to appropriate programming and there is no detail about the algorithm to accomplish this.
Therefore, claim 1 is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Claims 2-12, which depend from claim 1, do not resolve the indefiniteness issue in claim 1. Thus, claims 2-12 are also rejected under 35 U.S.C. 112(b) by virtue of their dependence on claim 1.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
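(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or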
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over Yin et al. (“A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications”) (hereinafter Yin) in view of Young et al. (US9721203) (hereinafter Young).

Regarding Claim 1:
	Yin teaches a hybrid neural network processor (called “Thinker”) that has configurable processing elements, processing element arrays, and memory banking. Yin teaches: 
“A circuit for performing computations for a neural network comprising a plurality of neural network layers, the circuit comprising:” (Yin teaches a hybrid neural network processor which includes neural network layers. [Yin page 968 sec. I. ¶2: “This paper presents a hybrid-NN processor (called “Thinker”) [20], which is designed to address the three special features of hybrid-NNs.”, page 969 sec. III. A. ¶2: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor.”, Fig. 2, Abstract: “First, each processing element (PE) supports bit-width adaptive computing to meet various bit-widths of neural layers”]). 
“a processing device configured to process data signals and provide programming data for performing the computations;” (Yin teaches a CPU that provides data signals (referred to by Yin as ‘input data’) and programming data (referred to by Yin as ‘configuration words’). [Yin page 976 sec IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“and a core in data communication with the processing device to receive the programming data provided by the processing device, wherein the core comprises:” (Yin teaches the Thinker processor receiving configuration words (i.e., programming data) from the CPU. Because Thinker is a processor, it comprises at least one core. Thinker is in data communication with the CPU because it operates as a coprocessor of the CPU. [Yin page 976 sec IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“an activation memory configured to store sets of layer inputs;” (Yin teaches an activation memory (referred to by Yin as the data buffers) storing input data to the arrays, which are used for computations for the neural network layers. [Yin pages 970-971 sec. III A. ¶3: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers […] During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I:
[Image: Yin Table I]
]). 
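For illustration only, the following Python sketch models the alternating use of two data buffers described in the Yin passage quoted above: during a layer, one buffer supplies inputs and the other receives outputs, and (as Yin states in the same paragraph, pages 970-971) the two buffers exchange functions for the next layer. The layer functions are hypothetical placeholders; the sketch is not drawn from Yin's implementation.

```python
from typing import Callable, List

# A "layer" here is any function mapping an input list to an output list.
Layer = Callable[[List[int]], List[int]]


def run_layers(initial_input: List[int], layers: List[Layer]) -> List[int]:
    # Two stand-ins for Data Buffer_1 and Data Buffer_2.
    buffers: List[List[int]] = [list(initial_input), []]
    src, dst = 0, 1
    for layer in layers:
        buffers[dst] = layer(buffers[src])  # read inputs from src, write outputs to dst
        src, dst = dst, src  # exchange buffer roles for the next layer
    return buffers[src]  # after the last swap, src holds the final layer's output


def double(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def increment(xs: List[int]) -> List[int]:
    return [x + 1 for x in xs]


print(run_layers([1, 2, 3], [double, increment]))  # [3, 5, 7]
```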
“a parameter memory configured to store parameters for a first neural network layer;” (Yin teaches a weight buffer for storing weights for the neural network layers. The weight buffer is the parameter memory. [Yin page 971 sec. III. A: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I:
[Image: Yin Table I]
]). 
 “a computation unit having multiple computing cells, at least one computing cell of the multiple computing cells being configured to:” (Yin teaches PE arrays that are the main computing units. As seen in Fig. 2, the PE arrays are comprised of PEs. [Yin page 970 sec. III. A. ¶1: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor. Two 16×16 heterogeneous PE arrays are the main computing units. Each array can be partitioned into sub-arrays for different functions, and PEs can be configured to execute bit-width adaptive operations.”, Fig. 2: 
[Image: Yin Fig. 2]
]).
“i) receive, for the first neural network layer, an input of the sets of layer inputs accessed by the rotation unit,” (Yin teaches the computing cells (i.e., the PEs) receiving input feature points that were accessed in turn by the rotation unit. [Yin page 970 sec. III. A. ¶2: “The input feature points are loaded to PEs at the left/right edge, and are horizontally shifted to the PEs inside the array”, pages 970-971 sec. III. A. ¶3: “During the computation of a specific layer, one sub-buffer provides input data to the arrays”, page 970 sec. III. A: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers”]).
“ii) receive a parameter for the first neural network layer,” (Yin teaches the PEs (i.e., the computing cells) receiving weights (i.e., parameters) for the neural network layers. [Yin page 971 sec. III. A. ¶4: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays. In each PE array, a 16-KB local buffer exploits weight reuse, which provides 16 weights for 16 PE columns in parallel.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I:
[Image: Yin Table I]
]).
“and iii) generate at least a portion of an output of the first neural network layer using the input and the parameter;” (As can be seen in Fig. 6(b), Yin teaches an output for a neural network layer generated based on weights (i.e. parameters) and input data. [page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, page 972 sec. III. B. ¶1: “In PE array, the computation of output points are fixed on the respective PEs, reusing weights, and input feature points in the vertical and horizontal directions, respectively (also known as output stationary dataflow).”, Fig. 6(b): 
[Image: Yin Fig. 6(b)]
]). 
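The output stationary dataflow quoted above from Yin can be pictured with a simplified Python model, offered for illustration only: each PE (i, j) keeps its own output accumulator fixed, one input value is shared across PE row i (horizontal reuse), and one weight is shared down PE column j (vertical reuse). The sketch reduces the computation to a plain matrix product and does not model Yin's timing, data skew, or bit-width adaptation; the function name is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def output_stationary_matmul(inputs: Matrix, weights: Matrix) -> Matrix:
    rows, depth = len(inputs), len(inputs[0])
    cols = len(weights[0])
    # acc[i][j] is the partial sum that stays resident in PE (i, j).
    acc = [[0] * cols for _ in range(rows)]
    for t in range(depth):  # one step per reduction index
        for i in range(rows):
            x = inputs[i][t]  # same input value reused by every PE in row i
            for j in range(cols):
                # same weight reused by every PE in column j
                acc[i][j] += x * weights[t][j]
    return acc


print(output_stationary_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```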
“and a crossbar unit configured to cause the output of the first neural network layer to be stored, in the activation memory, in accordance with a bank assignment pattern that is based on the programming data and an attribute value assigned to a second neural network layer.” (Yin teaches the output of a layer being stored in a data buffer (i.e., the activation memory) in accordance with a rule (i.e., a bank assignment pattern) based on the programming data and attribute values, wherein the attribute values taught by Yin are stride values. [Yin pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
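As an illustrative sketch only, Yin's quoted rule B(p) = r mod Bm can be expressed in Python, assuming Bm denotes the number of virtual banks; how Bm is derived from the kernel size and stride in Yin's equation 1 is not reproduced here. The helper names are hypothetical. The check shows why the rule supports parallel access: any run of consecutive rows no longer than Bm lands in distinct banks, and because the output map is written back under the same rule, the next layer can read it the same way.

```python
from typing import List


def virtual_bank(r: int, num_banks: int) -> int:
    # Bank index for an input feature point in row r, per B(p) = r mod Bm.
    return r % num_banks


def window_is_conflict_free(first_row: int, window_rows: int, num_banks: int) -> bool:
    # True if rows first_row .. first_row + window_rows - 1 all map to
    # different banks, so they can be fetched in the same cycle.
    banks: List[int] = [
        virtual_bank(r, num_banks) for r in range(first_row, first_row + window_rows)
    ]
    return len(set(banks)) == len(banks)


print([virtual_bank(r, 4) for r in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(window_is_conflict_free(5, 3, 4))  # True
print(window_is_conflict_free(0, 5, 4))  # False
```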
Yin does not explicitly teach:
“a rotation unit configured to rotate accessing the sets of layer inputs from the activation memory based on the programming data;”
Young teaches a method for performing kernel striding for convolutional neural network layers on a hardware circuit (Young Abstract: “Methods for receiving a request to process, on a hardware circuit, a neural network comprising a first convolutional neural network layer having a stride greater than one […]”). Young teaches:
“a rotation unit configured to rotate accessing the sets of layer inputs from the activation memory based on the programming data;” (Examiner notes that “rotate accessing the sets of layer inputs” is not defined. In light of page 35, lines 5-15 of the instant specification, rotate accessing the sets of layer inputs is interpreted as shifting input data that is then provided to the computing cell of the computation unit. Young teaches shifting the activation inputs (i.e., the inputs to a neural network layer). [Young col. 8 lines 30-37: “a host interface, e.g., the host interface 302 of FIG. 3, shifts activation inputs throughout the array 406 along one dimension, e.g., to the right, while shifting weight inputs throughout the array 406 along another dimension, e.g., to the bottom. For example, over one clock cycle, the activation input at cell 414 can shift to an activation register in cell 416, which is to the right of cell 414.”, col. 3 lines 64-67: “Data inputs to a neural network layer, e.g., either the input to the neural network or the outputs of the layer below the layer in the sequence, to a neural network layer can be referred to as activation inputs to the layer.”]).
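The shifting quoted from Young (col. 8 lines 30-37) can be illustrated with a minimal Python model of one row of cells, offered as a sketch under stated assumptions: each clock cycle, every cell passes its activation register to the cell on its right and the leftmost cell latches the next incoming activation. Weight movement and the multiply-accumulate step are deliberately not modeled, and the function names are hypothetical rather than taken from Young.

```python
from typing import List, Optional

Row = List[Optional[int]]


def shift_right(row: Row, incoming: Optional[int]) -> Row:
    # One clock cycle: each activation register moves one cell to the right;
    # the leftmost cell receives the next incoming activation (or None).
    return [incoming] + row[:-1]


def simulate(activations: List[int], num_cells: int) -> List[Row]:
    row: Row = [None] * num_cells
    history: List[Row] = []
    # Feed one activation per cycle, then flush with None until the row empties.
    feed: List[Optional[int]] = list(activations) + [None] * num_cells
    for value in feed:
        row = shift_right(row, value)
        history.append(list(row))
    return history


history = simulate([1, 2, 3], 3)
print(history[2])  # [3, 2, 1]
```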
Yin, Young, and the instant application are analogous art because they are all directed to special purpose hardware for neural network computations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network processor disclosed by Yin to include “a rotation unit configured to rotate accessing the sets of layer inputs from the activation memory based on the programming data” as taught by Young. One would be motivated to do so to avoid processing delays, making neural network computations more efficient, as suggested by Young (Young col 3 lines 7-13: “This allows for an inference of a neural network that includes a convolutional layer having a stride greater than one to be determined efficiently without modifying the hardware architecture of the special purpose hardware circuit. That is, processing delays resulting from performing part of the processing off-chip, in software, or both, are avoided.”).

Regarding Claim 2:
Yin in view of Young teaches “The circuit of claim 1,”  as seen above. 
Yin further teaches:
“wherein the rotation unit is further configured to rotate elements of an input tensor,” (Examiner notes that “rotate elements” is not well defined. It is interpreted as shifting elements as per page 17, lines 21-23 of the instant specification. As seen in Fig. 10(e), Yin teaches shifting the elements of an input tensor: I is the input tensor, and its elements are rotated/shifted to provide inputs. [Yin page 969 sec. II: “The input data include vectors and weighting matrixes, while the output data are vectors.”, page 975 sec. IV. B: “For a kernel of size K, since the output stationary dataflow is adopted, each row of PE array requires a set of K × K input feature points to calculate one output feature point. When different rows of PEs execute concurrently, input points for adjacent rows are overlapped.”, page 975 sec. IV. B: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank.”, Fig. 10(e):
[Image: Yin Fig. 10(e)]
]). 
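The overlap Yin describes ("input points for adjacent rows are overlapped") can be checked with a short Python sketch, assuming, hypothetically, that adjacent PE rows compute vertically adjacent output points and that output point (oy, ox) with stride S reads the K × K input window starting at (oy·S, ox·S). Under those assumptions, adjacent windows share K × (K − S) input points when S < K, which is why a fused, bank-conflict-free access pattern is useful.

```python
from typing import Set, Tuple

Point = Tuple[int, int]


def input_window(out_row: int, out_col: int, k: int, stride: int) -> Set[Point]:
    # (row, col) coordinates of the K x K input window for one output point.
    r0, c0 = out_row * stride, out_col * stride
    return {(r0 + dr, c0 + dc) for dr in range(k) for dc in range(k)}


def overlap_between_adjacent_rows(k: int, stride: int) -> int:
    # Hypothetical assignment: PE row 0 computes output (0, 0), PE row 1 computes (1, 0).
    first = input_window(0, 0, k, stride)
    second = input_window(1, 0, k, stride)
    return len(first & second)


print(overlap_between_adjacent_rows(3, 1), overlap_between_adjacent_rows(3, 2))  # 6 3
```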
“where each element of the input tensor corresponds to a respective input of a set of inputs stored in the activation memory.” (As seen in Fig. 10(e), Yin teaches that each element of the input tensor corresponds to inputs stored in the CONV buffer, which is part of the data buffers (i.e., the activation memory). [Yin page 974 sec. IV. B ¶1: “Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, data for CONV array and FC array with corresponding access pattern, respectively.”, Fig. 10(e):
[Image: Yin Fig. 10(e)]
]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 3:
Yin in view of Young teaches “The circuit of claim 2” as seen above.
	Yin further teaches:
“wherein the rotation unit is further configured to: rotate elements of the input tensor along a first dimension of the input tensor based on a first rotation factor;” (Examiner notes that “rotate elements” is interpreted as shifting elements as per page 17, lines 21-25 of the instant specification. Yin teaches shifting the input data over the feature map along the y dimension, as seen in Fig. 10(c). [Yin page 975 sec. IV. B ¶: “An example is given in Fig. 10(b)–(d), where a 3×3 (K = 3) kernel-based convolution is performed with stride S of 1 in a 16-row CONV array.”, Fig. 10(b)-(d):
[Image: Yin Fig. 10(b)-(d)]
 ]).  
“rotate elements of the input tensor along a different second dimension of the input tensor based on a second rotation factor that is different than the first rotation factor;” (Yin teaches shifting the input data over the feature map along the x dimension, seen in Fig. 10(d). [Yin Fig. 10(b)-(d): 
[Image: Yin Fig. 10(b)-(d)]
]). 
“and provide an input that corresponds to a rotated element of the input tensor to a computing cell of the computation unit.” (Yin teaches providing input to computing cells (i.e. PEs) of the computation unit (i.e. the PE array). [Yin page 975 sec. IV. B. ¶6: “Each input feature point for K × K filter is sequentially input into the corresponding row of PE array.”]). 
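To make the two-dimension reading of claim 3 concrete, the following Python sketch rotates a 2-D input tensor along its first dimension by one factor and along its second dimension by a different factor. It is an illustration only: it uses a circular shift (a literal rotation, analogous to numpy.roll), whereas Yin's Fig. 10 depicts windows sliding over a feature map, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def rotate_rows(t: Grid, factor: int) -> Grid:
    # Circular shift along the first dimension: row i moves to row (i + factor) mod H.
    h = len(t)
    return [list(t[(i - factor) % h]) for i in range(h)]


def rotate_cols(t: Grid, factor: int) -> Grid:
    # Circular shift along the second dimension, applied within every row.
    w = len(t[0])
    return [[row[(j - factor) % w] for j in range(w)] for row in t]


def rotate_2d(t: Grid, first_factor: int, second_factor: int) -> Grid:
    # Rotate along dimension 0 by first_factor, then along dimension 1 by second_factor.
    return rotate_cols(rotate_rows(t, first_factor), second_factor)


print(rotate_2d([[1, 2, 3], [4, 5, 6]], 1, 2))  # [[5, 6, 4], [2, 3, 1]]
```

Each element of the rotated result could then be handed to a computing cell as its next input.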
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 4:
Yin in view of Young teaches “The circuit of claim 1” as seen above.
Yin further teaches:  
“wherein the crossbar unit is further configured to: determine a mapping of activations in the output in response to processing the bank assignment pattern,” (Yin teaches mapping the output of each CONV layer in accordance with the bank assignment pattern. [Yin page 975 sec. IV. B: “In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
“where the mapping identifies memory banks of the activation memory for storing the activations for the second neural network layer based on the attribute value assigned to the second neural network layer.” (Yin teaches that the mapping involves the memory banks of the activation memory and that the rule is based on the attribute value (i.e., the stride value; S in equation 1). [Yin page 975 sec. IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec. IV. B. ¶4 equation 1: 
    [Image: Yin equation 1]
]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.
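As an illustrative sketch only, the mapping rule quoted from Yin, B(p) = r mod Bm, can be checked for conflict-free parallel access as below: a set of rows accessed in the same cycle is conflict-free when every row lands in a distinct bank. The number of virtual banks Bm is taken here as a given parameter; how Yin derives it from the stride S (equation 1) is not reproduced, and the helper names are hypothetical.

```python
from typing import Iterable


def bank_of(r: int, num_banks: int) -> int:
    """Virtual bank for an input feature point in row r: B(p) = r mod Bm."""
    return r % num_banks


def conflict_free(rows: Iterable[int], num_banks: int) -> bool:
    """True if every row accessed in parallel maps to a distinct bank."""
    banks = [bank_of(r, num_banks) for r in rows]
    return len(set(banks)) == len(banks)
```

With 4 banks, rows 5, 6 and 7 map to banks 1, 2 and 3 and can be read in parallel, whereas rows 0 through 4 place both row 0 and row 4 in bank 0.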

Regarding Claim 5:
Yin in view of Young teaches “The circuit of claim 4” as seen above. 
Yin further teaches:  
“wherein the crossbar unit is further configured to: cause data for the output of the first neural network layer to be stored at particular address locations of the activation memory,” (Yin teaches the output data being stored at particular locations in the CONV sub-buffers of the data buffer (i.e. the activation memory) using a specific rule. [Yin pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions. There are 48 banks in each buffer”, Fig. 10(a): 
    [Image: Yin Fig. 10(a)]
, page 975 sec. IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
“the data for the output being assigned to an address location of the activation memory based on a configurable mapping that changes for different respective layers of the neural network.” (Yin teaches the output map of the CONV layer being stored in a bank of the data buffer, with the location determined by a rule. After each layer, the two sub-buffers switch functions and the mapping changes. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][ch_in], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ ch_in ≤ Ch_in. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 6:
Yin in view of Young teaches “The circuit of claim 4” as seen above. 
Yin further teaches:  
“wherein: the rotation unit is further configured to access output data for the output of the first neural network layer as layer inputs to the second neural network layer for processing at the second neural network layer;” (Yin teaches accessing the output data and the output data being used as input data for the next layer. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][ch_in], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ ch_in ≤ Ch_in. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and the determined mapping is configured such that a bank conflict does not occur at the memory banks of the activation memory when the rotation unit accesses layer inputs for the second neural network layer that correspond to the output of the first neural network layer.” (Yin teaches using a specific rule to determine the bank location for storing data. To further avoid bank conflicts when input data are accessed, the two sub-buffers switch between the functions of sending input data and storing output data. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][ch_in], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ ch_in ≤ Ch_in. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.
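A minimal sketch of the two-sub-buffer arrangement quoted from Yin (one sub-buffer provides input data while the other stores output data, and the two exchange functions for the next layer) follows. The buffer indices and function name are hypothetical; the sketch models only the role swap between layers, not the bank mapping within each sub-buffer.

```python
from typing import List, Tuple


def buffer_roles(num_layers: int) -> List[Tuple[int, int]]:
    """Return (input_buffer, output_buffer) indices for each layer.

    The two sub-buffers alternate roles, so the sub-buffer written by
    layer i is the one read as input by layer i + 1.
    """
    roles = []
    for layer in range(num_layers):
        src = layer % 2   # sub-buffer providing input data for this layer
        dst = 1 - src     # sub-buffer storing this layer's output
        roles.append((src, dst))
    return roles
```

Under this sketch, the output of each layer is never written to the sub-buffer that the same layer is reading from, which is the property the rejection relies on.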

Regarding Claim 7:
Yin in view of Young teaches “The circuit of claim 1” as seen above. 
Yin further teaches:  
“wherein the attribute value assigned to the second neural network layer is: a stride value for the second neural network layer, or a skip value for the second neural network layer.” (Yin teaches a stride value, and this attribute value is used to determine storage locations. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][ch_in], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ ch_in ≤ Ch_in. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec. IV. B equation 1: 
    [Image: Yin equation 1]
, Table II: 
    [Image: Yin Table II]
]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 8:
Yin in view of Young teaches “The circuit of claim 1” as seen above. 
Yin further teaches:  
“wherein the core is configured to: use the rotation unit to access layer inputs stored in a first set of memory banks of the activation memory without the occurrence of a bank conflict;” (Yin teaches the CPU (which has at least one core) accessing input data stored in the activation memory according to a specific rule that would prevent bank conflicts. [Yin page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU.”, page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and use the crossbar unit to store layer outputs in a second set of memory banks of the activation memory without the occurrence of a bank conflict.” (Yin teaches storing outputs into memory banks of the data buffers (i.e. the activation memory) based on specific rules that would prevent bank conflicts. [Yin page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 9:
Yin in view of Young teaches “The circuit of claim 7” as seen above. 
Yin further teaches:  
“wherein the core is configured to: synchronize rotation based data access operations of the rotation unit with pattern based data storage operations of the crossbar unit to achieve a utilization rate of the computation unit that exceeds a threshold utilization rate.” (Yin teaches that access to data in the activation memory is arranged to make maximum use of resources. Data access is performed in parallel, which is equivalent to synchronized data access. Examiner notes that the threshold is not defined in the claims; the threshold is interpreted to be any utilization rate of a different system. The method proposed by Yin is called AP-flow, and the array utilization rates that result from this method are compared with those of TM-flow, a method used in existing neural network processors. Table IV shows that at least one AP-flow utilization rate exceeds the corresponding TM-flow utilization rate, which is interpreted to be the threshold. [Yin page 974 sec. IV. B. ¶1: “Input feature points for both CONV array and FC array need to be loaded from multi-bank on-chip buffer in parallel. Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, which provide input data for CONV array and FC array with corresponding access pattern, respectively.”, page 973 sec. III. B. ¶7: “Therefore, an AP-flow is proposed in this paper, in which majority of PEs are allocated to computation-intensive ConvNet and most of the bandwidth are allocated to memory-intensive FCNet/RNN. Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively.”, page 969 sec. I. ¶4: “the time-multiplexing (TM)-based computing flow, which is adopted in the existing NN processors, is inefficient for processing hybrid-NNs.”, page 977 sec. V. A. ¶3: “As shown in Table IV, we can see that AP-flow achieves better array utilization than TM-flow for all benchmarks except Yolo.”, page 977 sec. V. Table IV: 

    [Image: Yin Table IV]
]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 10:
Yin in view of Young teaches “The circuit of claim 1” as seen above. 
Young further teaches:  
“wherein the processing device is configured to: receive, from an external controller, an instruction comprising data values to be used at the core;” (Examiner notes that “external” in “external controller” is not defined. An external controller is interpreted to be any controller that is external to at least one other component. Young teaches receiving control signals from an external processor; this external processor is considered the external controller. The control signals are received by the host interface, which is a part of a neural network processing system. This system is implemented via a computer, which is a processing device and comprises at least one core. The control signals are instructions that comprise data values, such as a stride value, and are used by the neural network processing system. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”, col 4 lines 50-53: “The neural network processing system 100 is an example of a system implemented as one or more computers in one or more locations in which the systems, components, and techniques described below can be implemented.”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”]). 
“and provide at least the data values of the instruction to the core for storing at a component of the core.” (Young teaches the host interface (which is a part of the processing device) providing instructions that include a stride value to another component of the neural network processing system (i.e. the sequencer), which is implemented via a computer comprising at least one core. [Young col 12 lines 51-57: “For example, upon receiving instructions for implementing the neural network layer having a stride greater than one, the host interface 302 can send the instructions to the sequencer 306 of FIG. 3, and the sequencer 306 can convert the instructions into low level control signals that control the special-purpose hardware circuit 300 of FIG. 3 to perform the neural network computation.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 11:
Yin in view of Young teaches “The circuit of claim 10” as seen above. 
Young further teaches:  
“wherein the processing device is a digital signal processor (DSP) configured to:” (The processing device of the neural network processing system is a computer, which is a digital signal processor. [Young col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
“process an instruction received from the external controller;” (Young teaches the neural network processing system (implemented through a DSP, as stated above) receiving control signals (i.e. instructions) from an external processor which is equivalent to an external controller. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”]). 
“and in response to processing the instruction, configure one or more registers at the core using data values of the instruction.” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are done in the matrix computation unit, which is a part of the neural network processing system (implemented through a DSP, as stated above). The DSP comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 12:
Yin in view of Young teaches “The circuit of claim 11” as seen above. 
Young further teaches:  
“wherein the core is configured to access the one or more registers to obtain configuration data that defines the computations for the neural network,” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are done in the matrix computation unit, which is a part of the neural network processing system. This neural network processing system is implemented via a computer, which comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]).  
“the computations being performed by the computation unit of the core based on data values derived from the instructions received from the external controller.” (Young teaches control signals being received from an external controller that the matrix computation unit of the neural network processing system uses to perform the computations. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 1.

Regarding Claim 13:
Yin teaches a hybrid neural network processor (called “Thinker”) that has configurable processing elements, processing element arrays, and memory banking. Yin teaches:
“A computer-implemented method for performing computations for a neural network comprising a plurality of neural network layers, the method comprising:” (Yin teaches a computer implemented hybrid neural network processor which includes neural network layers. [Yin page 968 sec. I. ¶2: “This paper presents a hybrid-NN processor (called “Thinker”) [20], which is designed to address the three special features of hybrid-NNs.”, page 969 sec. III. A. ¶2: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor.”, Fig. 2, Abstract: “First, each processing element (PE) supports bit-width adaptive computing to meet various bit-widths of neural layers”, page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU.”]).
“providing, by a processing device of a hardware circuit, programming data for performing the computations for the neural network;” (Yin teaches a CPU that provides data signals (referred by Yin as ‘input data’) and programming data (referred by Yin as ‘configuration words’). A CPU is a processing device of a hardware circuit. [Yin page 976 sec IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“receiving, by a core of the hardware circuit that communicates with the processing device, the programming data provided by the processing device,” (Yin teaches the Thinker processor receiving configuration words (i.e. programming data) from the CPU. The Thinker would comprise at least one core because it is a processor. The Thinker is in communication with the CPU because it is a coprocessor of the CPU. [Yin page 976 sec IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“wherein the core comprises an activation memory configured to store sets of layer inputs” (Yin teaches an activation memory (referred to as the data buffers by Yin) storing input data to the arrays, which are used for computations for the neural network layers. [Yin pages 970-971 sec. III. A. ¶3: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers […] During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    [Image: Yin Table I]
]).  
“and a parameter memory configured to store parameters for a first neural network layer;” (Yin teaches a weight buffer for storing weights for the neural network layers. Weight buffer is the parameter memory. [Yin page 971 sec. III. A: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    [Image: Yin Table I]
]). 
“receiving, by a computation unit of the core, an input of the sets of layer inputs accessed by the rotation unit, the input being received for processing at the first neural network layer;” (Yin teaches PE arrays that are the main computing units. As seen in Fig. 2, the PE arrays are comprised of PEs. Yin teaches the computing cells (i.e. the PEs) receiving input feature points that were accessed in turn by the rotation unit. [Yin page 970 sec. III. A. ¶1: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor. Two 16×16 heterogeneous PE arrays are the main computing units. Each array can be partitioned into sub-arrays for different functions, and PEs can be configured to execute bit-width adaptive operations.”, Fig. 2: 
    [Image: Yin Fig. 2]
, page 970 sec. III. A. ¶2: “The input feature points are loaded to PEs at the left/right edge, and are horizontally shifted to the PEs inside the array”, pages 970-971 sec. III. A. ¶3: “During the computation of a specific layer, one sub-buffer provides input data to the arrays”, page 970 sec. III. A: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers”]). 
“receiving, by the computation unit, a parameter for the first neural network layer;” (Yin teaches the PEs receiving weights, which are parameters, and the PEs are the computing cells for the neural network layers. [Yin page 971 sec. III. A. ¶4: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays. In each PE array, a 16-KB local buffer exploits weight reuse, which provides 16 weights for 16 PE columns in parallel.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    [Image: Yin Table I]
]).
“generating, by the computation unit, an output of the first neural network layer using the input accessed by the rotation unit and the parameter;” (As can be seen in Fig. 6(b), Yin teaches an output for a neural network layer generated based on weights (i.e. parameters) and input data. [Yin page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, page 972 sec. III. B. ¶1: “In PE array, the computation of output points are fixed on the respective PEs, reusing weights, and input feature points in the vertical and horizontal directions, respectively (also known as output stationary dataflow).”, Fig. 6(b): 
    [Image: Yin Fig. 6(b)]
]). 
“and storing, using a crossbar unit of the core, the output of the first neural network layer in the activation memory in accordance with a bank assignment pattern that is based on the programming data and an attribute value assigned to a second neural network layer.” (Yin teaches the output of a layer being stored in a data buffer (i.e. the activation memory) in accordance with a rule (i.e. a bank assignment pattern) based on the programming data and attribute values, wherein the attribute values taught by Yin are stride values. [Yin pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 975 sec. IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
However, Yin does not explicitly teach: “accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core;” 
Young teaches a method for performing kernel striding for convolutional neural network layers on a hardware circuit (Young Abstract: “Methods for receiving a request to process, on a hardware circuit, a neural network comprising a first convolutional neural network layer having a stride greater than one […]”). Young teaches:
“accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core;” (Examiner notes that “rotate accessing the sets of layer inputs” is not defined. In light of page 35 lines 5-15 of the instant specification, rotate accessing the sets of layer inputs is interpreted as shifting input data that is then provided to the computing cell of the computation unit. Young teaches shifting the activation inputs (otherwise known as the inputs to a neural network layer). [Young col 8 lines 30-37: “a host interface, e.g., the host interface 302 of FIG. 3, shifts activation inputs throughout the array 406 along one dimension, e.g., to the right, while shifting weight inputs throughout the array 406 along another dimension, e.g., to the bottom. For example, over one clock cycle, the activation input at cell 414 can shift to an activation register in cell 416, which is to the right of cell 414.”, col 3 lines 64-67: “Data inputs to a neural network layer, e.g., either the input to the neural network or the outputs of the layer below the layer in the sequence, to a neural network layer can be referred to as activation inputs to the layer.”]). 
Yin, Young, and the instant application are analogous art because they are all directed to special purpose hardware for neural network computations.
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with “accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core” as taught by Young. One would be motivated to do so to avoid processing delays, making neural network computations more efficient, as suggested by Young (Young col 3 lines 7-13: “This allows for an inference of a neural network that includes a convolutional layer having a stride greater than one to be determined efficiently without modifying the hardware architecture of the special purpose hardware circuit. That is, processing delays resulting from performing part of the processing off-chip, in software, or both, are avoided.”).
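The activation shifting quoted from Young (col 8 lines 30-37), in which an activation input moves one cell to the right per clock cycle, can be sketched as a simple shift register. This is a minimal sketch assuming a single row of cells; the downward flow of weight inputs and the multiply-accumulate in each cell are omitted, and all names are hypothetical.

```python
from typing import Iterable, List, Optional

Row = List[Optional[int]]


def shift_right(row: Row, incoming: Optional[int]) -> Row:
    """One clock cycle: each cell passes its activation to the cell on its right,
    and the leftmost cell latches the new activation entering at the edge."""
    return [incoming] + row[:-1]


def stream(inputs: Iterable[int], width: int) -> List[Row]:
    """Feed activations in at the left edge and record the row after each cycle."""
    row: Row = [None] * width  # None marks a cell that holds no activation yet
    history = []
    for x in inputs:
        row = shift_right(row, x)
        history.append(list(row))
    return history
```

For instance, streaming the activations 1, 2 and 3 into a row of three cells yields [1, None, None], then [2, 1, None], then [3, 2, 1], so the first activation reaches the rightmost cell after three cycles.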

Regarding Claim 14:
Yin in view of Young teaches “The method of claim 13,” as seen above. 
Yin further teaches:
“further comprising: rotating, by the rotation unit, elements of an input tensor,” (Examiner notes that “rotate elements” is not well defined. It is interpreted as shifting elements as per page 17 lines 21-23 of the instant specification. It can be seen in Fig. 10(e) that Yin teaches shifting the elements of the input tensor: I is the input tensor, and its elements are rotated (i.e. shifted) to provide inputs. [Yin page 969 sec. II: “The input data include vectors and weighting matrixes, while the output data are vectors.”, page 975 sec. IV. B: “For a kernel of size K, since the output stationary dataflow is adopted, each row of PE array requires a set of K × K input feature points to calculate one output feature point. When different rows of PEs execute concurrently, input points for adjacent rows are overlapped.”, page 975 sec. IV. B: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank.”, Fig. 10e: 
    (image: Yin Fig. 10(e))
]). 
“where each element of the input tensor corresponds to a respective input of a set of inputs stored in the activation memory.” (It can be seen in Fig. 10(e) that Yin teaches each element of the input tensor corresponding to inputs stored in the CONV buffer, which is part of the data buffers (i.e. the activation memory). [Yin page 974 sec. IV. B ¶1: “Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, which provide input data for CONV array and FC array with corresponding access pattern, respectively.”, Fig. 10e: 

    (image: Yin Fig. 10(e))
]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 15:
Yin in view of Young teaches “The method of claim 14,” as seen above. 
Yin further teaches:
“further comprising: rotating, by the rotation unit, elements of the input tensor along a first dimension of the input tensor based on a first rotation factor;” (Examiner notes that “rotate elements” is interpreted as shifting elements as per page 17 lines 21-25 of the instant specification. Yin teaches shifting the input data over the feature map along the y dimension, as seen in Fig. 10(c). [Yin page 975 sec. IV. B ¶: “An example is given in Fig. 10(b)–(d), where a 3×3 (K = 3) kernel-based convolution is performed with stride S of 1 in a 16-row CONV array.”, Fig. 10(b)-(d): 
    (image: Yin Fig. 10(b)-(d))
 ]).  
“rotating, by the rotation unit, elements of the input tensor along a different second dimension of the input tensor based on a second rotation factor that is different than the first rotation factor;” (Yin teaches shifting the input data over the feature map along the x dimension, as seen in Fig. 10(d). [Yin Fig. 10(b)-(d): 

    (image: Yin Fig. 10(b)-(d))
]). 
“and providing, by the rotation unit, an input that corresponds to a rotated element of the input tensor to a computing cell of the computation unit.” (Yin teaches providing input to computing cells (i.e. PEs) of the computation unit (i.e. the PE array). [Yin page 975 sec. IV. B. ¶6: “Each input feature point for K × K filter is sequentially input into the corresponding row of PE array.”]). 
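For illustration only, the claim 15 limitations (rotation along a first dimension by a first factor, rotation along a second dimension by a different second factor, then supplying a rotated element to a computing cell) can be sketched in Python as follows. The sketch assumes “rotate” means a cyclic (wraparound) shift in which the element at index i moves to index (i + factor) mod n; this convention and all names are hypothetical and are not taken from Yin or the instant specification.

```python
from typing import List

Tensor2D = List[List[int]]


def rotate_dim0(t: Tensor2D, factor: int) -> Tensor2D:
    """Cyclically rotate the first dimension: row i moves to row (i + factor) mod n."""
    if not t:
        return []
    k = factor % len(t)
    return [list(row) for row in t[-k:] + t[:-k]]


def rotate_dim1(t: Tensor2D, factor: int) -> Tensor2D:
    """Cyclically rotate the second dimension: column j moves to column (j + factor) mod m."""
    if not t or not t[0]:
        return [list(row) for row in t]
    k = factor % len(t[0])
    return [row[-k:] + row[:-k] for row in t]


def rotate_2d(t: Tensor2D, first_factor: int, second_factor: int) -> Tensor2D:
    """Rotate along the first dimension, then along the second, by different factors."""
    return rotate_dim1(rotate_dim0(t, first_factor), second_factor)


tensor = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
rotated = rotate_2d(tensor, first_factor=1, second_factor=2)
# The element now at position (0, 0) would be supplied to the computing cell at that position.
cell_input = rotated[0][0]
```

Here `rotated` is `[[7, 8, 6], [1, 2, 0], [4, 5, 3]]`, so the cell at (0, 0) receives 7, the element originally at row 2, column 1.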
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 16:
Yin in view of Young teaches “The method of claim 13,” as seen above. 
Yin further teaches:
“further comprising: determining, by the crossbar unit, a mapping of activations in the output in response to processing the bank assignment pattern,” (Yin teaches mapping the output of each CONV layer in accordance with the bank assignment pattern. [Yin page 975 sec. IV. B: “In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“where the mapping identifies memory banks of the activation memory for storing the activations for the second neural network layer based on the attribute value assigned to the second neural network layer.” (Yin teaches a mapping that identifies memory banks of the activation memory and bases the mapping rule on the attribute value (i.e. the stride value S in equation 1). [Yin page 975 sec. IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec. IV. B. ¶4 equation 1: 
    (image: Yin equation 1)
]). 
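For illustration only, the quoted row-to-bank rule B(p) = r mod Bm, and why it avoids bank conflicts for a parallel read of consecutive rows, can be sketched in Python. The sketch treats Bm as a given parameter; Yin's equation 1, which per the mapping above involves the stride value S, is not reproduced. The helper names and the value Bm = 5 are hypothetical.

```python
from typing import Iterable, List


def bank_of(r: int, num_banks: int) -> int:
    """Virtual bank number for an input feature point in row r: B(p) = r mod Bm."""
    return r % num_banks


def is_conflict_free(rows: Iterable[int], num_banks: int) -> bool:
    """True if all rows accessed in one parallel read map to distinct banks."""
    banks: List[int] = [bank_of(r, num_banks) for r in rows]
    return len(banks) == len(set(banks))


# Hypothetical example with Bm = 5 banks: five consecutive rows read in parallel.
window_banks = [bank_of(r, 5) for r in range(3, 8)]  # [3, 4, 0, 1, 2]
```

Any window of at most Bm consecutive rows maps to distinct banks, while a window of Bm + 1 rows must repeat a bank by the pigeonhole principle (e.g. rows 0 through 5 with Bm = 5 put rows 0 and 5 in bank 0).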
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 17:
Yin in view of Young teaches “The method of claim 16,” as seen above. 
Yin further teaches:
“further comprising: assigning, using the crossbar unit, data for the output of the first neural network layer to an address location of the activation memory based on a configurable mapping that changes for different respective layers of the neural network;” (Yin teaches the output map of the CONV layer being assigned to a bank location on the data buffer, with the location determined by a rule. After each layer, the two sub-buffers switch functions and the mapping changes. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 
“and storing, using the crossbar unit, the data for the output of the first neural network layer at particular assigned address locations of the activation memory based on the configurable mapping for the second neural network layer.” (Yin teaches the output data being stored at particular locations in the CONV sub-buffers of the data buffer (i.e. the activation memory) according to a specific rule. [Yin pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions. There are 48 banks in each buffer”, Fig. 10(a): 
    (image: Yin Fig. 10(a))
, page 975 sec. IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]).
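For illustration only, the buffer role exchange quoted from Yin (one sub-buffer supplies inputs while the other receives outputs, and the two exchange functions for the next layer) is the familiar ping-pong, or double-buffering, scheme. The minimal Python sketch below assumes each layer can be modeled as a plain function on a list of values; bank-level placement is omitted and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Layer = Callable[[List[int]], List[int]]


def run_ping_pong(x: List[int], layers: List[Layer]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Run layers back to back using two buffers that swap roles after every layer.

    Returns the final output and, per layer, the (input_buffer, output_buffer) indices used.
    """
    buffers: List[List[int]] = [list(x), []]
    src, dst = 0, 1  # buffers[src] supplies inputs; buffers[dst] receives outputs
    trace: List[Tuple[int, int]] = []
    for layer in layers:
        buffers[dst] = layer(buffers[src])  # compute the layer: read src, write dst
        trace.append((src, dst))
        src, dst = dst, src  # for the next layer, the two buffers exchange functions
    return buffers[src], trace


out, roles = run_ping_pong(
    [1, 2],
    [lambda v: [a + 1 for a in v], lambda v: [a * 2 for a in v], lambda v: [a - 3 for a in v]],
)
```

For three layers the roles alternate as `(0, 1), (1, 0), (0, 1)`, so each layer's output buffer becomes the next layer's input buffer without any copy.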

Regarding Claim 18:
Yin in view of Young teaches “The method of claim 16,” as seen above. 
Yin further teaches:
“wherein: the rotation unit is further configured to access output data for the output of the first neural network layer as layer inputs to the second neural network layer for processing at the second neural network layer;” (Yin teaches accessing the output data, which is used as input data for the next layer. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and the determined mapping is configured such that a bank conflict does not occur at the memory banks of the activation memory when the rotation unit accesses layer inputs for the second neural network layer that correspond to the output of the first neural network layer.” (Yin teaches using a specific rule to determine the bank location for storing data. To further avoid bank conflicts when input data is accessed, the two sub-buffers swap the functions of sending input data and storing output data. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, pages 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays. For next layer, two buffers exchange their functions.”]). 


Regarding Claim 19:
Yin in view of Young teaches “The method of claim 13,” as seen above. 
Yin further teaches:
“further comprising: assigning a stride value for the second neural network layer that corresponds to the attribute value; or assigning a skip value for the second neural network layer that corresponds to the attribute value.” (Yin teaches a stride value, an attribute value that is used to determine storage locations. [Yin page 975 sec. IV. B ¶5: “In CONV layers, each input feature point can be represented as p = I[r][c][chin], where 0 ≤ r ≤ R−1, 0 ≤ c ≤ C−1, and 0 ≤ chin ≤ Chin. In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”, page 975 sec. IV. B equation 1: 
    (image: Yin equation 1)
, Table II: 
    (image: Yin Table II)
]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 20:
Yin in view of Young teaches “The method of claim 13,” as seen above. 
Yin further teaches:
“further comprising: using, by the core, the rotation unit to access layer inputs stored in a first set of memory banks of the activation memory without the occurrence of a bank conflict;” (Yin teaches the Thinker processor (which has at least one core) accessing input data stored in the activation memory according to a specific rule that prevents bank conflicts. [Yin page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU.”, page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“and using, by the core, the crossbar unit to store layer outputs in a second set of memory banks of the activation memory without the occurrence of a bank conflict.” (Yin teaches the Thinker processor (comprising at least one core) storing outputs into memory banks of the data buffers (i.e. the activation memory) based on specific rules that prevent bank conflicts. [Yin page 975 sec. IV. B ¶5: “In order to enable the parallel access of fused data pattern P, each point in P needs to be mapped to a unique bank. Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 21:
Yin in view of Young teaches “The method of claim 20,” as seen above. 
Yin further teaches:
“further comprising: synchronizing, by the core, rotation based data access operations of the rotation unit with pattern based data storage operations of the crossbar unit to achieve a utilization rate of the computation unit that exceeds a threshold utilization rate.” (Yin teaches allocating data access to the activation memory so as to make maximum use of resources. Data access is performed in parallel, which is interpreted as synchronized data access. Examiner notes that the threshold is not defined in the claims; the threshold is interpreted to be any utilization rate of a different system. The method proposed by Yin is called AP-flow, and the array utilization rates resulting from this method are compared with those of TM-flow, a method used in existing neural network processors. Table IV shows that at least one AP-flow utilization rate exceeds the corresponding TM-flow utilization rate, which is interpreted as the threshold. [Yin page 974 sec. IV. B. ¶1: “Input feature points for both CONV array and FC array need to be loaded from multi-bank on-chip buffer in parallel. Since ConvNet and FCNet/RNN have quite different data access patterns, in order to ensure the parallel data access, the multiple memory banks are partitioned to two groups, CONV buffer and FC buffer, which provide input data for CONV array and FC array with corresponding access pattern, respectively.”, page 973 sec. III. B. ¶7: “Therefore, an AP-flow is proposed in this paper, in which majority of PEs are allocated to computation-intensive ConvNet and most of the bandwidth are allocated to memory-intensive FCNet/RNN. Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively.”, page 969 sec. I. ¶4: “the time-multiplexing (TM)-based computing flow, which is adopted in the existing NN processors, is inefficient for processing hybrid-NNs.”, page 977 sec. V. A. ¶3: “As shown in Table IV, we can see that AP-flow achieves better array utilization than TM-flow for all benchmarks except Yolo.”, page 977 sec. V. Table IV: 

    (image: Yin Table IV)
]).
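For illustration only, the comparison the examiner relies on (a utilization rate exceeding a threshold utilization rate, with the TM-flow rate serving as the threshold) can be expressed in Python, assuming utilization is defined as busy PE-cycles divided by total available PE-cycles. That definition, the function names, and the numbers below are assumptions; the numbers are not the values reported in Yin's Table IV.

```python
def utilization(busy_pe_cycles: int, num_pes: int, total_cycles: int) -> float:
    """Fraction of available PE-cycles in which a PE performed useful work."""
    capacity = num_pes * total_cycles
    if capacity <= 0:
        raise ValueError("num_pes and total_cycles must be positive")
    return busy_pe_cycles / capacity


def exceeds_threshold(rate: float, threshold: float) -> bool:
    """True if the utilization rate is strictly greater than the threshold rate."""
    return rate > threshold


# Illustrative numbers only: 512 PEs (two 16x16 arrays) over 1,000 cycles.
ap_rate = utilization(busy_pe_cycles=460_800, num_pes=512, total_cycles=1_000)  # about 0.9
tm_rate = utilization(busy_pe_cycles=307_200, num_pes=512, total_cycles=1_000)  # about 0.6
ap_beats_tm = exceeds_threshold(ap_rate, threshold=tm_rate)  # True
```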
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 22:
Yin in view of Young teaches “The method of claim 13,” as seen above. 
Young further teaches:
“further comprising: receiving, by the processing device and from an external controller, an instruction comprising data values to be used at the core;” (Examiner notes that “external” in “external controller” is not defined. An external controller is interpreted broadly as a controller that is external to at least one other component. Young teaches receiving control signals from an external processor; this external processor is considered the external controller. The control signals are received by the host interface, which is part of a neural network processing system. This system is implemented via a computer, which is a processing device and comprises at least one core. The control signals are instructions that comprise data values, such as a stride value, and are used by the neural network processing system. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”, col 4 lines 50-53: “The neural network processing system 100 is an example of a system implemented as one or more computers in one or more locations in which the systems, components, and techniques described below can be implemented.”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”]). 
“and providing, by the processing device, at least the data values of the instruction to the core for storing at a component of the core.” (Young teaches the host interface (which is part of the processing device) providing instructions that include a stride value to another component of the neural network processing system (i.e. the sequencer), which is implemented via a computer comprising at least one core. [Young col 12 lines 51-57: “For example, upon receiving instructions for implementing the neural network layer having a stride greater than one, the host interface 302 can send the instructions to the sequencer 306 of FIG. 3, and the sequencer 306 can convert the instructions into low level control signals that control the special-purpose hardware circuit 300 of FIG. 3 to perform the neural network computation.”]). 


Regarding Claim 23:
Yin in view of Young teaches “The method of claim 22,” as seen above. 
Young further teaches:
“wherein the processing device is a digital signal processor (DSP) and the method further comprises:” (The processing device of the neural network processing system is a computer, which is a digital signal processor. [Young col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]). 
“processing, by the DSP, an instruction received from the external controller;” (Young teaches the neural network processing system (implemented through a DSP, as stated above) receiving control signals (i.e. instructions) from an external processor, which corresponds to the external controller. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 3 lines 28-29: “FIG. 3 shows an example neural network processing system.”, col 6 lines 58-60: “FIG. 3 shows an example special-purpose hardware circuit 300 for performing neural network computations. The system 300 includes a host interface 302.”]). 
“and in response to processing the instruction, configuring, by the DSP, one or more registers at the core using data values of the instruction.” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are performed in the matrix computation unit, which is part of the neural network processing system (implemented through a DSP, as stated above). The DSP comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 24:
Yin in view of Young teaches “The method of claim 23,” as seen above. 
Young further teaches:
“further comprising: accessing, by the core, the configured one or more registers to obtain configuration data that defines the computations for the neural network;” (Young teaches registers storing control signals that control the neural network computations and are equivalent to the configuration data. The neural network computations are performed in the matrix computation unit, which is part of the neural network processing system. This neural network processing system is implemented via a computer, which comprises at least one core. [Young col 7-8 lines 65-1: “FIG. 4 shows an example architecture 400 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 406. The array 406 includes multiple cells 404.”, col 9-10 lines 60-1: “The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells. In some implementations, shifting the weight input or the activation input takes one or more clock cycles. The control signal can also determine whether the activation input or weight inputs are transferred to the multiplication circuitry 508, or can determine whether the multiplication circuitry 508 operates on the activation and weight inputs.”, col 4 lines 50-51: “The neural network processing system 100 is an example of a system implemented as one or more computers”]). 
“and performing, at the computation unit, the computations based on data values derived from the instructions received from the external controller.” (Young teaches control signals, received from an external controller, that the matrix computation unit of the neural network processing system uses to perform the computations. [Young col 7 lines 18-20: “In some other implementations, the host interface 302 passes in a control signal from an external processor.”, col 13 lines 25-26: “The matrix computation unit 312 performs computations based on the control signals”, col 10 lines 36-45: “That is, the control signals 610 can regulate whether the activation values are pooled, where the activation values are stored, e.g., in the unified buffer 308, or can otherwise regulate handling of the activation values. The control signals 610 can also specify the activation or pooling functions, as well as other parameters for processing the activation values or pooling values, e.g., a stride value.”]).
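For illustration only, the control flow mapped in claims 22-24 (an external controller issues an instruction, the processing device writes its data values into registers at the core, and the computation unit then performs computations using the configuration read back from those registers) can be sketched in Python. All class, register, and function names are hypothetical, and a strided 1-D convolution stands in for the neural network computation; this is not Young's host interface, sequencer, or control register.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instruction:
    """Hypothetical instruction from an external controller: register name -> data value."""
    values: Dict[str, int]


@dataclass
class Core:
    """Hypothetical core with a small set of named configuration registers."""
    registers: Dict[str, int] = field(default_factory=dict)


def configure_registers(core: Core, instr: Instruction) -> None:
    """Processing-device step: write the instruction's data values into the core's registers."""
    core.registers.update(instr.values)


def strided_conv1d(x: List[int], w: List[int], stride: int) -> List[int]:
    """Valid 1-D convolution (cross-correlation) of x with kernel w at the given stride."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(0, len(x) - k + 1, stride)]


def compute(core: Core, x: List[int], w: List[int]) -> List[int]:
    """Computation-unit step: read the configuration back from the registers, then compute."""
    stride = core.registers.get("stride", 1)  # default to stride 1 if never configured
    return strided_conv1d(x, w, stride)


core = Core()
configure_registers(core, Instruction(values={"stride": 2}))
result = compute(core, x=[1, 2, 3, 4, 5], w=[1, 1])  # [3, 7]
```

With the stride register set to 2, the kernel is applied at positions 0 and 2 only; an unconfigured core falls back to stride 1 and returns `[3, 5, 7, 9]`.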
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Yin with the teachings of Young for at least the same reasons as discussed above in claim 13.

Regarding Claim 25: 
Yin teaches a hybrid neural network processor (called “Thinker”) that has configurable processing elements, processing element arrays, and memory banking. Yin teaches: 
“providing, by a processing device of a hardware circuit, programming data for performing the computations for the neural network;” (Yin teaches a CPU that provides data signals (referred to by Yin as ‘input data’) and programming data (referred to by Yin as ‘configuration words’). A CPU is a processing device of a hardware circuit. [Yin page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“receiving, by a core of the hardware circuit that communicates with the processing device, the programming data provided by the processing device,” (Yin teaches the Thinker processor receiving configuration words (i.e. programming data) from the CPU. The Thinker comprises at least one core because it is a processor. The Thinker is in communication with the CPU because it is a coprocessor of the CPU. [Yin page 976 sec. IV. D. ¶2: “Thinker processor usually works as a coprocessor of primary CPU. Before the execution of a hybrid-NN, CPU loads the array and layer parameters into the Thinker’s parameter buffer, and PE configuration words into configuration buffer. During runtime, CPU sends the input data and weights to Thinker’s data buffers.”]).
“wherein the core comprises an activation memory configured to store sets of layer inputs” (Yin teaches an activation memory (referred to by Yin as the data buffers) storing input data to the arrays, which is used for computations for the neural network layers. [Yin pages 970-971 sec. III. A. ¶3: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers […] During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    (image: Yin Table I)
]).   
“and a parameter memory configured to store parameters for a first neural network layer;” (Yin teaches a weight buffer for storing weights for the neural network layers. The weight buffer is the parameter memory. [Yin page 971 sec. III. A: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    (image: Yin Table I)
]). 
“receiving, by a computation unit of the core, an input of the sets of layer inputs accessed by the rotation unit, the input being received for processing at the first neural network layer;” (Yin teaches PE arrays that are the main computing units. As seen in Fig. 2, the PE arrays are composed of PEs. Yin teaches the computing cells (i.e. the PEs) receiving input feature points that were accessed in turn by the rotation unit. [Yin page 970 sec. III. A. ¶1: “Fig. 2 shows the top-level architecture of the proposed hybrid-NN processor. Two 16×16 heterogeneous PE arrays are the main computing units. Each array can be partitioned into sub-arrays for different functions, and PEs can be configured to execute bit-width adaptive operations.”, Fig. 2: 
    (image: Yin Fig. 2)
, page 970 sec. III. A. ¶2: “The input feature points are loaded to PEs at the left/right edge, and are horizontally shifted to the PEs inside the array”, page 970-971 sec. III. A. ¶3: “During the computation of a specific layer, one sub-buffer provides input data to the arrays”, page 970 sec III. A: “2) On-Chip Memory System: Two 144-KB multi-bank SRAM data buffers (Data Buffer_1 and Data Buffer_2) store intermediate data between NN layers”]).  
“receiving, by the computation unit, a parameter for the first neural network layer;” (Yin teaches the PEs receiving weights, which are parameters; the PEs are the computing cells for the neural network layers. [Yin page 971 sec. III. A. ¶4: “The 1-KB weight buffer is used to store the weights loaded from external memory and provides weights to PE arrays. In each PE array, a 16-KB local buffer exploits weight reuse, which provides 16 weights for 16 PE columns in parallel.”, page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, Table I: 
    (image: Yin Table I)
]).
“generating, by the computation unit, an output of the first neural network layer using the input accessed by the rotation unit and the parameter;” (As can be seen in Fig. 6(b), Yin teaches an output for a neural network layer generated based on weights (i.e. parameters) and input data. [Yin page 973 sec. III. B. ¶7: “Thinker’s multi-bank on-chip buffers, distributed IO ports, and independent computation flows of different operations enable the PE array to be flexibly partitioned into four sub-arrays to process CONV, FC, pooling, and RNN-gating operations, respectively. Fig. 8(a) illustrates processing LRCN in AP-flow. CONV array is assigned with 15 × 13 general PEs; 15 × 3 general PEs are assigned for FC array.”, page 972 sec. III. B. ¶1: “In PE array, the computation of output points are fixed on the respective PEs, reusing weights, and input feature points in the vertical and horizontal directions, respectively (also known as output stationary dataflow).”, Fig. 6(b): 
[Image: Yin, Fig. 6(b)]
]).  
“and storing, using a crossbar unit of the core, the output of the first neural network layer in the activation memory in accordance with a bank assignment pattern that is based on the programming data and an attribute value assigned to a second neural network layer.” (Yin teaches the output of a layer being stored onto a data buffer (i.e., the activation memory) in accordance with a rule (i.e., a bank assignment pattern) based on the programming data and attribute values, wherein the attribute values taught by Yin are stride values. [Yin page 970-971 sec. III. A. ¶3: “In order to support array partitioning (AP), each data buffer is divided into two sub-buffers for CONV and FC sub-arrays. During the computation of a specific layer, one sub-buffer provides input data to the arrays and the other stores the output data from the arrays.”, page 975 sec IV. B. ¶5: “Therefore, the input feature point p should be mapped to the virtual bank number B(p) = r mod Bm. In ConvNet, the output map of each CONV layer is written back to multibank on-chip buffer according to the same rule and serves as an input map of the next layer.”]). 
“One or more non-transitory machine-readable storage devices for storing instructions that are executable by one or more processing devices to cause performance of operations comprising:” or “accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core;” 
Young teaches a method for performing kernel striding for convolutional neural network layers on a hardware circuit (Young Abstract: “Methods for receiving a request to process, on a hardware circuit, a neural network comprising a first convolutional neural network layer having a stride greater than one […]”). Young teaches:
“One or more non-transitory machine-readable storage devices for storing instructions that are executable by one or more processing devices to cause performance of operations comprising:” (Young teaches a neural network processing system being implemented through program instructions stored on computer readable media that are executable by one or more processors. [Young col 16 lines 62-67: “Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of data processing apparatus.”, col 18 lines 3-5: “Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices”, col 17 lines 10-13: “The term ‘data processing apparatus’ encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.”]).  
“accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core;” (Examiner notes that “rotate accessing the sets of layer inputs” is not defined. In light of page 35 lines 5-15 of the instant specification, rotate accessing the sets of layer inputs is interpreted as shifting input data that is then provided to the computing cell of the computation unit. Young teaches shifting the activation inputs (otherwise known as the inputs to a neural network layer). [Young col 8 lines 30-37: “a host interface, e.g., the host interface 302 of FIG. 3, shifts activation inputs throughout the array 406 along one dimension, e.g., to the right, while shifting weight inputs throughout the array 406 along another dimension, e.g., to the bottom. For example, over one clock cycle, the activation input at cell 414 can shift to an activation register in cell 416, which is to the right of cell 414.”, col 3 lines 64-67: “Data inputs to a neural network layer, e.g., either the input to the neural network or the outputs of the layer below the layer in the sequence, to a neural network layer can be referred to as activation inputs to the layer.”]). 
Yin, Young, and the instant application are analogous art because they are all directed to special purpose hardware for neural network computations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network processor disclosed by Yin to include “One or more non-transitory machine-readable storage devices for storing instructions that are executable by one or more processing devices to cause performance of operations comprising:” or “accessing, by a rotation unit of the core, the sets of layer inputs stored at the activation memory, wherein the rotation unit rotates accessing the sets of layer inputs based on the programming data received by the core” as taught by Young. One would be motivated to do so to avoid processing delays, making neural network computations more efficient, as suggested by Young (Young col 3 lines 7-13: “This allows for an inference of a neural network that includes a convolutional layer having a stride greater than one to be .

Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Narayanaswami et al. (US20180197068A1) teaches an architecture for neural network computations involving tensor computations with multiply accumulate operators and computing memory address locations (¶28: “FIG. 1 shows a block diagram of an example computing system 100 for traversing one or more tensors to perform computations for a neural network layer. As shown, computing system 100 includes a processing unit 102, a storage medium 104, tensor traversal unit (TTU) 106, a multiply accumulate (MAC) operator 108, and an activation unit 110.”, ¶24: “Traversing the tensor in a nested loop requires a computation of a memory address value of an element to load or store the corresponding data value of the element.”). Delerse et al. (US20200184328A1) teaches a system for accelerating neural network computations by skipping input values and by configuring access to input data (¶11: “According to one example embodiments, a system for accelerating ANN computation is provided. The system may include a controller, a selector communicatively coupled to the controller, and an arithmetic unit communicatively coupled to the controller and the selector.”, ¶13: “The selector may also include an information related to a memory address.”). 

Conclusion	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Somie Park whose telephone number is (571)272-1056. The examiner can normally be reached 9:00am - 5:00pm, Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SOMIE PARK/Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126