Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
	This is a second non-final rejection because a new ground of rejection made below was not necessitated by applicant’s amendments.  This Office Action is responsive to Applicant’s Amendment filed April 14, 2022, in which claims 12, 13, 15, 19, and 20 are amended.  Claims 1-26 are currently pending. 
	Applicant is reminded of the duty to disclose information material to patentability [MPEP 2001].

Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 14, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
	The objections to claims 7, 15, and 20 are hereby withdrawn, as necessitated by applicant's amendments and remarks addressing the objections.
The rejections of claims 12, 13, and 19-23 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by applicant's amendments and remarks addressing the rejections.
Applicant’s arguments with respect to the rejection of claims 1-26 under 35 U.S.C. 103, based on the amendment, have been considered and are persuasive. However, the arguments are moot in view of the new grounds of rejection set forth below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 and 20-25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Peemen (“Memory-Centric Accelerator Design for Convolutional Neural Networks”, 2013).

	Regarding claim 1, Peemen teaches  A method comprising: reading a neural network description describing a plurality of neural network layers, ([p. 14 §IIIA] "In these layers simple features such as edges are extracted, which are combined in the next layers to detect more complex features such as corners or crossings. Secondly, the features, represented as feature maps, are classified by feed forward neural network layers")
	each of the plurality of neural network layers having an associated weight tensor, input tensor, and output tensor; ([p. 15 §IIIA] "In general, a layer in the network converts Q input images X1...XQ to R output images Y1...YR. For example, in layer 1 the input image is filtered 6 times by (1), each time with a different weight set.")
	determining a plurality of precedence relationships among the plurality of neural network layers; ([p. 16 §IV B] “Multiple parameters of a CNN are programmed in the execution schedule, which can change during execution. Examples are: convolution kernel size (loops k and l), feature map size (loops m and n), number of input feature maps (loop q), and subsample size. These parameters ensure that the accelerator supports a variety of layer configurations." Loops k and l for kernel size interpreted as an example of a precedence relationship for the execution schedule and explicitly taught as supporting a variety of layer configurations among the plurality of neural network layers.)
	based on the plurality of precedence relationships, generating a sequence of the plurality of neural network layers; ([p. 16 §IV B] “Multiple parameters of a CNN are programmed in the execution schedule...These parameters ensure that the accelerator supports a variety of layer configurations” [p. 17 §V] “This sequence is repeated until a CNN layer is complete and a new parameter set must be communicated”)
	mapping the weight tensor, input tensor, and output tensor of each of the plurality of neural network layers onto an array of neural cores. ([p. 15 §IV] "Each PE sequentially computes a neuron value in a feature map. Hence, iterations of loop m and n are divided over different PEs, and iterations of loop k and l are performed by one PE" [p. 17 §V 4] "The proposed schedules are implemented as a pipelined communication stream. A control part of the schedule runs on a MicroBlaze host processor, it communicates the required data for a tile to the accelerator" Tile interpreted as synonymous with core.  See FIG. 3 and FIG. 9 which shows mapping of input, output, and weights to accelerator). 

	Regarding claim 2, Peemen teaches The method of claim 1, wherein the mapping comprises determining an execution schedule comprising a plurality of operations for the array of neural cores, the execution schedule guaranteeing data delivery at each of the plurality of neural cores for the computation of each neural network layer. ([p. 16 §V] "If reuse buffers have enough content the MACC PEs can continue processing. Because bandwidth towards the buffers is relatively small, reuse of data is exploited to increase bandwidth for reading from the buffers. Our design flow selects the best computation schedules to maximize data reuse for a buffer size restriction." Maximizing data reuse for a buffer size restriction interpreted as synonymous with guaranteeing data delivery.  MACC PE interpreted as synonymous with neural core.). 

	Regarding claim 3, Peemen teaches The method of claim 2, wherein the execution schedule comprises computation and communication operations. ([p. 17 §V2] "Memory effects of different schedules:... we model the communications per iteration, defined as: the number loads divided by the compute iterations. As shown, a reduction of communications is achieved when successive tiles reuse their overlapping values. This reuse is illustrated with a purple box, and the remaining communication by an orange box. Figure 8(b) shows a better schedule regarding data reuse by maximizing the overlap between successive tiles."). 

Regarding claim 20,  Peemen teaches The method of claim 1, wherein the mapping comprises determining a network schedule, the network schedule determining the timing of communications on one or more network. ([p. 16 §VIA] "2) Optimized schedules: Figure 14 shows the execution time of mappings generated with locality optimized schedules. Each line in the graph represents a column of Table I. The scaling in execution times shows that more buffer size is required when the number of PEs is increased...If data transfer time is smaller than compute time, total execution time is close to the theoretical optimum" Transfer time interpreted as synonymous with communication timing or broadcast timing which is taught as a portion of the execution time.). 

	Regarding claim 21, Peemen teaches The method of claim 20, wherein the one or more network comprises a weight network, an instruction network, an activation network, or a partial sum network. ([p. 16 §IVA] "Because weight values are shared in a feature map, these are broadcasted to the PEs. The small set of values in the weight BRAM is reused for the other neurons in the feature map"). 

	Regarding claim 22, Peemen teaches The method of claim 21, wherein the network schedule precludes simultaneous transmission on each network. ([p. 18 §VI] "The scaling in execution times shows that more buffer size is required when the number of PEs is increased. Since this reduces the compute time, it becomes impossible to overlap data transfer, and as a result data transfer becomes dominant" Precluding simultaneous transmission is interpreted as synonymous with becoming impossible to overlap data transfer.). 

	Regarding claim 23, Peemen teaches The method of claim 20, wherein the network schedule guarantees data delivery at each of the plurality of neural cores for each computation using said data. ([p. 16 §V] "If reuse buffers have enough content the MACC PEs can continue processing. Because bandwidth towards the buffers is relatively small, reuse of data is exploited to increase bandwidth for reading from the buffers. Our design flow selects the best computation schedules to maximize data reuse for a buffer size restriction." Maximizing data reuse for a buffer size restriction interpreted as synonymous with guaranteeing data delivery.  MACC PE interpreted as synonymous with neural core.). 

	Regarding claim 24, Peemen teaches The method of claim 1, wherein the mapping comprises determining a batch size. ([p. 16 §IV] "4) Column based storage with subsampling: Due to the flexibility of the reuse buffers subsampling factors are directly supported. If (1) contains a subsample factor S > 1 the parallel feature map neurons are not direct neighbors. As a result a different pattern must be send to the PEs"). 

	Regarding claim 25, Peemen teaches The method of claim 24, wherein the batch size is unique for at least one neural network layer. ([p. 14 §III] "Since classification layers do not use subsampling, the factor S is 1 for layer 3 and 4"). 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Peemen in view of Baum (US20180285727A1). 

	Regarding claim 4, Peemen teaches The method of claim 3.
	However, Peemen does not explicitly teach generating microcode, 
	the microcode executable by a chip to compute the plurality of neural network layers, the chip comprising the array of neural cores.  

Baum teaches The method of claim 3, further comprising: generating microcode, ([¶0116] " It comprises a dataflow machine or processor incorporating, in one embodiment, microcode tailored for neural network operations."   [¶0152] "Note that this is an example implementation only as other sequences may be used by loading different microcode to the layer controllers" Loading microcode interpreted as synonymous with generating microcode.).
	the microcode executable by a chip to compute the plurality of neural network layers, the chip comprising the array of neural cores. ([¶0117] "In addition, the NN processor includes layer controllers incorporating microcode machines that allow full accessibility to the control signaling of the computational elements, memory etc.").  

Peemen and Baum are both directed towards accelerating neural networks.  Therefore, Peemen and Baum are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Peemen with the teachings of Baum by generating and using microcode in the accelerator.  Baum explicitly teaches the benefits of usage of microcode in the accelerator ([¶0117] “the NN processor includes layer controllers incorporating microcode machines that allow full accessibility to the control signaling of the computational elements, memory etc”) as well as provides additional motivation for combination ([¶0218] “An additional capability of the microcode machine in the LCs is that there are no conditional statements or conditional branching. This is advantageous for data pipelining since the need to manage branch prediction or other pipeline overhead is avoided.”).  

	Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Peemen and Baum, and further in view of Goyal (US20170316312A1).

	Regarding claim 5, the combination of Peemen and Baum teaches The method of claim 4.
	While Baum explicitly teaches generating microcode, the combination of Peemen and Baum does not explicitly teach generating core microcode, the core microcode executable by the plurality of physical cores to generate partial sums.  

Goyal, in the same field of endeavor, teaches generating core microcode, the core microcode executable by the plurality of physical cores to generate partial sums. ([¶0020] "the DLP 102 further includes at least a plurality of tensor engines (TEs) 104, which are dedicated hardware blocks/components each including one or more microprocessors and on-chip memory units storing software instructions programmed by a user for various machine learning operations." [¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value"). 

	Peemen, Baum, and Goyal are all directed towards accelerating neural networks. Therefore, Peemen, Baum, and Goyal are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Peemen and Baum with the teachings of Goyal by generating partial sums. Partial sums are well known in the art for parallel neural network accelerators, as reinforced by Goyal.  One advantage of the partial sum method taught in Goyal is that the output matrix is written to memory only once ([¶0026] “During the entire process, the weight matrix is read N/T times and the input matrix is read K/T times while the output matrix is written/stored only once to the memory.”). Goyal teaches as further motivation for combination that ([¶0003] “the key objective of the inference phase is to achieve energy (e.g., performance per watt) and capital (ROI) efficiency”).  

	Regarding claim 6,  the combination of Peemen and Baum teaches The method of claim 4.
	However, the combination of Peemen and Baum does not explicitly teach generating chip microcode, the chip microcode executable by at least one chip microengine to distribute weights, parameters, instructions, and/or activation data to the array of neural cores.  

Goyal, in the same field of endeavor, teaches The method of claim 4, further comprising: generating chip microcode, the chip microcode executable by at least one chip microengine to distribute weights, parameters, instructions, and/or activation data to the array of neural cores. ([¶0021] "During its operation, the DLP 102 is configured to accept instructions from a host 103 and submit the instructions to the tensor engines 104 and their respective components in the DLP 102 via a DLP interface 112").  

	Claims 7-15 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Peemen in view of Goyal. 

	Regarding claim 7, Peemen teaches The method of claim 1.
	While Peemen implicitly teaches that the memory allocation includes the weight tensor, input tensor, and output tensor, Peemen does not explicitly teach the mapping comprises determining a memory allocation for the weight tensor, input tensor, and output tensor of each of the plurality of neural network layers.  

Goyal teaches The method of claim 1, wherein the mapping comprises determining a memory allocation for the weight tensor, input tensor, and output tensor of each of the plurality of neural network layers ([¶0018] "DLP is fully programmable using existing tools and workflows and it achieves high performance and high energy efficiency with balanced allocation of computing and memory resources." [¶0026] "During the entire process, the weight matrix is read N/T times and the input matrix is read K/T times while the output matrix is written/stored only once to the memory."). 

	Peemen and Goyal are both directed towards accelerating neural networks. Therefore, Peemen and Goyal are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Peemen with the teachings of Goyal by generating partial sums and mapping the input, output, and weight tensors to memory. Partial sums are well known in the art for parallel neural network accelerators, as reinforced by Goyal.  One advantage of the partial sum method taught in Goyal is that the output matrix is written to memory only once ([¶0026] “During the entire process, the weight matrix is read N/T times and the input matrix is read K/T times while the output matrix is written/stored only once to the memory.”). Goyal teaches as further motivation for combination that ([¶0003] “the key objective of the inference phase is to achieve energy (e.g., performance per watt) and capital (ROI) efficiency”).  The motivation for combination for claim 7 also applies to the remaining claims depending on claim 7.

	Regarding claim 8, the combination of Peemen and Goyal teaches The method of claim 7, wherein the mapping comprises: determining a memory allocation for plurality of partial sums of each of the plurality of neural network layers. (Goyal [¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value" Adding a partial sum to the corresponding output value is interpreted as synonymous with determining a memory allocation for a partial sum.). 

	Regarding claim 9, the combination of Peemen and Goyal teaches The method of claim 7, wherein the mapping comprises: determining a memory allocation for the weight tensor in a global memory. (Peemen [p. 16 §IVA] "Because weight values are shared in a feature map, these are broadcasted to the PEs. The small set of values in the weight BRAM is reused for the other neurons in the feature map." The weight BRAM is a predetermined global memory allocation for the weight tensor.). 

	Regarding claim 10, the combination of Peemen and Goyal teaches The method of claim 9, wherein the mapping comprises: determining a plurality of mapping functions of the weight tensor to local memories of the neural cores. (Goyal [¶0026] "The weight matrix W of N1×N2 is stored in column major form" [¶0027] "In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines" Mapping function is interpreted as synonymous with map.  With respect to the instant specification, column major and row major are taught as examples of mapping functions.). 

	Regarding claim 11, the combination of Peemen and Goyal teaches The method of claim 7, wherein the mapping comprises: determining a memory allocation for the input tensor and output tensor in local memories of the neural cores. (Goyal [¶0022] "FIG. 2 depicts an example of a neural network, which includes a plurality of layers, e.g., an input layer, an output layer and multiple hidden layers between them" [¶0025] "Each tensor engine 104 further includes at least four types of hardware engines for accelerated computation on data at each layer of the neural network...the tensor engine 104 includes a fully programmable CPU 402 having its own instruction RAM/cache 404 and data RAM/cache 406 configured to store instructions from the host 103 and retrieved data from the OSM 106" Goyal explicitly teaches that each tensor engine is capable of performing calculations for each layer, and similarly teaches input and output tensors, and local memory in the tensor engine.). 

	Regarding claim 12, the combination of Peemen and Goyal teaches The method of claim 11, wherein the memory allocation minimizes the transfer of activations among cores. (Goyal [¶0027] "In some embodiments, the MatrixMul engine 408 in each tensor engine 104 is configured to achieve efficient vector-matrix multiplication by minimizing or avoiding data movement for multiplication between a sparse vector and a dense or sparse matrix, wherein only data that corresponds to non-zero values in the sparse vector is loaded into memory 406" [¶0030] "Once data is loaded into memory 406 of each tensor engine 104, the tensor engine 104 is configured to reuse data in memory across one or more ConvNet engines 410 efficiently to avoid or minimize data movement" Activations interpreted as synonymous with input.  Loading non-zero values of sparse matrices is interpreted as input.). 

	Regarding claim 13, the combination of Peemen and Goyal teaches The method of claim 11, wherein determining the memory allocation comprises resequencing the plurality of neural network layers to conform to a local memory limit. (Peemen [p. 17 §V] "The table present the optimal schedules that minimize external accesses for layer 3 of the speed sign recognition CNN. Each column represents a schedule that corresponds to a memory size constraint"). 

	Regarding claim 14, the combination of Peemen and Goyal teaches The method of claim 11, wherein the memory allocation is associated with an execution schedule (Peemen [p. 18 §VI] "For evaluation of our method we analyze the effect of different schedules on the external memory accesses" Memory allocation interpreted as synonymous with memory access.) 
	such that the memory allocation is applied when the execution requires the input tensor and output tensor. (Peemen [p. 16 §IVA2] "The number of PEs and BRAM banks can be increased to exploit more parallelism. For large numbers the rotate logic would become complex. As a result, parallelism is exploited over output feature maps that share input feature maps...This configuration of the template is depicted in Figure 5. The number of weight BRAMs increases because each output feature map requires a different weight kernel. Since input feature map values are shared, these are broadcasted to both groups of PEs."). 

	Regarding claim 15, the combination of Peemen and Goyal teaches The method of claim 14, wherein the local memories are reallocated repeatedly according to the execution schedule. (Peemen [p. 15 §IVA] "The memory subsystem facilitates the flexibility and increases communication bandwidth by exploiting data reuse in memory access pattern" Peemen teaches allocating memory and reusing portions of memory in order to minimize memory reallocation.  Therefore, repeatedly reallocating memory would lead to expected outcomes.). 

	Regarding claim 26, Peemen teaches A system comprising: a chip comprising a controller and a plurality of physical neural cores; ([p. 15 §III] "FIG. 3 CNN accelerator template connected to a host processor for control." Processing elements interpreted as synonymous with neural cores.)
	a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising: ([p. 14 §II] "the convolution operations performed by Single Instruction Multiple Data (SIMD) type of Processing Elements (PEs)")
	reading a neural network description describing a plurality of neural network layers, ([p. 14 §IIIA] "In these layers simple features such as edges are extracted, which are combined in the next layers to detect more complex features such as corners or crossings. Secondly, the features, represented as feature maps, are classified by feed forward neural network layers")
	each of the plurality of neural network layers having an associated weight tensor, input tensor, and output tensor; ([p. 15 §IIIA] "In general, a layer in the network converts Q input images X1...XQ to R output images Y1...YR. For example, in layer 1 the input image is filtered 6 times by (1), each time with a different weight set.").
	determining a plurality of precedence relationships among the plurality of neural network layers; ([p. 16 §IV B] “Multiple parameters of a CNN are programmed in the execution schedule, which can change during execution. Examples are: convolution kernel size (loops k and l), feature map size (loops m and n), number of input feature maps (loop q), and subsample size. These parameters ensure that the accelerator supports a variety of layer configurations." Loops k and l for kernel size interpreted as an example of a precedence relationship explicitly taught as supporting a variety of layer configurations among the plurality of neural network layers.)
	based on the plurality of precedence relationships, generating a sequence of the plurality of neural network layers; ([p. 16 §IV B] “Multiple parameters of a CNN are programmed in the execution schedule...These parameters ensure that the accelerator supports a variety of layer configurations” [p. 17 §V] “This sequence is repeated until a CNN layer is complete and a new parameter set must be communicated”)
	mapping the weight tensor, input tensor, and output tensor of each of the plurality of neural network layers onto an array of neural cores ([p. 15 §IV] "Each PE sequentially computes a neuron value in a feature map. Hence, iterations of loop m and n are divided over different PEs, and iterations of loop k and l are performed by one PE" [p. 17 §V 4] "The proposed schedules are implemented as a pipelined communication stream. A control part of the schedule runs on a MicroBlaze host processor, it communicates the required data for a tile to the accelerator" Tile interpreted as synonymous with core.  See FIG. 3 and FIG. 9 which shows mapping of input, output, and weights to accelerator).
	However, Peemen does not explicitly teach generating microcode; distributing the microcode to the chip, wherein: the chip is adapted to execute the microcode to compute the plurality of neural network layers.  

Goyal, in the same field of endeavor, teaches generating microcode; distributing the microcode to the chip, wherein: the chip is adapted to execute the microcode to compute the plurality of neural network layers. ([¶0021] "During its operation, the DLP 102 is configured to accept instructions from a host 103 and submit the instructions to the tensor engines 104 and their respective components in the DLP 102 via a DLP interface 112"). 

	Peemen and Goyal are both directed towards accelerating neural networks. Therefore, Peemen and Goyal are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Peemen with the teachings of Goyal by generating partial sums and mapping the input, output, and weight tensors to memory. Partial sums are well known in the art for parallel neural network accelerators, as reinforced by Goyal.  One advantage of the partial sum method taught in Goyal is that the output matrix is written to memory only once ([¶0026] “During the entire process, the weight matrix is read N/T times and the input matrix is read K/T times while the output matrix is written/stored only once to the memory.”). Goyal teaches as further motivation for combination that ([¶0003] “the key objective of the inference phase is to achieve energy (e.g., performance per watt) and capital (ROI) efficiency”).  

	Claims 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Peemen and Goyal, and further in view of Sarma (US20190235866A1).

	Regarding claim 16, the combination of Peemen and Goyal teaches The method of claim 1, wherein each layer of the neural network has an associated tensor operation, the method further comprising: selecting a parametrized scheme from a library of parametrized schemes, the selected scheme corresponding to the tensor operation (Goyal [¶0022] "As shown by the example of FIG. 2, there are three stages in the processing pipeline for each layer of a fully connected (FC) neural network—multiplication of neuron inputs Xi of a layer with weights Wij, addition of multiplication results and bias vector Bj, and application of an activation function to produce an output Yj to the next layer" Activation function interpreted as synonymous with layer associated tensor operation.).
	However, the combination of Peemen and Goyal does not explicitly teach the mapping comprises instantiating the selected scheme with parameters corresponding to the tensor operation.  

Sarma, in the same field of endeavor, teaches selecting a parametrized scheme from a library of parametrized schemes, the selected scheme corresponding to the tensor operation, wherein: ([¶0026] "In various embodiments, each computation unit is configured to perform one or more multiply, add, accumulate, and/or shift operations. In some embodiments, each computation unit is configured to perform a dot-product operation." scheme interpreted as synonymous with instruction set.  Library of parametrized schemes interpreted as synonymous with instruction set architecture. Vector and matrix are both interpreted as forms of tensors. Dot-product operation interpreted as synonymous with tensor operation.)
	the mapping comprises instantiating the selected scheme with parameters corresponding to the tensor operation. ([¶0026] "For example... A two-dimensional data set, such as an image, may be formatted and fed into matrix processor 107 using data input 103, one vector at a time. In parallel, a vector of weights may be applied to the two-dimensional data set by formatting the weights and feeding them as a vector into matrix processor 107 using weight input 105. Corresponding computation units of matrix processor 107 perform a matrix processor instruction on the corresponding operands of the weight and data inputs in parallel." vector/matrix instructions interpreted as synonymous with tensor operations. Sarma teaches instantiating the selected scheme with 2D image data for the tensor operation.). 

	Peemen, Goyal, and Sarma are all directed towards accelerating machine learning systems.  Therefore, Peemen, Goyal, and Sarma are all analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Peemen and Goyal with the teachings of Sarma by implementing a parameter scheme library into the accelerator.  Sarma teaches as a motivation for combination ([¶0002] “Machine learning and artificial intelligence operations often rely on the repeated application of a set of specific machine learning processor operations over very large datasets. Therefore, there exists a need for a microprocessor system that supports performing machine learning and artificial intelligence specific processing operations on large datasets in parallel without the overhead of multiple processing cores for each parallel operation.”). 

	Regarding claim 17, the combination of Peemen, Goyal, and Sarma teaches The method of claim 16, wherein the mapping comprises determining an execution schedule comprising a plurality of operations for the array of neural cores, (Peemen [p. 17 §V2] "Memory effects of different schedules:... we model the communications per iteration, defined as: the number loads divided by the compute iterations. As shown, a reduction of communications is achieved when successive tiles reuse their overlapping values. This reuse is illustrated with a purple box, and the remaining communication by an orange box. Figure 8(b) shows a better schedule regarding data reuse by maximizing the overlap between successive tiles.")
	the execution schedule guaranteeing data delivery at each of the plurality of neural cores for the computation of each neural network layer. (Peemen [p. 16 §V] "If reuse buffers have enough content the MACC PEs can continue processing. Because bandwidth towards the buffers is relatively small, reuse of data is exploited to increase bandwidth for reading from the buffers. Our design flow selects the best computation schedules to maximize data reuse for a buffer size restriction." Maximizing data reuse for a buffer size restriction interpreted as synonymous with guaranteeing data delivery.  MACC PE interpreted as synonymous with neural core.). 

	Regarding claim 18,  the combination of Peemen, Goyal, and Sarma teaches The method of claim 17, wherein the execution schedule comprises computation and communication operations. (Peemen [p. 17 §V2] "Memory effects of different schedules:... we model the communications per iteration, defined as: the number loads divided by the compute iterations. As shown, a reduction of communications is achieved when successive tiles reuse their overlapping values. This reuse is illustrated with a purple box, and the remaining communication by an orange box. Figure 8(b) shows a better schedule regarding data reuse by maximizing the overlap between successive tiles."). 

Regarding claim 19, the combination of Peemen, Goyal, and Sarma teaches The method of claim 18, wherein the execution schedule further comprises NOPs. (Sarma [¶0071] "For example, no-ops may be inserted into the component instructions of a vector computational unit instruction to allow a load operation to complete before an arithmetic logic unit operation").

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126