Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on April 4, 2022, in which claims 3-6, 14, and 15 are amended. Claims 1-24 are currently pending.

Response to Arguments
The rejections of claims 14-15 under 35 U.S.C. § 112(b) are hereby withdrawn in view of Applicant's amendments and remarks.
Applicant's arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 102/103, based on the amendment, have been considered but are not deemed persuasive.
Applicant is reminded that cited prior art references must be considered in their entirety and not only the cited sections [ MPEP 2141.02(VI) ].
With respect to Applicant's arguments that Goyal does not teach or disclose partitioning a plurality of cores into a plurality of partitions based on dimensions of the layer and vector units, Examiner respectfully disagrees.  Partitioning 'based on' dimensions is a broad, relative description, and Examiner maintains that the interpretation applied in the original rejection is reasonable.  Goyal explicitly teaches that the processing is performed on a sequential layer-wise basis, as would be obvious to one of ordinary skill in the art ([¶0022] “each layer has a plurality of neurons connecting to neurons on a neighboring layer with information/data processed progressing from one layer to next in sequence along a processing pipeline. As shown by the example of FIG. 2, there are three stages in the processing pipeline for each layer of a fully connected (FC) neural network”).  Goyal further explicitly teaches that the layers of the neural network are represented as matrices and individually addressed by the ConvNet engine ([¶0029] "In the example of FIG. 4, the ConvNet engine 410 in each tensor engine 104 is configured to explore sparsity of the vectors and/or matrices across the spectrum of various convolution layers of the neural network for efficient convolution").  As mentioned in the Office action, Goyal then explicitly teaches that said layer matrices are partitioned based on the size (dimensions) of the matrix (layer) [¶0027].  Examiner asserts that it is inconceivable that Goyal could partition the layer matrix across a plurality of processor cores without basing the partitions on the dimensions of the matrix, as the stated intent of the partitioning is to fit the matrix on a variety of processor cores with limited capacity.  Similarly, it is inconceivable that Goyal could partition the matrix to maximize utilization of the available processor cores, as stated, without basing the partitioning on the number of cores.
For example, if Goyal did not base the partitioning on the number of cores, it would be obvious to one of ordinary skill in the art that there could be more partitions than there are processor cores.  Regardless, Goyal explicitly states that each processor core corresponds to a particular matrix subsection ([¶0027] "The MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix").  Goyal also teaches that the input layer data may be an image and explicitly teaches the image being partitioned onto each of the cores ([¶0024] “For a non-limiting example, a large size image can be broken into a plurality of smaller image portions, wherein the size of each of the image portions matches with the input data width of one tensor engine 104 and is handled by each tensor engine 104.”), which is described again as being a layer-wise process ([¶0030] “FIG. 7A depicts an example of kernel reuse, wherein a same kernel is kept and repeatedly applied by the ConvNet engines 410 on different parts of the data (e.g., image) at each convolution layer”).  Further, with regard to the mapping of the vector units, Goyal teaches that each tensor engine has a predetermined number of VectorFPUs which are used to perform the tensor arithmetic, such that partitioning a matrix based on the tensor engine is synonymous with determining a partitioning based on the VectorFPUs.
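For illustration only, the dependence described above can be sketched in a few lines of Python. This sketch is not taken from Goyal and does not purport to reproduce Goyal's implementation; the function name partition_rows and the parameters num_rows, num_cores, and core_capacity are hypothetical and are introduced solely to show how a partitioning that must fit limited-capacity cores necessarily depends on both the matrix dimension and the number of cores.

```python
from math import ceil


def partition_rows(num_rows, num_cores, core_capacity):
    """Split rows [0, num_rows) into contiguous chunks assigned to cores.

    Hypothetical helper (assumption, not from the cited reference).
    Returns a list of (core_index, start_row, stop_row) tuples.
    """
    if num_rows < 0 or num_cores <= 0 or core_capacity <= 0:
        raise ValueError("num_rows must be >= 0; num_cores and core_capacity must be > 0")

    # Aim for one chunk per core, but never exceed what a single core can hold.
    chunk = min(core_capacity, max(1, ceil(num_rows / num_cores)))

    assignments = []
    start = 0
    while start < num_rows:
        stop = min(start + chunk, num_rows)
        # Round-robin assignment: chunks wrap around when they outnumber cores.
        core = len(assignments) % num_cores
        assignments.append((core, start, stop))
        start = stop
    return assignments


if __name__ == "__main__":
    # Enough capacity: one chunk per core.
    print(partition_rows(10, 4, 8))
    # Limited capacity: more chunks than cores, so cores are reused.
    print(partition_rows(10, 2, 3))
```

In this sketch, changing either the row count or the core count changes the resulting partitions, and a small per-core capacity produces more partitions than cores.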
With regard to Applicant's arguments that Goyal does not teach partitioning based on both a spatial dimension and a feature dimension, Examiner respectfully disagrees.  Partitioning 'based on' dimensions is a broad, relative description, and Examiner maintains that the interpretation applied in the original rejection is reasonable.  As shown earlier, Goyal explicitly teaches that the partitioning is performed layer-wise.  FIG. 3 of Goyal explicitly shows that each layer is made of a three-dimensional matrix, where the third dimension represents the feature dimension ([¶0023] "FIG. 3 depicts an example of a convolutional neural network for pattern recognition and classification...Here, each kernel is a multi-dimensional (e.g., three- or four-dimension) matrix or template having its own values for elements in the matrix, wherein the dimensions represent (x, y, time) coordinates as well as depth (e.g., color) of the elements of the kernel.").  The x and y dimensions are interpreted as synonymous with spatial dimensions, as would be well recognized by one of ordinary skill in the art.  Goyal also explicitly teaches that the VectorFPU can handle multidimensional matrices ([¶0025] "one or more vector floating point units (Vector FPUs) 412 each configured to perform floating point vector operations on multiple data segments/vectors per single instruction, and a data engine 414 configured to support prefetching of one or multi-dimensional...data from the OSM").  Although ¶0026 of Goyal, in reference to FIG. 5, is explicitly presented as a non-limiting example, Examiner suggests that Applicant's argument that the layer is split up into 2D matrices actually further supports Examiner's interpretation.  Any variation of a subsection of a three-dimensional matrix must necessarily be based on all dimensions of the matrix, such that reducing the 3D (x,y,t) kernel to a 2D (x,y) kernel would be synonymous with selecting all layer-wise spatial elements based on a single feature.
For the reasons stated above, the rejections are maintained.
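For illustration only, the reduction of a three-dimensional layer to two-dimensional slices can be sketched as follows. The sketch is an assumption made for explanatory purposes and is not drawn from Goyal; the function name split_by_feature and the nested-list layout tensor[y][x][f] are hypothetical. It shows that the number of 2D slices is set by the feature dimension, while the shape of each slice is set by the two spatial dimensions.

```python
def split_by_feature(tensor):
    """Split a 3-D tensor indexed as tensor[y][x][f] into 2-D slices.

    Hypothetical helper (assumption, not from the cited reference).
    Returns a list with one (height x width) slice per feature index f.
    """
    height = len(tensor)
    width = len(tensor[0])
    features = len(tensor[0][0])
    return [
        [[tensor[y][x][f] for x in range(width)] for y in range(height)]
        for f in range(features)
    ]


if __name__ == "__main__":
    # A 2 x 3 x 2 tensor whose entries encode their own (y, x, f) position.
    t = [[[100 * y + 10 * x + f for f in range(2)] for x in range(3)] for y in range(2)]
    slices = split_by_feature(t)
    # One slice per feature; each slice keeps the full spatial extent.
    print(len(slices), len(slices[0]), len(slices[0][0]))
```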

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-15, 19 and 21-24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Goyal (US 2017/0316312 A1).

Regarding claim 1, Goyal teaches A system comprising: a neural network model memory adapted to store a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; ([Abstract] "A hardware-based programmable deep learning processor (DLP) is proposed, wherein the DLP comprises with a plurality of accelerators...one or more vector floating point units (VectorFPUs) each configured to perform floating point vector operations, and a data engine configured to retrieve and store multi-dimensional data to both on-chip and external memories." [¶0007] "FIG. 2 depicts an example of a neural network, which includes a plurality of layers in accordance with some embodiments.")
	a plurality of neural cores, each neural core comprising ([Abstract] "Specifically, the DLP includes a plurality of tensor engines configured to perform operations for pattern recognition and classification based on a neural network." tensor engine interpreted as synonymous with neural core.)
	a computation unit, the computation unit adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations, the computation unit having a plurality of vector units, and ([¶0020] "In the example of FIG. 1, the system 100 includes a hardware-based programmable deep learning processor (DLP) 102, wherein the DLP 102 further includes at least a plurality of tensor engines (TEs) 104, which are dedicated hardware blocks/components each including one or more microprocessors and on-chip memory units storing software instructions programmed by a user for various machine learning operations." [¶0022] "As shown by the example of FIG. 2, there are three stages in the processing pipeline for each layer of a fully connected (FC) neural network—multiplication of neuron inputs Xi of a layer with weights Wij, addition of multiplication results and bias vector Bj, and application of an activation function to produce an output Yj" [¶0023] "For pattern recognition and classification, e.g., image pattern recognition, a convolutional neural network for convolution operations on input data may have three types of layers—one or more convolutional layers, each of which is configured to apply one or more local filters and/or a non-linear activation function to data from the input layer...each of which is configured to perform a linear or multi-layer perceptron (MLP) operation on the FC neural network and apply a non-linear activation function to output from the neuron." DLP interpreted as synonymous with computation unit)
	an activation memory adapted to store the input activations and the output activations; (FIG. 1 106 [¶0020] "The DLP 102 further includes an on-system/on-chip memory (OSM) 106 and one or more deep learning controllers (DLCs) 108 configured to access a plurality of external memory resources (e.g., DRAMs) through multiple input/output channels via memory controller(s)." [¶0024] "Here, each of the plurality of tensor engines 104 is fully programmable and is configured to retrieve and process input data from the OSM 106")
	wherein the system is adapted to partition the plurality of cores into a plurality of partitions based on dimensions of the layer and the vector units. ([¶0024] " In the example of FIG. 1, the DLP 102 adopts a multi-core structure and partitions each neural network processing task for pattern classification among a plurality of tensor engines (TEs) 104" [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104...The MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" neural network layer is interpreted as being represented as matrix corresponding to neural network processing task for pattern classification.  Each matrix operation is interpreted as synonymous with layer operation.  Matrix partitioning is explicitly taught as being based on size (or dimensions) of matrix across vector units in each tensor engine. Large is interpreted as referring to the dimensions of the matrix or layer.). 

	Regarding claim 2, Goyal teaches The system of claim 1, further comprising: at least one controller operatively coupled to the neural network model memory and to the plurality of cores, the at least one controller being adapted to, for each layer of the neural network model ([¶0020] "The DLP 102 further includes an on-system/on-chip memory (OSM) 106 and one or more deep learning controllers (DLCs) 108 configured to access a plurality of external memory resources (e.g., DRAMs) through multiple input/output channels via memory controller(s).")
	configure the plurality of cores to implement the layer, and ([¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104...The MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" configuring the plurality of cores to implement the layer interpreted as synonymous with partitioning cores based on layer size.)
	provide input activations for the layer to the plurality of cores. ([¶0024] "Here, each of the plurality of tensor engines 104 is fully programmable and is configured to retrieve and process input data from the OSM 106 and/or the external memory resources via the DLCs 108," input data and input activations are interpreted as synonymous.  DLC and controller are interpreted as synonymous). 

	Regarding claim 3, Goyal teaches The system of claim 2, further comprising a network on a chip (NoC) coupled to the plurality of cores. ([Abstract] "Specifically, the DLP includes a plurality of tensor engines configured to perform operations for pattern recognition and classification based on a neural network" DLP is interpreted as network on a chip. Tensor engines interpreted as synonymous with cores.). 

	Regarding claim 4, Goyal teaches The system of claim 3, wherein input activations are provided to the plurality of cores via the (NoC). ([¶0023] "For pattern recognition and classification, e.g., image pattern recognition, a convolutional neural network for convolution operations on input data may have three types of layers—one or more convolutional layers, each of which is configured to apply one or more local filters and/or a non-linear activation function to data from the input layer...each of which is configured to perform a linear or multi-layer perceptron (MLP) operation on the FC neural network and apply a non-linear activation function to output from the neuron."). 

	Regarding claim 5, Goyal teaches The system of claim 3, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via the (NoC) ([¶0026] " The weight matrix W of N1×N2 is stored in column major form, wherein corresponding weights for the vector are also read once in blocks of size B at a time first from the first column and then from the second column, etc. Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408" weights are interpreted as synonymous with parameters.  Storing is interpreted as synonymous with distributing. MatrixMul engine 408 is an aspect of the plurality of cores, see FIG. 4). 

	Regarding claim 6, Goyal teaches The system of claim 5, wherein configuring the plurality of cores further comprises distributing instructions to the plurality of cores via the (NoC). ([¶0018] "In addition, the DLP runs a complete pipeline of deep learning processing/operations offloaded from a host/computing device" [¶0020] "DLP 102 further includes at least a plurality of tensor engines (TEs) 104, which are dedicated hardware blocks/components each including one or more microprocessors and on-chip memory units storing software instructions programmed by a user for various machine learning operations." offloading instructions from host/computing device is interpreted as synonymous with distributing instructions to the DLP where it can be further distributed to the user programmable cores.). 

	Regarding claim 7, Goyal teaches The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the input activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 8, Goyal teaches The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and feature dimensions of the input activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 9, Goyal teaches The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions of the output activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 10, Goyal teaches The system of claim 1, wherein the plurality of partitions for each layer is further determined based on spatial dimensions and feature dimensions of the output activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 11, Goyal teaches The system of claim 1, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 12, Goyal teaches The system of claim 11, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores. ([¶0024] "The DLP 102 is configured to distribute the sub-tasks among the tensor engines 104 under both scenarios where the number of sub-tasks is greater than the number of tensor engines 104 and where the number of sub-tasks is fewer than the number of tensor engines 104" The partitioning and distributing of the sub-tasks is explicitly taught as being with respect to the number of tensor engines.  Tensor engine is interpreted as being synonymous with neural core.). 
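
For illustration only, distribution of sub-tasks with respect to the number of tensor engines, under both scenarios quoted from ¶0024, can be sketched as follows. The round-robin policy, the function name distribute, and its parameters are assumptions made for explanatory purposes; Goyal does not specify this particular scheduling policy.

```python
def distribute(sub_tasks, num_engines):
    """Assign sub-tasks to engines in round-robin order.

    Hypothetical helper (assumption, not from the cited reference).
    Returns one list of sub-tasks per engine; a list may be empty when
    there are fewer sub-tasks than engines.
    """
    if num_engines <= 0:
        raise ValueError("num_engines must be > 0")
    queues = [[] for _ in range(num_engines)]
    for index, task in enumerate(sub_tasks):
        queues[index % num_engines].append(task)
    return queues


if __name__ == "__main__":
    # More sub-tasks than engines: engines receive several sub-tasks each.
    print(distribute(list(range(5)), 2))
    # Fewer sub-tasks than engines: some engines remain idle.
    print(distribute([0, 1], 4))
```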

	Regarding claim 13, Goyal teaches The system of claim 1, wherein the cores within each of the plurality of partitions are configured to compute partial sums. ([¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value" [¶0027] "MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" MatrixMul engine 408 is part of tensor engine which is interpreted as synonymous with core.). 

	Regarding claim 14, Goyal teaches The system of claim 13, wherein the partial sums are aggregated to compute a result for the associated layer. ([¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value" corresponding output value is interpreted as synonymous with result for an associated layer). 

	Regarding claim 15, Goyal teaches The system of claim 14, wherein the partial sums are transmitted via a network on a chip (NoC) for aggregation. ([¶0027] "MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" [¶0026] " During the entire process, the weight matrix is read N/T times and the input matrix is read K/T times while the output matrix is written/stored only once to the memory." [¶0027] " the MatrixMul engine 408 in each tensor engine 104 is configured to achieve efficient vector-matrix multiplication by minimizing or avoiding data movement for multiplication between a sparse vector and a dense or sparse matrix, wherein only data that corresponds to non-zero values in the sparse vector is loaded into memory 406" See FIG. 1 Data movement is interpreted as synonymous with transmitted via a network.  Iterative multiplication and addition to partial sum for the corresponding output is interpreted as synonymous with aggregation. DLP interpreted as network on a chip.). 
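
For illustration only, the block-wise multiply, sum, and accumulate sequence quoted from ¶0026 with respect to claims 13-15 can be sketched as follows. The sketch is a simplified reading of the quoted passage, not Goyal's implementation; the function name blocked_vecmat, the list-of-columns storage for the column-major weight matrix, and the parameter block (corresponding to the block size B) are assumptions.

```python
def blocked_vecmat(x, w_cols, block):
    """Compute y = x * W one block at a time, accumulating partial sums.

    Hypothetical helper (assumption, not from the cited reference).
    x      : input vector of length N1
    w_cols : N2 columns of the weight matrix, each of length N1
             (a simple stand-in for column-major storage)
    block  : number of elements read per step (block size B)
    """
    if block <= 0:
        raise ValueError("block must be > 0")
    n1 = len(x)
    y = [0.0] * len(w_cols)
    for j, col in enumerate(w_cols):
        for start in range(0, n1, block):
            stop = min(start + block, n1)
            # Element-wise multiply of the weight block with the vector block, then sum.
            partial = sum(x[i] * col[i] for i in range(start, stop))
            # Aggregate the partial sum into the corresponding output value.
            y[j] += partial
    return y


if __name__ == "__main__":
    print(blocked_vecmat([1, 2, 3], [[1, 0, 1], [2, 2, 2]], block=2))
```

Each output value is the aggregate of the per-block partial sums for its column, so the result does not depend on the block size chosen.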

	Regarding claim 19, Goyal teaches A method comprising: reading a neural network model comprising a plurality of layers, each layer having at least one dimension and comprising a plurality of synaptic weights; (FIG. 2 [¶0026] "The weight matrix W of N1×N2 is stored in column major form, wherein corresponding weights for the vector are also read once in blocks of size B at a time first from the first column and then from the second column, etc." See FIG. 2 for layers, FIG. 3 for dimensions.)
	for each layer of the neural network model partitioning a plurality of cores into a plurality of partitions based on dimensions of the layer and vector units, ([¶0024] " In the example of FIG. 1, the DLP 102 adopts a multi-core structure and partitions each neural network processing task for pattern classification among a plurality of tensor engines (TEs) 104" [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104...The MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" neural network layer is interpreted as being represented as matrix corresponding to neural network processing task for pattern classification.  Each matrix operation is interpreted as synonymous with layer operation.  Matrix partitioning is explicitly taught as being based on size (or dimensions) of matrix across vector units in each tensor engine. Large is interpreted as referring to the dimensions of the matrix or layer.)
	configuring the plurality of cores to implement the layer, and ([¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104...The MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" configuring the plurality of cores to implement the layer interpreted as synonymous with partitioning cores based on layer size.).
	providing to the plurality of cores input activations for the layer, ([¶0024] "Here, each of the plurality of tensor engines 104 is fully programmable and is configured to retrieve and process input data from the OSM 106 and/or the external memory resources via the DLCs 108," input data and input activations are interpreted as synonymous.  DLC and controller are interpreted as synonymous)
	apply the synaptic weights associated with the layer to the input activations to produce a plurality of output activations. ([¶0026] " The weight matrix W of N1×N2 is stored in column major form, wherein corresponding weights for the vector are also read once in blocks of size B at a time first from the first column and then from the second column, etc. Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408" weights are interpreted as synonymous with parameters.  Storing is interpreted as synonymous with distributing. MatrixMul engine 408 is an aspect of the plurality of cores, see FIG. 4). 

	Regarding claim 21, Goyal teaches The method of claim 19, wherein configuring the plurality of cores comprises distributing parameters to the plurality of cores via a network. ([¶0026] " The weight matrix W of N1×N2 is stored in column major form, wherein corresponding weights for the vector are also read once in blocks of size B at a time first from the first column and then from the second column, etc. Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408" weights are interpreted as synonymous with parameters.  Storing is interpreted as synonymous with distributing. MatrixMul engine 408 is an aspect of the plurality of cores, see FIG. 4). 

	Regarding claim 22, Goyal teaches The method of claim 19, wherein configuring the plurality of cores comprises distributing instructions to the plurality of cores via a network. ([¶0018] "In addition, the DLP runs a complete pipeline of deep learning processing/operations offloaded from a host/computing device" [¶0020] "DLP 102 further includes at least a plurality of tensor engines (TEs) 104, which are dedicated hardware blocks/components each including one or more microprocessors and on-chip memory units storing software instructions programmed by a user for various machine learning operations." offloading instructions from host/computing device is interpreted as synonymous with distributing instructions to the DLP where it can be further distributed to the user programmable cores.). 

	Regarding claim 23, Goyal teaches The method of claim 19, wherein the plurality of partitions for each layer is further determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer. ([¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" the size of the layer is interpreted as being fully based on the spatial dimensions and the feature dimensions.  Goyal FIG. 3 shows that each layer has three dimensions: height and width (the spatial dimensions from the input image) and a feature dimension.  Input activation is interpreted as synonymous with N1, which represents the layer input; N2 represents the layer output.  Goyal explicitly teaches that the distributed matrix depends on the spatial dimensions and feature dimensions.  N1 and N2 are taught as dense matrices that may be explicitly partitioned among tensor engines.). 

	Regarding claim 24, Goyal teaches The system of claim 23, wherein the plurality of partitions for each layer is further determined by a dimension of the plurality of cores. ([¶0024] "The DLP 102 is configured to distribute the sub-tasks among the tensor engines 104 under both scenarios where the number of sub-tasks is greater than the number of tensor engines 104 and where the number of sub-tasks is fewer than the number of tensor engines 104"). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 16-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Goyal in view of Huang (US 2019/0180170 A1). 

	Regarding claim 16, Goyal teaches The system of claim 2.
	However, Goyal does not explicitly teach the at least one controller is further adapted to, upon computation of output activations of a layer, redistribute the output activations among the plurality of cores.  

Huang, in the same field of endeavor, teaches the at least one controller is further adapted to, upon computation of output activations of a layer, redistribute the output activations among the plurality of cores. ([Abstract] "Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task." FIG. 5). 

	Goyal and Huang are both directed towards distributed training of a neural network.  Therefore, Goyal and Huang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network system of Goyal with that of Huang by redistributing the output activations after computation. It would be obvious to one of ordinary skill in the art that in order to obtain the activation results in a distributed system said results should be distributed. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Huang ([¶0065] “FIG. 4 illustrates an example of the effect of storing the weight values for a neural network on-chip instead of in off-chip memory.”).

	Regarding claim 17, the combination of Goyal and Huang teaches The system of claim 16, wherein the redistribution is via a network. (Goyal FIG. 1 [¶0019] "FIG. 1 depicts an example of a diagram of a system 100 configured to support hardware-based deep learning processing. Although the diagrams depict components as functionally separate, such depiction is merely for illustrative purposes. It will be apparent that the components portrayed in this figure can be arbitrarily combined or divided into separate software, firmware and/or hardware components. Furthermore, it will also be apparent that such components, regardless of how they are combined or divided, can execute on the same host or multiple hosts, and wherein the multiple hosts can be connected by one or more networks."). 

	Regarding claim 18, the combination of Goyal and Huang teaches The system of claim 16, wherein the redistribution is determined based on one or more of spatial dimensions of the input activations, feature dimensions of the input activations, spatial dimensions of the output activations, or feature dimensions of the output activations for that layer. (Goyal [¶0026] "The weight matrix W of N1×N2 is stored in column major form," FIG. 3 [¶0027] "For scalable matrix-matrix multiplication, the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104. In some embodiments, separate Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Format can be adopted for the corresponding portion of the large matrix distributed to each of the tensor engines 104" Huang teaches redistributing data through a DMA engine based on the output of a first calculation.  Goyal teaches distributing through a DMA engine based on spatial dimensions of the layer.). 

	Regarding claim 20, Goyal teaches The method of claim 19, further comprising: computing partial sums within each partition; ([¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value" [¶0027] "MatrixMul engine 408 of each tensor engine 104 is then configured to perform a matrix-matrix multiplication on its corresponding portion of the partitioned matrix" MatrixMul engine 408 is part of the tensor engine, which is interpreted as synonymous with a core.)
	aggregating the partial sums to compute the output activations. ([¶0026] "Each time a block of weights is read from the weight matrix, they are multiplied element-wise with the block of the vector, summed, and added by the MatrixMul engine 408 as a partial sum to the corresponding output value" corresponding output value is interpreted as synonymous with output activation.).
	However, Goyal does not explicitly teach transmitting the partial sums among cores within each partition;  

Huang teaches transmitting the partial sums among cores within each partition; ([Abstract] "Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task." FIG. 5). 

	Goyal and Huang are both directed towards distributed training of a neural network.  Therefore, Goyal and Huang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network system of Goyal with that of Huang by redistributing the output activations after computation. It would be obvious to one of ordinary skill in the art that in order to obtain the activation results in a distributed system said results should be distributed. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Huang ([¶0065] “FIG. 4 illustrates an example of the effect of storing the weight values for a neural network on-chip instead of in off-chip memory.”).
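
	For illustration only, the partial-sum accumulation described in the passage of Goyal [¶0026] quoted above with respect to claim 20 can be sketched as follows. The sketch is not taken from Goyal, Huang, or the present application; the function name blockwise_matvec, the parameter block_size, and the convention that the weight matrix has one row per input element and one column per output value are assumptions made solely for this example.

```python
from typing import List


def blockwise_matvec(weights: List[List[float]],
                     vector: List[float],
                     block_size: int) -> List[float]:
    """Compute output[j] = sum_i weights[i][j] * vector[i] block by block.

    Hypothetical helper: each block of inputs contributes one partial sum
    per output value, and the partial sums are accumulated into the output.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    n_in = len(vector)
    n_out = len(weights[0]) if weights else 0
    output = [0.0] * n_out

    for start in range(0, n_in, block_size):
        stop = min(start + block_size, n_in)
        for j in range(n_out):
            # Multiply the block of weights element-wise with the block
            # of the vector and sum the products: one partial sum.
            partial = sum(weights[i][j] * vector[i] for i in range(start, stop))
            # Add the partial sum to the corresponding output value.
            output[j] += partial
    return output
```

	Under these assumptions, the accumulated result does not depend on the block size chosen, since every product is added to its output value exactly once.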

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126