Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicant's Amendment filed on February 25, 2022, in which claims 1, 3-8, 11-15, and 17-20 are amended. Claims 1-20 are currently pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 20, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
The rejections of claims 3, 4, 5, 8, and 11-20 under 35 U.S.C. § 112(b) are hereby withdrawn in view of Applicant's amendments and remarks.
The rejections of claims 1-20 under 35 U.S.C. § 101 are hereby withdrawn in view of Applicant's amendments and remarks.
Applicant's arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 103, in view of the amendments, have been considered but are not persuasive.

With respect to Applicant's argument that the prior art references do not teach "training, by the quantized hardware accelerator, the quantized neural network model using the set of quantized-precision format numbers", Examiner respectfully disagrees. Drumond is directed towards a quantized hardware accelerator and teaches this limitation ([p. 5 §5.1] "We train DNNs with the hybrid approach, using BFP in the compute-intensive operations (matrix multiplications, convolutions) and FP32 in the other operations" [p. 8 §7] "We show that the hybrid approach leads to efficient hardware, with the bulk of the silicon real-estate spent on efficient fixed-point logic").

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 1-9 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Drumond (“End-to-End DNN Training with Block Floating Point Arithmetic”, 2018) in view of Mellempudi (US 2018/0322607 A1).

	Regarding claim 1, Drumond teaches A method implemented at a computer system comprising: pretraining, by a general-purpose processor, the neural network model using normal-precision floating-point numbers ([p. 5 §5.1] "We modified TensorFlow’s (Abadi et al., 2016) matrix multiplications and convolution operations to reproduce the behaviour of BFP matrix multipliers in both the forward and backward passes." Tensorflow is software that is well-known in the art and implemented at a computer system [p. 6 §5.2] "Evaluation Metric. To evaluate the impact of BFP, we tune the models using only FP32, and then train the same models from scratch with the 
	converting an input tensor of the normal-precision floating-point numbers to a set of quantized-precision format numbers to generate a quantized neural network model ([Sec. 4.4] FIG. 5 "The FP-to-BFP units convert tensors by detecting the maximum exponent of the input FP tensors and normalizing the mantissas accordingly").
	at least one quantized-precision format number being selected to emulate a quantized hardware accelerator for processing a neural network comprising the input tensor; ([Sec. 5.1] "We modified TensorFlow’s (Abadi et al., 2016) matrix multiplications and convolution operations to reproduce the behaviour of BFP matrix multipliers in both the forward and backward passes...We used TensorFlow’s defun function to create a new op that processes the inputs and outputs of both the forward and backward passes of another tensorflow op, to simulate the usage of BFP" Simulating interpreted as synonymous with prototyping. See also FIG. 6).
	training, by the quantized hardware accelerator, the quantized neural network model using the set of quantized-precision format numbers ([p. 5 §5.1] "We train DNNs with the hybrid approach, using BFP in the compute-intensive operations (matrix multiplications, convolutions) and FP32 in the other operations" [p. 8 §7] "We show that the hybrid approach leads to efficient hardware, with the bulk of the silicon real-estate spent on efficient fixed-point logic").
	comprising performing, by the quantized hardware accelerator, at least one operation with the set of quantized-precision format numbers, producing a modified set of quantized-precision format numbers; ([Sec. 5.1] "We train DNNs with the hybrid approach, using BFP" See also FIG. 6. Quantized-precision format numbers are passed directly to the layer operation, where a modified set of quantized-precision format numbers is produced.)
	converting the modified set of quantized-precision format numbers to a set of output tensors including a set of normal-precision floating-point format numbers (FIG. 5 "BFP to FP").
	and transmitting the set of normal-precision floating-point format numbers back to the neural network model, such that latency and throughput of training of the neural network model are improved (FIG. 5 "BFP to FP" [p. 1 §1] "Signal processing platforms have historically resorted to block floating-point (BFP), whose representation is shown in Figure 1, as a way to optimize for both performance and density" Optimizing for performance and density interpreted as synonymous with improving latency and throughput.).
	While Drumond implicitly teaches that the neural network quantization is intended to improve latency and throughput, secondary reference Mellempudi is relied upon to reinforce that latency and throughput are a primary optimization design focus in a neural network accelerator.
Mellempudi, who teaches a related art of a quantization enabled neural network accelerator, teaches A method implemented at a computer system for improving latency and throughput of training of a neural network model ([¶0103] "The 

	Drumond and Mellempudi are both directed towards quantization-enabled neural network accelerators. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the accelerators of Drumond and Mellempudi with a focus on throughput optimization. The combination would have been obvious because a person of ordinary skill in the art would have recognized from Mellempudi that performing at least part of training in a reduced-precision format provides a performance and efficiency gain and reduces overall training time ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).
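For illustration only, the data flow mapped to claim 1 above (FP-to-BFP conversion by maximum exponent, an integer-only dot product, and BFP-to-FP conversion) can be sketched in Python as follows. This sketch is not taken from Drumond or Mellempudi; the function names `fp_to_bfp` and `bfp_dot`, the 8-bit default mantissa width, and the clamping behaviour are assumptions chosen for readability.

```python
import math


def fp_to_bfp(values, mantissa_bits=8):
    """Convert floats to (shared_exponent, integer mantissas).

    Hypothetical helper: the shared exponent is taken from the largest
    magnitude in the block, and every value is scaled to a signed integer
    mantissa of `mantissa_bits` bits.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to an integer mantissa and clamp to the representable range
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product computed on integer mantissas, returned as a float."""
    exp_a, man_a = fp_to_bfp(a, mantissa_bits)
    exp_b, man_b = fp_to_bfp(b, mantissa_bits)
    # Integer-only multiply-accumulate (the fixed-point part)
    acc = sum(x * y for x, y in zip(man_a, man_b))
    # BFP-to-FP: apply both shared exponents once, after accumulation
    return acc * 2.0 ** (exp_a + exp_b - 2 * (mantissa_bits - 1))
```

In this sketch the exponents are handled once per block rather than once per element, which is the property the cited passages associate with fixed-point dot products.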

	Regarding claim 2, the combination of Drumond and Mellempudi teaches The method of claim 1, wherein: the quantized-precision format is a block floating-point format where at least two elements of the set of quantized-precision format numbers share a common exponent. (Drumond FIG. 1 "A n-element tensor in BFP and FP representations. BFP tensors save space and simplify computations by sharing 

	Regarding claim 3, the combination of Drumond and Mellempudi teaches The method of claim 1, wherein: the quantized-precision format is a block floating-point format where at least two block floating-point format numbers in a tile, but not all of two columns, or two rows of the tile, or two tiles share a common exponent. (Mellempudi FIG. 21A shows at least two rows but not all of the rows share a common exponent. [¶0188] “Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block.”).
The motivation for combining Drumond and Mellempudi taught in claim 1 also applies to claim 3.
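For illustration only, storing a tensor as smaller blocks with one shared exponent per block, as described in the passage of Mellempudi quoted for claim 3, can be sketched as follows. The function names, the tile layout, and the use of the maximum magnitude to select each exponent are assumptions of this sketch and are not taken from either reference.

```python
import math


def shared_exponent(block):
    """Exponent e such that the largest |v| in the block equals m * 2**e, 0.5 <= m < 1."""
    max_abs = max((abs(v) for v in block), default=0.0)
    return math.frexp(max_abs)[1] if max_abs else 0


def blockwise_exponents(matrix, tile_rows, tile_cols):
    """Return {(tile_row, tile_col): shared exponent} for a row-major 2-D list.

    Each tile gets its own exponent, so elements within a tile share an
    exponent while elements in different tiles need not.
    """
    rows, cols = len(matrix), len(matrix[0])
    exps = {}
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            tile = [matrix[r][c]
                    for r in range(r0, min(r0 + tile_rows, rows))
                    for c in range(c0, min(c0 + tile_cols, cols))]
            exps[(r0 // tile_rows, c0 // tile_cols)] = shared_exponent(tile)
    return exps
```

A smaller tile keeps small-magnitude values from being dominated by a distant large value, at the cost of storing more exponents.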

	Regarding claim 4, The method of claim 1, further comprising: generating the input tensor of normal-precision floating-point numbers by training a neural network, the set of normal-precision floating-point numbers representing at least one of edge weights or activation weights for the neural network, wherein: (Drumond [Introduction] "We propose a hybrid BFP-FP framework where values float freely between dot product computations in BFP, resulting in better choice of exponents, and perform the rest of the training in traditional floating point arithmetic. "  [Sec. 2] "these networks require hardware that is orders of magnitude simpler for inference, they are trained in a similar way to traditional neural networks, with both activations and parameters represented with floating-point." Parameter interpreted as synonymous with weight.  Activation interpreted as synonymous with activation weight.).
	the performing at least one operation comprises performing the at least one operation on the quantized-precision format numbers. (Drumond [Sec. 4.1] "BFP represents numbers with a mantissa and exponent, like floating-point, but exponents are shared across entire tensors, as shown in Figure 1, resulting in dot products that can be computed entirely in fixed-point logic." Fixed-point logic is interpreted as synonymous with normal-precision format operations.).
The motivation for combining Drumond and Mellempudi taught in claim 1 also applies to claim 4.

	Regarding claim 5, the combination of Drumond and Mellempudi teaches The method of claim 1, wherein the converting the input tensor comprises: identifying a shared exponent for a selected at least two elements of the input tensor; (Mellempudi [¶0191] "The dynamic fixed-point representation enables an 8×8 tensor 1415 of 32-bit floating-point values 1414 to be stored in an 8×8 tensor 1425 of 16-bit integer values, each associated with an 8-bit shared exponent.").
scaling values of the input tensor so that the integer portion of the scaled mantissas has a selected number of bits for the quantized precision format; removing fractional bits from the scaled integer portion of the mantissa; and (Mellempudi [¶0189] "To convert from floating-point to traditional fixed-point, one can multiply the floating-point value by 2^fb, where fb is the number of fractional bits for the target fixed-point representation (e.g., 2^8, for 24.8 fixed-point) and round the result to the nearest integer." [¶0194] "To quantize an exemplary floating-point value 1512 (fx=3.4667968) having an exponent 1514A (Ex) and a mantissa 1514B (Mx), the mantissa 1514B is right shifted by the difference between the exponent 1514A and the absolute max value exponent to create a magnitude integer 1524 (Ix), with the implicit leading bit 1513 (LB) stored as an explicit bit 1523 within the magnitude integer 1524. The sign bit 1520 (Sx) is maintained for the quantized fixed-point value. The scaled exponent scale factor 1522 (SF) is computed as shown in equation (2) above." Right shifting interpreted as synonymous with removing bits.).
	rounding the mantissa to produce a quantized precision value. (Mellempudi [¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." FIG. 17 explicitly shows the rounding occurring in the mantissa.). 
The motivation for combining Drumond and Mellempudi taught in claim 1 also applies to claim 5.
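For illustration only, the floating-point to fixed-point conversion quoted from Mellempudi ¶0189 (multiply by 2^fb, then round to the nearest integer) can be sketched as follows, alongside a variant that simply drops the fractional bits. The names `to_fixed` and `from_fixed` and the round-half-up rule are assumptions of this sketch, not details of the reference.

```python
import math


def to_fixed(value, frac_bits, rounding=True):
    """Scale by 2**frac_bits, then either round to nearest or drop fractional bits.

    rounding=True  : round half up (assumed rule for this sketch)
    rounding=False : floor, i.e. the fractional bits are discarded
                     (rounds toward negative infinity for negative inputs)
    """
    scaled = value * (1 << frac_bits)
    return math.floor(scaled + 0.5) if rounding else math.floor(scaled)


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)
```

With 8 fractional bits, 0.1 scales to 25.6, so rounding stores 26 while dropping the fractional bits stores 25; the difference is the information the rounding step preserves.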


	Regarding claim 6, the combination of Drumond and Mellempudi teaches The method of claim 1, further comprising: reshaping the input tensor to allow the converting the input tensor to include independent operations on portions of the input tensor. (Mellempudi [¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." Reshaping the input tensor interpreted as quantizing the tensor values. Converting interpreted as synonymous with casting; independent operations on portions of the input tensor interpreted as synonymous with integer-based math or other bit operations such as shifting the fractional bit.).
The motivation for combining Drumond and Mellempudi taught in claim 1 also applies to claim 6.

	Regarding claim 7,  The method of claim 1, wherein: the input tensor represents a portion of a previously-trained neural network, (Drumond [Sec. 5.1] "In the backward pass, we perform the same pre-/post-processing of the inputs/outputs of the x derivative (Figure 6b), but handle the w derivative differently (Figure 6c) since it performs a reduction across entire batches. Thus, to emulate the behavior of an accelerator with native BFP, we convert inputs to BFP tensors that share exponents 
	the performing at least one operation comprises performing inference operations with the quantized neural network; and (Drumond [Sec. 4.4] "The activation/loss and the conversion units are capable of processing a single 75-wide tensor per cycle. Weights are kept in BFP throughout the entire training process and during inference.").
	the method further comprises: comparing output of the neural network based on the inference operations to output of the previously-trained neural network in the floating point format. (Drumond [Sec. 6] "We now evaluate DNN training with the hybrid approach, that is referred to as BFP for simplicity, comparing it to FP32-based training." [Sec. 6.1] "Although 4-bit-mantissa BFP is outperformed by FP32, it still converges, uncovering a quality-performance trade-off: users that can tolerate models with lower quality can achieve better energy-efficiency during training and inference."). 

	Regarding claim 8, the combination of Drumond and Mellempudi teaches The method of claim 1, wherein the input tensor represents a neural network, and wherein the method further comprises: (Drumond [Sec. 5.1] "In the backward pass, we perform the same pre-/post-processing of the inputs/outputs of the x derivative (Figure 6b), but handle the w derivative differently (Figure 6c) since it performs a 
	calculating loss of a neural network using the set of quantized-precision format numbers; and (Sec 5.1: BFP in Drumond is interpreted as synonymous with quantized-precision format numbers.).
	updating the modified set of quantized-precision format numbers based on a gradient calculated based on the calculated loss of the neural network. (Mellempudi [¶0158] "The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output. The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network." error and loss are interpreted as synonymous.). 
The motivation for combining Drumond and Mellempudi taught in claim 1 also applies to claim 8.

	Regarding claim 9, the combination of Drumond and Mellempudi teaches The method of claim 1, wherein: the normal-precision floating-point format is one of the following: a 16-bit floating-point format, a 32-bit floating-point format, a 64-bit floating-point format, or an 80-bit floating-point format. (Drumond [Sec. 5.2] "Evaluation Metric. To evaluate the impact of BFP, we tune the models using only FP32, 

	Regarding claim 12, Drumond teaches A quantization-enabled system for modeling a neural network comprising tensors representing node weights and edges, the system comprising: ([Introduction] "In this paper, we make the observation that in DNNs, the majority of the arithmetic operations executed are performed as part of dot product calculations, and therefore, limiting dense fixed-point-like arithmetic to only replacing the dot products still allows us to accelerate the majority of the network. As such, the rest of the operations can be implemented in traditional floating-point logic with little performance degradation. We propose a hybrid BFP-FP framework where values float freely between dot product computations in BFP, resulting in better choice of exponents, and perform the rest of the training in traditional floatingpoint arithmetic.").
	pretrain, by the one or more general-purpose processors, the neural network using normal-precision floating-point numbers ([p. 6 §5.2] "Evaluation Metric. To evaluate the impact of BFP, we tune the models using only FP32, and then train the same models from scratch with the same hyper-parameters in BFP. We report training loss and best top-1 error" Drumond explicitly teaches that the model is pretrained using normal-precision floating-point (FP32) and then tested against the dynamic fixed-point format.).
	evaluate the neural network having its node weights and edges stored in the memory as a normal-precision floating-point format; ([Sec. 2] "these networks require hardware that is orders of magnitude simpler for inference, they are trained in a 
with floating-point." node weights and edges interpreted as synonymous with parameters).
	convert at least one of the tensors to a set of quantized-precision format numbers to generate a quantized neural network model ([Sec. 4.4] FIG. 5 "The FP-to-BFP units convert tensors by detecting the maximum exponent of the input FP tensors and normalizing the mantissas accordingly").
	Train, by the quantized hardware accelerator, the quantized neural network model using the set of quantized-precision format numbers ([p. 6 §5.2] "Evaluation Metric. To evaluate the impact of BFP, we tune the models using only FP32, and then train the same models from scratch with the same hyper-parameters in BFP. We report training loss and best top-1 error" Drumond explicitly teaches that the model is pretrained using normal-precision floating-point (FP32) and then tested against the model trained with the dynamic fixed-point format.).
	comprising performing at least one mathematical operation with the at least one of the set of quantized-precision format numbers to produce, modified tensors; and ([Sec. 5.1] " We train DNNs with the hybrid approach, using BFP" See also FIG. 6. [p.1 §1] "In this paper, we make the observation that in DNNs, the majority of the arithmetic operations executed are performed as part of dot product calculations, and therefore, limiting dense fixed-point-like arithmetic to only replacing the dot products still allows us to accelerate the majority of the network." Quantized-precision format 
	convert the modified tensors to an output tensor of numbers in the normal-precision floating-point format. (FIG. 6 "BFP to FP").
	Transmit the output tensor of numbers in the normal-precision floating-point format back to the neural network, such that latency and throughput of modeling the neural network are improved. (FIG. 5 "BFP to FP" [p. 1 §1] "Signal processing platforms have historically resorted to block floating-point (BFP), whose representation is shown in Figure 1, as a way to optimize for both performance and density" Optimizing for performance and density interpreted as synonymous with improving latency and throughput.).
	While Drumond implicitly teaches improving throughput and latency, Drumond does not explicitly teach that the block floating point format improves throughput and latency. Furthermore, Drumond does not explicitly teach memory; 
	one or more general-purpose processors coupled to the memory; 
	one or more computer readable storage media storing computer-readable instructions that when executed by the one or more general-purpose processors or the quantized hardware accelerator, configure the system to perform at least:  

Mellempudi, in the same field of endeavor, teaches memory; (FIG. 1 [¶0045] "The computing system 100 includes a processing subsystem 101 having one or more processor(s) 102 and a system memory 104").
one or more general-purpose processors coupled to the memory; (FIG. 1 [¶0043] "In some embodiments, a graphics processing unit (GPU) is communicatively coupled to host/processor cores to accelerate graphics operations, machine-learning operations, pattern analysis operations, and various general-purpose GPU (GPGPU) functions.").
	one or more computer readable storage media storing computer-readable instructions that when executed by the one or more general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0045] “FIG. 1 is a block diagram illustrating a computing system 100 configured to implement one or more aspects of the embodiments described herein. The computing system 100 includes a processing subsystem 101 having one or more processor(s) 102 and a system memory 104 communicating via an interconnection path that may include a memory hub 105”). 
	Mellempudi also reinforces that latency and throughput are improved by using a dynamic precision quantization format ([¶0103] "The accelerator integration circuit 436 may perform the same operations as those described with respect to FIG. 4B, but potentially at a higher throughput given its close proximity to the coherency bus 462 and caches 462A-462D, 426" [¶0213] “The biased rounding technique avoid the requirement of a random number generator for computing the random numbers used in stochastic rounding, enabling reduced latency and power consumption relative to stochastic rounding.”).



	Regarding claim 13, the combination of Drumond and Mellempudi teaches The system of claim 12, wherein: the mathematical operation is performed with the quantized-precision format numbers (Drumond [Sec. 4.1] "BFP represents numbers with a mantissa and exponent, like floating-point, but exponents are shared across entire tensors, as shown in Figure 1, resulting in dot products that can be computed entirely in fixed-point logic." The quantized values in Drumond are taught as being stored in a standard floating-point format with the exponent shared across a tensor.).

	Regarding claim 14, the combination of Drumond and Mellempudi teaches The system of claim 12, wherein: the mathematical operation is performed by emulating quantized operations with the quantized-precision format numbers. (Drumond [Sec. 5.1] "In the backward pass, we perform the same pre-/post-processing 

	Regarding claim 15, the combination of Drumond and Mellempudi teaches The system of claim 12, wherein the modified tensors represent a quantized neural network, and performing the at least one mathematical operation further comprises: performing quantized training of the quantized neural network to produce the modified tensors. (Drumond [Sec. 5.1] "We train DNNs with the hybrid approach, using BFP" See also FIG. 6. Quantized-precision format numbers are passed directly to the layer operation, where a modified set of quantized-precision format numbers is produced.).

	Regarding claim 16, the combination of Drumond and Mellempudi teaches The system of claim 12, wherein the instructions further comprise: instructions to program a neural network accelerator with quantized values determined based on executing the instructions to convert the tensors, to perform the at least one mathematical operation, and/or to convert the modified tensors to the normal-precision floating-point format (Drumond FIG. 6 "BFP to FP").

	Regarding claim 17, Drumond teaches One or more computer-readable hardware storage devices storing computer-readable instructions that when executed by a general-purpose processor or a quantized hardware accelerator, cause the processor to perform at least: ([Introduction] "In this paper, we make the observation that in DNNs, the majority of the arithmetic operations executed are performed as part of dot product calculations, and therefore, limiting dense fixed-point-like arithmetic to only replacing the dot products still allows us to accelerate the majority of the network. As such, the rest of the operations can be implemented in traditional floating-point logic with little performance degradation. We propose a hybrid BFP-FP framework where values float freely between dot product computations in BFP, resulting in better choice of exponents, and perform the rest of the training in traditional floatingpoint arithmetic.").
	pretrain, by a general-purpose processor, a neural network model using normal-precision floating-point numbers ([p. 6 §5.2] "Evaluation Metric. To evaluate the impact of BFP, we tune the models using only FP32, and then train the same models from scratch with the same hyper-parameters in BFP. We report training loss and best top-1 error" Drumond explicitly teaches that the model is pretrained using normal-precision floating-point (FP32) and then tested against the dynamic fixed-point format.).
	specify, by the general-purpose processor, at least one normal-precision floating-point format tensor (FIG. 6 "BFP to FP").
convert, by the general-purpose processor, the neural network to a quantized neural network model by converting the at least one normal-precision floating-point format tensor to at least one quantized-precision format tensor ([Sec. 4.4] FIG. 5 "The FP-to-BFP units convert tensors by detecting the maximum exponent of the input FP tensors and normalizing the mantissas accordingly").
	training, by the quantized hardware accelerator, the quantized neural network model using the at least one quantized precision format tensor, comprising performing, by the quantized hardware accelerator, at least one tensor operation using the quantized precision format ([Sec. 4.1] "BFP represents numbers with a mantissa and exponent, like floating-point, but exponents are shared across entire tensors, as shown in Figure 1, resulting in dot products that can be computed entirely in fixed-point logic." dot product is interpreted as a tensor operation).
	convert, by the general purpose processor, an output of the at least one tensor operation to at least one normal precision floating-point format output tensor (FIG. 6 "BFP to FP").

Regarding claim 18, the combination of Drumond and Mellempudi teaches The one or more computer-readable hardware storage devices of claim 17, wherein the at least one normal-precision floating point format tensor includes at least one of the following: a bit width of node weights, a bit width of activation values, a floating-point format for performing non-quantized operations, a tile size for a shared exponent, a parameter to share an exponent on a per-row basis, a parameter to share an exponent on a per-column basis, and/or a parameter specifying a method of common exponent selection. (Mellempudi FIG. 21A [¶0235] "In addition to shared exponent or scaling factor data, the metadata can also contain other terms used for data conversions, such as floating-point to fixed point or fixed-point to floating-point conversions" floating-point to fixed point conversions interpreted as synonymous with floating-point format for performing non-quantized operations.). 

Regarding claim 19, the combination of Drumond and Mellempudi teaches The one or more computer-readable hardware storage devices of claim 17, wherein the computer-readable instructions further comprise a parameter to specify flattening a tensor prior to quantization. (Mellempudi [¶0171] "The untrained neural network 1106 can learn groupings within the unlabeled input and can determine how individual inputs are related to the overall dataset. Unsupervised training can be used to generate a self-organizing map, which is a type of trained neural network 1107 capable of performing operations useful in reducing the dimensionality of data." [¶0231] "In a training scenario, some tensors can be blocked" Mellempudi explicitly teaches reducing the dimensionality of the input data prior to training and furthermore teaches quantizing the input tensor for training. While Mellempudi does not explicitly teach flattening the tensor prior to quantization, this amounts to a change in sequence. See In re Burhans, 154 F.2d 690, 69 USPQ 330 (CCPA 1946) (selection of any order of performing process steps is prima facie obvious in the absence of new or unexpected results). Quantizing the tensor prior to flattening would be expected to yield similar or even more performant results; therefore, this limitation is considered obvious and to yield expected results.).

Regarding claim 20, the combination of Drumond and Mellempudi teaches The one or more computer-readable hardware storage devices of claim 17, wherein the computer-readable instructions further comprise: instructions to provide a class method defining a quantized matrix multiplication operation. (Mellempudi [¶0234] "One solution is to split the tensor into smaller blocks with independent shared exponent while maintaining the integer data at lower precision. This technique is useful for expressing large matrix multiplication and convolution operations.").

	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Drumond and Mellempudi, and further in view of Yang (US 2018/0157940 A1).

	Regarding claim 10, the combination of Drumond and Mellempudi teaches The method of claim 1.  
However, the combination of Drumond and Mellempudi does not explicitly teach wherein: the input tensor has two dimensions X and N, 
	the performing the at least one operation comprises applying a convolution kernel having three dimensions K, N, and P to the input tensor, the method further comprising: flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; 
	and converting the input tensor into a matrix having two dimensions K×N and X.  

Yang, who teaches a related art of a neural network accelerator, teaches The method of claim 1, wherein: the input tensor has two dimensions X and N, ( 2I*M is interpreted as synonymous with X, and 2J*M is interpreted as synonymous with N.).
	the performing the at least one operation comprises applying a convolution kernel having three dimensions K, N, and P to the input tensor, the method further comprising: flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; and ([¶0060] "After 3×3 convolutions for each group of imagery data are performed for predefined number of filter coefficients, convolution operations results Out(m, n) are sent to the first set of memory buffers via another multiplex" [¶0047] "m, n are corresponding row and column numbers for identifying which imagery data (pixel) within the (M+2)-pixel by (M+2)-pixel region the convolution is performed;" 3x3 convolution interpreted as referring to a three-dimensional convolution kernel. Outputting as two parameters interpreted as flattening the results. See also formula 1 ¶0046).
	converting the input tensor into a matrix having two dimensions K×N and X. ([¶0062] "If a 2×2 pooling operation is required, the M×M output results are reduced to (M/2)×(M/2)" See also FIG. 10). 

	Drumond, Mellempudi, and Yang are all directed towards neural network accelerators.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Drumond and Mellempudi with the teachings of Yang by implementing convolution specific matrix .

	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Drumond and Mellempudi, and in further view of Jin (US 2018/0089562 A1).

	Regarding claim 11, the combination of Drumond and Mellempudi teaches The method of claim 1.  
However, the combination of Drumond and Mellempudi does not explicitly teach wherein: the input tensor has three dimensions X, Y, and N, the performing the at least one operation comprises 
	applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, the method further comprising: 
	converting the input tensor into a matrix having two dimensions K×L×N and M.  
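
For illustration only, the conversion recited above can be sketched as the two-dimensional counterpart of the same technique. This is a minimal sketch under stated assumptions, not an implementation taken from the application or from any cited reference: the function names are hypothetical, M is assumed to be the number of output positions of an unpadded, stride-1 convolution, and the kernel flattening step is added only so that the product can be completed.

```python
def input_to_matrix_2d(x, K, L):
    """x[i][j][n] (X x Y x N) -> matrix with K*L*N rows and M columns,
    where M = (X - K + 1) * (Y - L + 1) is assumed (no padding, stride 1)."""
    X, Y, N = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = X - K + 1, Y - L + 1
    cols = [[0] * (out_h * out_w) for _ in range(K * L * N)]
    for i in range(out_h):
        for j in range(out_w):
            m = i * out_w + j  # column index of this output position
            for dk in range(K):
                for dl in range(L):
                    for n in range(N):
                        cols[(dk * L + dl) * N + n][m] = x[i + dk][j + dl][n]
    return cols


def flatten_kernel_2d(kernel):
    """kernel[k][l][n][p] (K x L x N x P) -> matrix with K*L*N rows, P columns."""
    return [list(chan) for plane in kernel for row in plane for chan in row]


def conv2d_as_matmul(x, kernel):
    """Convolution computed as (M x K*L*N) times (K*L*N x P)."""
    K, L = len(kernel), len(kernel[0])
    w = flatten_kernel_2d(kernel)
    cols = input_to_matrix_2d(x, K, L)
    M, P = len(cols[0]), len(w[0])
    return [[sum(cols[r][m] * w[r][p] for r in range(len(w)))
             for p in range(P)]
            for m in range(M)]


# 3 x 3 input with one channel, 2 x 2 kernel of ones with one output channel.
x = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
kernel = [[[[1]], [[1]]], [[[1]], [[1]]]]
out = conv2d_as_matmul(x, kernel)
```

Here the converted input has K×L×N = 4 rows and M = 4 columns, and each output value is the sum over one 2×2 window of the input.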

Jin, who teaches a related art of a neural network accelerator, teaches wherein: the input tensor has three dimensions X, Y, and N, the performing the at least one operation comprises ([¶0046] "Referring to FIG. 2A, input data 200 on the leftmost 
	applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, the method further comprising: (FIG. 2B shows that the convolution operation is dependent on four-dimensional variables a, b, c, and d. [¶0047] "A convolution layer may perform a convolution operation on two weight sets 220 having a size of C×C×D and each of the input data...A convolution operation may be identically performed on all of depths Di (d=0,1,2)").
	converting the input tensor into a matrix having two dimensions K×L×N and M. (FIG. 2B explicitly shows that the output of one pooling layer can be used for a subsequent pooling layer; see also FIG. 1).  

Drumond, Mellempudi, and Jin are all directed towards neural network accelerators.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the CNN from Jin with the accelerator in Drumond and Mellempudi by implementing convolution-specific optimizations.  Jin teaches as a motivation for the combination ([¶0040] “In accordance with an embodiment, the CNN architecture can reduce an operation latency using a drop-out method, e.g., a drop-out method, that is, a regularization method, for improving performance of an algorithm in a fully connected layer.”).
 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126