Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on October 25, 2022, in which claims 1-5, 12, 17, 21-24, and 29 are currently amended. Claims 1-6, 9-12, 17, and 21-29 are currently pending. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 21, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-6, 9-12, 17, and 21-29 under 35 U.S.C. 102/103, based on the amendment, have been considered and are persuasive. However, the arguments are moot in view of the new ground of rejection set forth below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

	Claims 1, 4, 9, 12, 17, 23, and 26 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Yao (US20200380357A1) and Shin (“FIXED-POINT OPTIMIZATION OF DEEP NEURAL NETWORKS WITH ADAPTIVE STEP SIZE RETRAINING”, 2017).

	 Regarding claim 1, Yao teaches A method, implemented at a computer system, for training a neural network using a quantized precision floating-point format, comprising:([¶0180] "To address these and other issues, subject matter here describes Incremental Network Quantization (INQ) targeted to convert any pre-trained full-precision (i.e., 32-bit floating-point) DNN model into a lossless low-precision version")
	training, by a general-purpose processor, a first neural network model that is expressed using normal-precision floating-point values to create a first trained neural network model;([¶0162] " A deep belief network (DBN) is a generative neural network that is composed of multiple layers of stochastic (random) variables. DBNs can be trained layer-by-layer using greedy unsupervised learning. The learned weights of the DBN can then be used to provide pre-train neural networks by determining an optimal initial  weights for the neural network." [¶0180] "To address these and other issues, subject matter here describes Incremental Network Quantization (INQ) targeted to convert any pre-trained full-precision (i.e., 32-bit floating-point) DNN model into a lossless low-precision version" Yao explicitly teaches methods of pre-training a neural network.  Pre-trained full-precision model interpreted as synonymous with a first trained neural network model.)
	quantizing the first trained neural network model to create a first quantized neural network model, including converting a first normal-precision floating-point value within the first trained neural network model to a corresponding first quantized-precision format value;([¶0200] " Weight partition is to divide the weights in each layer of a pre-trained full-precision DNN model into two disjoint groups which play complementary roles in INQ. The weights in the first group are responsible for forming a low-precision base for the original model, thus they are quantized by using Equation (4), above. The weights in the second group adapt to compensate for the loss in model accuracy, thus they are the ones to be re-trained. Once the first run of the quantization and re-training operations is finished, all the three operations are further conducted on the second weight group (i.e., latest re-trained weight group) in an iterative manner, until all the weights are converted to be either powers of two or zero, thereby acting as an incremental network quantization and accuracy enhancement procedure." Weights in first group interpreted as first quantized-precision format values of the first quantized neural network model.)
	determining a loss of the first quantized neural network model, including comparing a first output of the first trained neural network model with a second output of the first quantized neural network model;([¶0199] "the INQ techniques described herein implement a strategy to suppress resulting quantization loss in model accuracy." [¶0200] "the INQ techniques described herein implement a strategy to suppress resulting quantization loss in model accuracy." See also Equation on ¶0204 for minimization which relies upon determined network loss.  Network loss and quantization loss are both  losses of the first quantized neural network model.  While it would be obvious to one of ordinary skill in the art that the loss is determined by comparing a first output with a second output, Yao describes this explicitly at ¶0164.  Quantization loss interpreted as synonymous with comparing the output of the first trained neural network model with the output of the resulting quantized model.)
	and based on the loss of the first quantized neural network model: adjusting the first neural network model to create a second neural network model that is expressed using normal-precision floating-point values, including calculating a gradient, and adjusting at least one of a node weight, an input weight, or an activation value based on the gradient;([¶0206] "Pl is determined at group-wise quantization operation, and the binary matrix Tl acts as a mask which is determined by weight partition operation. Since Pl and are known, the optimization problem of Equation (7) can be solved using a Stochastic Gradient Decent (SGD) method. In one example the update scheme for the re-training may be derived as...where γ is a positive learning rate. Note that the binary matrix Tl forces zero update to the weights that have been quantized. That is, only the weights still keep with floating-point values are updated." Only updating weights that are floating-point values is interpreted as synonymous with adjusting a node weight of the first neural network model to create a second neural network model based on the loss of the first quantized neural network model.)
	training the second neural network model to create a second trained neural network model;([¶0200] " Weight partition is to divide the weights in each layer of a pre-trained full-precision DNN model into two disjoint groups which play complementary roles in INQ. The weights in the first group are responsible for forming a low-precision base for the original model, thus they are quantized by using Equation (4), above. The weights in the second group adapt to compensate for the loss in model accuracy, thus they are the ones to be re-trained. Once the first run of the quantization and re-training operations is finished, all the three operations are further conducted on the second weight group (i.e., latest re-trained weight group) in an iterative manner, until all the weights are converted to be either powers of two or zero, thereby acting as an incremental network quantization and accuracy enhancement procedure.")
	and quantizing the second trained neural network model to create a second quantized neural network model, including converting the second normal-precision floating-point value within the second trained neural network model to a corresponding second quantized-precision format value([¶0203] "At operation 2035 the quantization and retraining operations are repeated until the model weights are fully quantized as powers of two or zero. This is illustrated in the transition between FIG. 21B and FIG. 21C. In FIG. 22, the lower row depicts results from the second, third, and fourth iterations of the INQ." Second iteration of INQ interpreted as synonymous with quantizing the second trained neural network model to create a second quantized neural network model.)
	the second quantized-precision format value being different than the first quantized-precision format value.([¶0203] "At operation 2035 the quantization and retraining operations are repeated until the model weights are fully quantized as powers of two or zero. This is illustrated in the transition between FIG. 21B and FIG. 21C. In FIG. 22, the lower row depicts results from the second, third, and fourth iterations of the INQ." Second iteration of INQ interpreted as synonymous with quantizing the second trained neural network model to create a second quantized neural network model.  Yao explicitly teaches that only the weights which have not been quantized are updated, such that the second quantized-precision format value is from a different subset of the model and is necessarily different than the first quantized-precision format value.).
	However, Yao does not explicitly teach the adjusting being based at least on changing the first normal-precision floating-point value to a second normal-precision floating-point value within the second neural network model, the second normal-precision floating-point value being different than the first normal-precision floating-point value.

	Shin, in the same field of endeavor, teaches the adjusting being based at least on changing the first normal-precision floating-point value to a second normal-precision floating-point value within the second neural network model, the second normal-precision floating-point value being different than the first normal-precision floating-point value;([p. 1204 §2.1] "Then, in the second stage of Figure 1, floating-point weights are rounded to fixed-point values by using the determined quantization step size. The third stage is the inferencing or the forward stage with the quantized network, w(q). The error signal is calculated and used for backward propagation. The gradient is calculated and weight update is conducted. Note that the floating-point weights, instead of the fixed-point values, are updated because the amount of weight update is usually much smaller than the quantization step size. Then, the fixed-point weight update, yielding w(q)ij,new, is accomplished by quantizing the updated floating-point weights." Fixed-point interpreted as synonymous with quantized-precision format.  Floating point interpreted as synonymous with normal-precision floating-point. Shin teaches updating the full-precision values in retraining based on the loss before re-quantization.). 
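
For illustration only, the retraining flow quoted from Shin above (forward pass with the quantized weight, update applied to the floating-point weight, then re-quantization) may be sketched as follows. The single-weight linear model, the squared-error loss, the function names, the step size `delta`, and the learning rate `lr` are assumptions made for this sketch and are not taken from Shin.

```python
def quantize(w, delta):
    """Round a floating-point weight to the nearest multiple of the step size."""
    return round(w / delta) * delta


def retrain(w_float, samples, delta=0.25, lr=0.05, epochs=20):
    """Hypothetical retraining loop: infer with the quantized weight, update the float weight."""
    for _ in range(epochs):
        for x, target in samples:
            w_q = quantize(w_float, delta)   # fixed-point weight used in the forward stage
            error = w_q * x - target         # error signal from the quantized network
            grad = 2.0 * error * x           # gradient of the squared error
            w_float -= lr * grad             # the floating-point weight is the one updated
    # The new fixed-point weight is obtained by quantizing the updated float weight.
    return w_float, quantize(w_float, delta)


w_f, w_q = retrain(0.1, [(1.0, 0.5), (2.0, 1.0)])
```

In this sketch each individual update is smaller than the step size `delta`, which is why the update is accumulated in `w_float` rather than applied to the quantized value.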

	Yao as well as Shin are directed towards quantization enabled neural network accelerators.  Therefore, Yao as well as Shin are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Yao with the teachings of Shin by re-quantizing updated full-precision weights.  While it would have been obvious to one of ordinary skill in the art that the method in Yao could be performed a second time to further refine the quantized model, Shin explicitly teaches re-quantizing the model to maximize accuracy.  Shin provides as additional motivation for combination ([p. 1206 §4] "The proposed work yields better quantization results in FFDNN, CNN, and RNN experiments. Especially the effectiveness of the proposed techniques increases when the number of quantization levels is small and the network size is not large enough").  This motivation for combination also applies to the remaining claims which depend on this combination.
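
For illustration only, the incremental procedure relied upon from Yao (weight partition, group-wise quantization to powers of two or zero, and re-training of only the weights that remain in floating point) may be sketched as follows. The magnitude-based partition, the log-domain rounding rule, the exponent range, and all function and parameter names are assumptions made for this sketch; Yao's Equation (4) and partition strategy are not reproduced here.

```python
import math


def to_power_of_two(w, min_exp=-4, max_exp=0):
    """Map a weight to zero or to a signed power of two (assumed rounding rule)."""
    if abs(w) < 2.0 ** (min_exp - 1):
        return 0.0
    exp = max(min_exp, min(max_exp, round(math.log2(abs(w)))))
    return math.copysign(2.0 ** exp, w)


def incremental_quantize(weights, grad_fn, fractions=(0.5, 1.0), lr=0.1, steps=10):
    """Quantize an increasing fraction of the weights, re-training the rest in between."""
    weights = list(weights)
    frozen = [False] * len(weights)          # mask: True once a weight has been quantized
    for frac in fractions:
        # Weight partition (assumed): freeze the largest-magnitude free weights first.
        target = math.ceil(frac * len(weights))
        free = sorted((i for i in range(len(weights)) if not frozen[i]),
                      key=lambda i: -abs(weights[i]))
        for i in free[:max(0, target - sum(frozen))]:
            weights[i] = to_power_of_two(weights[i])   # group-wise quantization
            frozen[i] = True
        # Re-training: the mask forces a zero update on already-quantized weights.
        for _ in range(steps):
            grads = grad_fn(weights)
            for i, g in enumerate(grads):
                if not frozen[i]:
                    weights[i] -= lr * g
    return weights


targets = [0.9, 0.3, -0.6, 0.05]
result = incremental_quantize(targets, lambda w: [2 * (a - b) for a, b in zip(w, targets)])
```

With the final fraction set to 1.0, every weight in `result` ends as either zero or a signed power of two.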

	 Regarding claim 4, the combination of Yao and Shin teaches The method of claim 1, further comprising: generating an input tensor of normal-precision floating-point values based on training the first neural network model, the first normal-precision floating-point value representing at least one of an edge weight or an activation weight for the first neural network model(Yao [¶0190] "all model parameters are characterized as weights. A pre-trained full-precision (i.e., 32-bit floating-point) DNN model can be represented by (Wl:1≤l≤L), where Wl denotes the weight  the lth layer, and L denotes the number of learnable layers in the model.").
	
	 Regarding claim 9, the combination of Yao and Shin teaches The method of claim 1, wherein: the normal-precision floating-point format is one of the following: a 16-bit floating-point format, a 32-bit floating-point format, a 64-bit floating-point format, or an 80-bit floating-point format.(Yao [¶0190] "all model parameters are characterized as weights. A pre-trained full-precision (i.e., 32-bit floating-point) DNN model can be represented by (Wl:1≤l≤L), where Wl denotes the weight  the lth layer, and L denotes the number of learnable layers in the model.").
	
	Regarding claim 12, claim 12 is directed towards a system for implementing the method of claim 1.  Therefore, the rejection applied to claim 1 also applies to claim 12.  Claim 12 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software."). 
Similarly, regarding claims 23 and 26, which depend on claim 12, claims 23 and 26 effectively mirror claims 4 and 9, respectively, which depend on claim 1.  Therefore, the rejections applied to claims 4 and 9 also apply to claims 23 and 26.
	
Regarding claim 17, claim 17 is substantially similar to claim 12.  Therefore, the rejection applied to claim 12 also applies to claim 17.

	Claims 2, 21, and 29 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Yao and Shin and Drumond (“End-to-End DNN Training with Block Floating Point Arithmetic”, 2018).

	 Regarding claim 2, the combination of Yao and Shin teaches The method of claim 1.
	However, the combination of Yao and Shin does not explicitly teach the quantized-precision format is a block floating-point format where at least two elements of the first quantized-precision format values share a common exponent.

	Drumond, in the same field of endeavor, teaches the quantized-precision format is a block floating-point format where at least two elements of the first quantized-precision format values share a common exponent.(FIG. 1 "A n-element tensor in BFP and FP representations. BFP tensors save space and simplify computations by sharing exponents across tensors." The 10-bit exponent is shared for all n of the quantized precision elements in the tensor.).

	The combination of Yao and Shin as well as Drumond are directed towards accelerating neural networks by quantization.  Therefore, the combination of Yao and Shin as well as Drumond are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Drumond by using a block-floating point format.  It would be obvious to one of ordinary skill in the art that a major advantage to block floating point is the ability to simultaneously reduce memory usage and maintain floating point accuracy in quantization.  This is further reinforced by Drumond ([p. 1 §1] "Signal processing platforms have historically resorted to block floating-point (BFP), whose representation is shown in Figure 1, as a way to optimize for both performance and density. The use of BFP has allowed signal processors to convert common algorithms (e.g., FFT) to dense and parallel integer arithmetic hardware. We observe that BFPs are also likely to be effective in neural networks, increasing the arithmetic density of accelerators and improving the dynamic range of fixed-point-like arithmetic taking the first step towards effective training in dense arithmetic").  This motivation for combination also applies to the remaining claims which depend on this combination.
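
For illustration only, a block floating-point representation in which all elements of a block share a single exponent may be sketched as follows. The 8-bit mantissa width, the choice of the shared exponent from the largest magnitude in the block, and the function names are assumptions made for this sketch and are not taken from Drumond.

```python
import math


def to_bfp(values, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas sharing one exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    shared_exp = math.frexp(max_abs)[1]
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round each scaled value and clamp it to the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return mantissas, shared_exp


def from_bfp(mantissas, shared_exp, mantissa_bits=8):
    """Reconstruct approximate float values from mantissas and the shared exponent."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


block = [0.5, -0.25, 0.125, 0.75]
mantissas, exponent = to_bfp(block)
restored = from_bfp(mantissas, exponent)
```

Only one exponent is stored for the whole block, so the per-element storage is the integer mantissa alone.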

Regarding claim 21, claim 21 is directed towards a system for implementing the method of claim 2.  Therefore, the rejection applied to claim 2 also applies to claim 21.  Claim 21 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Regarding claim 29, claim 29 is substantially similar to claim 21.  Therefore, the rejection applied to claim 21 also applies to claim 29.

	Claims 3, 5, 6, 22, 24, and 25 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Yao and Shin and Mellempudi (US 2018/0322607 A1).

	 Regarding claim 3, the combination of Yao and Shin teaches The method of claim 1.
	However, the combination of Yao and Shin does not explicitly teach the quantized-precision format is a block floating-point format where at least two block floating-point format values in a tile, but not all of two columns, or two rows, of the tile, or two tiles share a common exponent.

	Mellempudi, in the same field of endeavor, teaches the quantized-precision format is a block floating-point format where at least two block floating- point format values in a tile, but not all of two columns, or two rows, of the tile, or two tiles share a common exponent.([¶0188] “Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block.” [¶0237] "The metadata can maintain the exponent scaling factor for each block, as well as the block size for each block. In one embodiment, partitioning with variable block sizes can be performed along all dimensions of the tensor. A data representation can be generated that has variable size blocks in all dimensions of the tensor" FIG. 21A shows at least two rows but not all of the rows share a common exponent.  Block interpreted as synonymous with tile.).

	The combination of Yao and Shin as well as Mellempudi are directed towards accelerating neural networks through quantization.  Therefore, the combination of Yao and Shin as well as Mellempudi are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Mellempudi by focusing on throughput optimization.  Mellempudi teaches as an additional motivation for combination ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).  This motivation for combination also applies to the remaining claims which depend on this combination.
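
For illustration only, the arrangement described in the passages quoted from Mellempudi above, in which a tensor is partitioned into blocks and each block carries its own exponent or scaling factor, may be sketched as follows. The fixed tile size, the dictionary layout of the metadata, and the function names are assumptions made for this sketch and are not taken from Mellempudi.

```python
import math


def shared_exponent(block):
    """Exponent e such that every |v| in the block is below 2**e (0 for an all-zero block)."""
    max_abs = max(abs(v) for v in block)
    return math.frexp(max_abs)[1] if max_abs else 0


def tile_exponents(matrix, tile_rows, tile_cols):
    """Return {(row_start, col_start): exponent} with one shared exponent per tile."""
    exps = {}
    for r in range(0, len(matrix), tile_rows):
        for c in range(0, len(matrix[0]), tile_cols):
            # Gather the values of one tile; values in different tiles do not share an exponent.
            block = [v for row in matrix[r:r + tile_rows] for v in row[c:c + tile_cols]]
            exps[(r, c)] = shared_exponent(block)
    return exps


matrix = [
    [0.5, 0.25, 8.0, 4.0],
    [0.1, 0.3, 6.0, 5.0],
    [100.0, 50.0, 0.01, 0.02],
    [70.0, 90.0, 0.03, 0.015],
]
exponents = tile_exponents(matrix, 2, 2)
```

In this example the values within each 2×2 tile share an exponent, while no full row or full column of the matrix shares one.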

	 Regarding claim 5, the combination of Yao and Shin teaches The method of claim 4.
	However, the combination of Yao and Shin doesn't explicitly teach converting the input tensor to a  values represented in a quantized precision format, including: identifying a shared exponent for a selected at least two elements of the input tensor;
	scaling values of the input tensor so that the integer portion of the scaled mantissas has a selected number of bits for the quantized precision format; removing fractional bits from the scaled integer portion of the mantissa; and
	rounding the mantissa to produce a quantized precision value.

	Mellempudi, in the same field of endeavor, teaches converting the input tensor to a  values represented in a quantized precision format, including: identifying a shared exponent for a selected at least two elements of the input tensor;([¶0191] "The dynamic fixed-point representation enables an 8×8 tensor 1415 of 32-bit floating-point values 1414 to be stored in an 8×8 tensor 1425 of 16-bit integer values, each associated with an 8-bit shared exponent.")
	scaling values of the input tensor so that the integer portion of the scaled mantissas has a selected number of bits for the quantized precision format; removing fractional bits from the scaled integer portion of the mantissa; and([¶0189] " To convert from floating-point to traditional fixed-point, one can multiply the floating-point value by 2fb, where fb is the number of fractional bits for the target fixed-point representation (e.g., 28, for 24.8 fixed-point) and round the result to the nearest integer." [¶0194] "To quantize an exemplary floating-point value 1512 (fx=3.4667968) having an exponent 1514A (Ex) and a mantissa 1514B (Mx), the mantissa 1514B is right shifted by the difference between the exponent 1514A and the absolute max value exponent to create a magnitude integer 1524 (Ix), with the implicit leading bit 1513 (LB) stored as an explicit bit 1523 within the magnitude integer 1524. The sign bit 1520 (Sx) is maintained for the quantized fixed-point value. The scaled exponent scale factor 1522 (SF) is computed as shown in equation (2) above." Right shifting interpreted as synonymous with removing bits.)
	rounding the mantissa to produce a quantized precision value.([¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." FIG. 17 explicitly shows the rounding occurring in the mantissa.).

	The combination of Yao and Shin as well as Mellempudi are directed towards accelerating neural networks through quantization.  Therefore, the combination of Yao and Shin as well as Mellempudi are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Mellempudi by focusing on throughput optimization.  Mellempudi teaches as an additional motivation for combination ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).  This motivation for combination also applies to the remaining claims which depend on this combination.
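
For illustration only, the conversion described in the passage quoted from Mellempudi at ¶0189 (multiply the floating-point value by 2 raised to the number of fractional bits and round to the nearest integer), together with a right shift that rounds using the last bit shifted out, may be sketched as follows. The function names and the round-half-up rule in `shift_right_rounded` are assumptions made for this sketch; the sketch is not the biased rounding scheme of Mellempudi's FIG. 17.

```python
def float_to_fixed(x, frac_bits):
    """Scale by 2**frac_bits and round to the nearest integer."""
    return round(x * (1 << frac_bits))


def fixed_to_float(q, frac_bits):
    """Interpret an integer as a fixed-point value with frac_bits fractional bits."""
    return q / (1 << frac_bits)


def shift_right_rounded(mantissa, shift):
    """Right-shift a non-negative integer, rounding half up via the last bit shifted out."""
    if shift == 0:
        return mantissa
    round_bit = (mantissa >> (shift - 1)) & 1   # most significant of the discarded bits
    return (mantissa >> shift) + round_bit


q = float_to_fixed(3.4667968, 8)      # 24.8-style fixed point: 8 fractional bits
approx = fixed_to_float(q, 8)
```

The right shift alone discards the low-order bits; adding the round bit recovers nearest-value rounding for the retained bits.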

	 Regarding claim 6, the combination of Yao and Shin teaches The method of claim 4.
	However, the combination of Yao and Shin does not explicitly teach reshaping the input tensor to allow the converting the input tensor to include independent operations on portions of the input tensor.

	Mellempudi, in the same field of endeavor, teaches reshaping the input tensor to allow the converting the input tensor to include independent operations on portions of the input tensor.([¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." Reshaping the input tensor interpreted as quantizing the tensor values.  Converting interpreted as synonymous with casting, and independent operations on portions of the input tensor interpreted as synonymous with integer-based math or other bit operations such as shifting the fractional bit.).

	The combination of Yao and Shin as well as Mellempudi are directed towards accelerating neural networks through quantization.  Therefore, the combination of Yao and Shin as well as Mellempudi are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Mellempudi by focusing on throughput optimization.  Mellempudi teaches as an additional motivation for combination ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).  This motivation for combination also applies to the remaining claims which depend on this combination.

Regarding claims 22 and 24-25, claims 22 and 24-25 are directed towards a system for implementing the methods of claims 3 and 5-6, respectively.  Therefore, the rejection applied to claims 3 and 5-6 also applies to claims 22 and 24-25.  Claims 22 and 24-25 also recite additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Claims 10 and 27 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Yao and Shin and Yang (US20180157940A1).

	 Regarding claim 10, the combination of Yao and Shin teaches The method of claim 1.
	However, the combination of Yao and Shin does not explicitly teach the input tensor has two dimensions X and N,
	 the method further comprising: applying a convolution kernel having three dimensions K, N, and P; and flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; and
	converting the input tensor into a matrix having two dimensions K×N and X.

	Yang, in the same field of endeavor, teaches the input tensor has two dimensions X and N,([¶0064] "An input image generally contains a large amount of imagery data. In order to perform image processing operations. The input image 1100 is partitioned into M-pixel by M-pixel blocks 1111-1112 as shown in FIG. 11A." [¶0065] "In another embodiment, the input image is a rectangular shape with dimensions of (2I×M)-pixel and (2J×M)-pixel, where I and J are positive integers.." 2I*M is interpreted as synonymous with X, and 2J*M is interpreted as synonymous with N.)
	 the method further comprising: applying a convolution kernel having three dimensions K, N, and P; and flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; and([¶0060] "After 3×3 convolutions for each group of imagery data are performed for predefined number of filter coefficients, convolution operations results Out(m, n) are sent to the first  memory buffers via another multiplex" [¶0047] "m, n are corresponding row and column values for identifying which imagery data (pixel) within the (M+2)-pixel by (M+2)-pixel region the convolution is performed;" 3x3 convolution interpreted as referring to a three dimensional convolution kernel.  Outputting as two parameters interpreted as flattening the results. See also formula 1 ¶0046)
	converting the input tensor into a matrix having two dimensions K×N and X.([¶0062] "If a 2×2 pooling operation is required, the M×M output results are reduced to (M/2)×(M/2)" See also FIG. 10).

	The combination of Yao and Shin as well as Yang are directed towards neural network accelerators.  Therefore, the combination of Yao and Shin as well as Yang are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Yang by implementing convolution specific matrix optimizations. Yang teaches as motivation for combination ([¶0004] “The FCN layer is therefore required to project the high dimensional vector to a relatively low dimensional space, e.g., 4096, 1024, or smaller number (e.g., 128). Disadvantage of such a feature extraction is that the huge number of parameters (e.g. more than 100 million (i.e., 25088×4096) for the FCN layer connecting to convolutional layer). As a result, runtime performance is low due to such a high computation complexity.”).  
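
For illustration only, the limitation as recited in claim 10 (a kernel of dimensions K, N, and P flattened into a K×N by P matrix, and an input of dimensions X and N converted into a K×N by X matrix) may be sketched as a one-dimensional convolution computed by matrix multiplication. The zero padding at the edges, the cross-correlation convention, and the function names are assumptions made for this sketch; the sketch is not a characterization of Yang's circuitry.

```python
def flatten_kernel(kernel):
    """kernel[k][n][p] -> matrix with K*N rows and P columns."""
    K, N = len(kernel), len(kernel[0])
    return [list(kernel[k][n]) for k in range(K) for n in range(N)]


def im2col(inp, K):
    """inp[x][n] -> matrix with K*N rows and X columns (zero padding assumed at the edges)."""
    X, N = len(inp), len(inp[0])
    pad = K // 2
    cols = [[0.0] * X for _ in range(K * N)]
    for x in range(X):
        for k in range(K):
            src = x + k - pad                 # input position seen by kernel tap k
            if 0 <= src < X:
                for n in range(N):
                    cols[k * N + n][x] = inp[src][n]
    return cols


def conv_as_matmul(inp, kernel):
    """Compute the convolution as (X by K*N) times (K*N by P), giving an X by P output."""
    cols = im2col(inp, len(kernel))
    flat = flatten_kernel(kernel)
    X, P = len(inp), len(flat[0])
    return [[sum(cols[r][x] * flat[r][p] for r in range(len(flat))) for p in range(P)]
            for x in range(X)]


signal = [[1.0], [2.0], [3.0], [4.0]]          # X = 4, N = 1
kernel = [[[1.0]], [[1.0]], [[1.0]]]           # K = 3, N = 1, P = 1
output = conv_as_matmul(signal, kernel)
```

Row k×N+n of both matrices corresponds to kernel tap k and input channel n, so a single matrix product applies every tap and channel at once.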

Regarding claim 27, claim 27 is directed towards a system for implementing the method of claim 10.  Therefore, the rejection applied to claim 10 also applies to claim 27.  Claim 27 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Claims 11 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Yao and Shin in view of Jin (US 2018/0089562 A1).

	Regarding claim 11, the combination of Yao and Shin teaches the method of claim 1.
	However, the combination of Yao and Shin does not explicitly teach the input tensor has three dimensions X, Y, and N
	the method further comprising: applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, and
	converting the input tensor into a matrix having two dimensions K×L×N and M.

	Jin, in the same field of endeavor, teaches the input tensor has three dimensions X, Y, and N ([¶0046] "Referring to FIG. 2A, input data 200 on the leftmost side may include a plurality of channels. FIG. 2 shows an example in which the input data 200 includes three channels. The input data 200 may be expressed in width, height and depth.")
	the method further comprising: applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, and (FIG. 2B shows that the convolution operation is dependent on four-dimensional variables a, b, c, and d. [¶0047] "A convolution layer may perform a convolution operation on two weight sets 220 having a size of C×C×D and each of the input data...A convolution operation may be identically performed on all of depths Di (d=0,1,2)")
	converting the input tensor into a matrix having two dimensions K×L×N and M. (FIG. 2B, element 250, shows a pooling layer outputting a two-dimensional representation of the input.  FIG. 2B explicitly shows that the output of one pooling layer can be used for a subsequent pooling layer; see also FIG. 1.)

	The combination of Yao and Shin, as well as Jin, are directed towards neural network accelerators.  Therefore, the combination of Yao and Shin and Jin are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Yao and Shin with the teachings of Jin by implementing convolution-specific optimizations.  Jin teaches as a motivation for combination ([¶0040] “In accordance with an embodiment, the CNN architecture can reduce an operation latency using a drop-out method, e.g., a drop-out method, that is, a regularization method, for improving performance of an algorithm in a fully connected layer.”).
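
	For illustration only, and not as a characterization of the disclosure of Yao, Shin, or Jin, the conversion recited in claim 11 may be sketched as follows. The sketch assumes that M denotes the number of output positions, a stride of 1, and no padding, so that M = (X-K+1)×(Y-L+1). The function names im2col, flatten_kernel_4d, matmul_t, and direct_conv_2d, and the sample data, are hypothetical.

```python
def im2col(image, K, L):
    """image[x][y][n] -> matrix with K*L*N rows and M columns.

    Assumption: stride 1, no padding, M = (X-K+1) * (Y-L+1).
    """
    X, Y, N = len(image), len(image[0]), len(image[0][0])
    x_out, y_out = X - K + 1, Y - L + 1
    return [[image[x + k][y + l][n]
             for x in range(x_out) for y in range(y_out)]
            for k in range(K) for l in range(L) for n in range(N)]


def flatten_kernel_4d(kernel):
    """kernel[k][l][n][p] -> matrix with K*L*N rows and P columns."""
    K, L = len(kernel), len(kernel[0])
    N, P = len(kernel[0][0]), len(kernel[0][0][0])
    return [[kernel[k][l][n][p] for p in range(P)]
            for k in range(K) for l in range(L) for n in range(N)]


def matmul_t(a, b):
    """Return transpose(a) times b, where a and b share their row count."""
    rows, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    return [[sum(a[r][p] * b[r][c] for r in range(rows))
             for c in range(cols_b)] for p in range(cols_a)]


def direct_conv_2d(image, kernel):
    """Reference 2-D convolution computed without any flattening."""
    K, L = len(kernel), len(kernel[0])
    N, P = len(kernel[0][0]), len(kernel[0][0][0])
    x_out, y_out = len(image) - K + 1, len(image[0]) - L + 1
    return [[sum(kernel[k][l][n][p] * image[x + k][y + l][n]
                 for k in range(K) for l in range(L) for n in range(N))
             for x in range(x_out) for y in range(y_out)]
            for p in range(P)]


# Hypothetical data: X = Y = 3, N = 2 input; K = L = 2, N = 2, P = 2 kernel.
image = [[[x * 6 + y * 2 + n for n in range(2)] for y in range(3)]
         for x in range(3)]
kernel = [[[[(k + 2 * l + n + p) % 3 - 1 for p in range(2)]
            for n in range(2)] for l in range(2)] for k in range(2)]

columns = im2col(image, 2, 2)               # (K*L*N) x M
flat_kernel = flatten_kernel_4d(kernel)     # (K*L*N) x P
output = matmul_t(flat_kernel, columns)     # P x M
```

	Each column of the converted matrix holds one K×L×N window of the input tensor, so the convolution again reduces to a single matrix product.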

Regarding claim 28, claim 28 is directed towards a system for implementing the method of claim 11.  Therefore, the rejection applied to claim 11 also applies to claim 28.  Claim 28 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124