Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on June 15, 2022, in which claims 1-2, 4-6, 10-12, and 17 are amended. Claims 7-8, 13-16, and 18-20 have been canceled. Claims 21-29 are newly added.  Claims 1-6, 9-12, 17, and 21-29 are currently pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on May 20, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 103, based on the amendment, have been considered and are persuasive. However, the arguments are moot in view of the new grounds of rejection set forth below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 4, 9, 12, 17, 23, and 26 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Yao (US20200380357A1).

	Regarding claim 1, Yao teaches A method, implemented at a computer system, for training a neural network using a quantized precision floating-point format, comprising: ([¶0180] "To address these and other issues, subject matter here describes Incremental Network Quantization (INQ) targeted to convert any pre-trained full-precision (i.e., 32-bit floating-point) DNN model into a lossless low-precision version")
	training, by a general-purpose processor, a first neural network model that is expressed using normal-precision floating-point numbers to create a first trained neural network model; ([¶0162] " A deep belief network (DBN) is a generative neural network that is composed of multiple layers of stochastic (random) variables. DBNs can be trained layer-by-layer using greedy unsupervised learning. The learned weights of the DBN can then be used to provide pre-train neural networks by determining an optimal initial set of weights for the neural network." [¶0180] "To address these and other issues, subject matter here describes Incremental Network Quantization (INQ) targeted to convert any pre-trained full-precision (i.e., 32-bit floating-point) DNN model into a lossless low-precision version" Yao explicitly teaches methods of pre-training a neural network.  Pre-trained full-precision model interpreted as synonymous with a first trained neural network model.)
	quantizing the first trained neural network model to create a first quantized neural network model, including converting a first set of normal-precision floating-point numbers of the first trained neural network model to a corresponding first set of quantized-precision format numbers ([¶0200] " Weight partition is to divide the weights in each layer of a pre-trained full-precision DNN model into two disjoint groups which play complementary roles in INQ. The weights in the first group are responsible for forming a low-precision base for the original model, thus they are quantized by using Equation (4), above. The weights in the second group adapt to compensate for the loss in model accuracy, thus they are the ones to be re-trained. Once the first run of the quantization and re-training operations is finished, all the three operations are further conducted on the second weight group (i.e., latest re-trained weight group) in an iterative manner, until all the weights are converted to be either powers of two or zero, thereby acting as an incremental network quantization and accuracy enhancement procedure." Weights in first group interpreted as first set of quantized-precision format numbers of the first quantized neural network model.)
	determining a loss of the first quantized neural network model, including comparing a first output of the first trained neural network model with a second output of the first quantized neural network model; ([¶0199] "the INQ techniques described herein implement a strategy to suppress resulting quantization loss in model accuracy." [¶0200] "the INQ techniques described herein implement a strategy to suppress resulting quantization loss in model accuracy." See also Equation on ¶0204 for minimization which relies upon determined network loss.  Network loss and quantization loss are both losses of the first quantized neural network model.  While it would be obvious to one of ordinary skill in the art that the loss is determined by comparing a first output with a second output, Yao describes this explicitly at ¶0164.  Quantization loss interpreted as synonymous with comparing the output of the first trained neural network model with the output of the resulting quantized model.)
	and based on the loss of the first quantized neural network model: adjusting the first neural network model to create a second neural network model that is expressed using normal-precision floating-point numbers, including calculating a gradient, and adjusting at least one of a node weight, an input weight, or an activation value based on the gradient; ([¶0206] "Pl is determined at group-wise quantization operation, and the binary matrix Tl acts as a mask which is determined by weight partition operation. Since Pl and Tl are known, the optimization problem of Equation (7) can be solved using a Stochastic Gradient Descent (SGD) method. In one example the update scheme for the re-training may be derived as...where γ is a positive learning rate. Note that the binary matrix Tl forces zero update to the weights that have been quantized. That is, only the weights still keep with floating-point values are updated." Only updating weights that are floating-point values is interpreted as synonymous with adjusting a node weight of the first neural network model (pre-trained full-precision DNN model) to create a second neural network model ([¶0208] "Incremental Network Quantization Algorithm. Algorithm 1 Incremental network quantization for lossless CNNs with low-precision weights. Input: X: the training data, {Wl : 1 ≤ l ≤ L}: the pre-trained full-precision CNN model, {σ1, σ2, . . . , σN}: the accumulated portions of weights quantized at iterative steps Output: {Ŵl : 1 ≤ l ≤ L}: the final low-precision model with the weights") based on the loss of the first quantized neural network model. Each subnetwork quantization is explicitly taught as a new neural network model different from the “original model” in ¶0182, ¶0185, and ¶0200 of Yao.)
	training the second neural network model to create a second trained neural network model; ([¶0200] " Weight partition is to divide the weights in each layer of a pre-trained full-precision DNN model into two disjoint groups which play complementary roles in INQ. The weights in the first group are responsible for forming a low-precision base for the original model, thus they are quantized by using Equation (4), above. The weights in the second group adapt to compensate for the loss in model accuracy, thus they are the ones to be re-trained. Once the first run of the quantization and re-training operations is finished, all the three operations are further conducted on the second weight group (i.e., latest re-trained weight group) in an iterative manner, until all the weights are converted to be either powers of two or zero, thereby acting as an incremental network quantization and accuracy enhancement procedure.")
	and quantizing the second trained neural network model to create a second quantized neural network model, including converting a second set of normal-precision floating-point numbers of the second trained neural network model to a corresponding second set of quantized-precision format numbers. ([¶0203] "At operation 2035 the quantization and retraining operations are repeated until the model weights are fully quantized as powers of two or zero. This is illustrated in the transition between FIG. 21B and FIG. 21C. In FIG. 22, the lower row depicts results from the second, third, and fourth iterations of the INQ." Second iteration of INQ interpreted as synonymous with quantizing the second trained neural network model to create a second quantized neural network model ([¶0208] "Incremental Network Quantization Algorithm. Algorithm 1 Incremental network quantization for lossless CNNs with low-precision weights. Input: X: the training data, {Wl : 1 ≤ l ≤ L}: the pre-trained full-precision CNN model, {σ1, σ2, . . . , σN}: the accumulated portions of weights quantized at iterative steps Output: {Ŵl : 1 ≤ l ≤ L}: the final low-precision model with the weights").).
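
For illustration only, the iterative partition, quantize, and re-train sequence described in the passages of Yao cited above can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Yao's implementation: the helper names quantize_pow2 and inq_step, the exponent range, the largest-magnitude-first partition, the toy linear model, and the learning rate are all hypothetical, and the exponent is simply rounded in the log domain rather than computed with Yao's Equation (4).

```python
import numpy as np

def quantize_pow2(w, n_min=-4, n_max=0):
    """Map each weight to zero or to a signed power of two 2**n, n_min <= n <= n_max.
    Assumption: the exponent is rounded in the log domain (not Yao's Equation (4))."""
    mag = np.abs(w)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))), n_min, n_max)
    q = np.sign(w) * 2.0 ** exp
    q[mag < 2.0 ** (n_min - 1)] = 0.0  # very small weights become zero
    return q

def inq_step(w, mask, fraction):
    """Quantize `fraction` of the still-floating-point weights (mask == 1)
    and clear their mask entries so later updates leave them untouched."""
    w, mask = w.copy(), mask.copy()
    free = np.flatnonzero(mask)
    k = int(round(fraction * free.size))
    # Assumed partition rule: largest-magnitude free weights are quantized first.
    chosen = free[np.argsort(-np.abs(w[free]))][:k]
    w[chosen] = quantize_pow2(w[chosen])
    mask[chosen] = 0.0
    return w, mask

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))            # toy training data
w_fp = rng.normal(scale=0.5, size=8)    # stands in for the pre-trained full-precision weights
y_fp = X @ w_fp                         # output of the full-precision model

w, mask = w_fp.copy(), np.ones_like(w_fp)
for fraction in (0.5, 0.5, 1.0):        # share of the remaining free weights quantized per pass
    w, mask = inq_step(w, mask, fraction)
    for _ in range(200):                # re-train only the weights that are still floating point
        grad = 2.0 * X.T @ (X @ w - y_fp) / len(X)
        w -= 0.05 * grad * mask         # the mask forces a zero update on quantized weights

# Loss obtained by comparing the quantized model's output with the full-precision output.
loss = np.mean((X @ w - y_fp) ** 2)
```

After the final pass every entry of w is either zero or a signed power of two, and loss measures how far the quantized output has drifted from the full-precision output.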

Regarding claim 4, Yao teaches The method of claim 1, further comprising: generating an input tensor of normal-precision floating-point numbers based on training the first neural network model, the first set of normal-precision floating-point numbers representing at least one of edge weights or activation weights for the first neural network model ([¶0141] “Typically, a feedforward network topology includes an input layer and an output layer that are separated by at least one hidden layer. The hidden layer transforms input received by the input layer into a representation that is useful for generating output in the output layer.” [¶0190] "all model parameters are characterized as weights. A pre-trained full-precision (i.e., 32-bit floating-point) DNN model can be represented by (Wl:1≤l≤L), where Wl denotes the weight set of the lth layer, and L denotes the number of learnable layers in the model." Weights of input layer interpreted as synonymous with input tensor of normal precision floating-point numbers. ). 

	Regarding claim 9, Yao teaches The method of claim 1, wherein: the normal-precision floating-point format is one of the following: a 16-bit floating-point format, a 32-bit floating-point format, a 64-bit floating-point format, or an 80-bit floating-point format. ([¶0190] "all model parameters are characterized as weights. A pre-trained full-precision (i.e., 32-bit floating-point) DNN model can be represented by (Wl:1≤l≤L), where Wl denotes the weight set of the lth layer, and L denotes the number of learnable layers in the model."). 

Regarding claim 12, claim 12 is directed towards a system for implementing the method of claim 1.  Therefore, the rejection applied to claim 1 also applies to claim 12.  Claim 12 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software."). 
Similarly, claims 23 and 26, which depend on claim 12, effectively mirror claims 4 and 9, respectively, which depend on claim 1.  Therefore, the rejections applied to claims 4 and 9 also apply to claims 23 and 26.
	
Regarding claim 17, claim 17 is substantially similar to claim 12.  Therefore, the rejection applied to claim 12 also applies to claim 17.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 2, 21, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Yao and in view of Drumond (“End-to-End DNN Training with Block Floating Point Arithmetic”, 2018). 

	Regarding claim 2, Yao teaches The method of claim 1.
	However, Yao does not explicitly teach, wherein: the quantized-precision format is a block floating-point format where at least two elements of the set of the first set of quantized-precision format numbers share a common exponent.

Drumond, in the same field of endeavor, teaches the quantized-precision format is a block floating-point format where at least two elements of the set of the first set of quantized-precision format numbers share a common exponent. (FIG. 1 "A n-element tensor in BFP and FP representations. BFP tensors save space and simplify computations by sharing exponents across tensors." The 10-bit exponent is shared for all n of the quantized precision elements in the tensor.). 
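
For illustration only, a block floating-point representation in which all elements of a tensor share one exponent can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Drumond's implementation: the function names to_bfp and from_bfp, the 8-bit signed mantissa width, and the rule for choosing the shared exponent are hypothetical.

```python
import numpy as np

def to_bfp(x, mantissa_bits=8):
    """Quantize a tensor to signed integer mantissas plus one shared exponent."""
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return np.zeros(x.shape, dtype=np.int32), 0
    # Shared exponent: smallest e such that max_abs < 2**e.
    shared_exp = int(np.floor(np.log2(max_abs))) + 1
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x * scale), -limit - 1, limit).astype(np.int32)
    return mant, shared_exp

def from_bfp(mant, shared_exp, mantissa_bits=8):
    """Reconstruct approximate floating-point values from the shared-exponent form."""
    return mant * 2.0 ** (shared_exp - (mantissa_bits - 1))

x = np.array([0.5, -1.75, 3.0, 0.01])
mant, shared_exp = to_bfp(x)
restored = from_bfp(mant, shared_exp)
```

In this example the largest magnitude (3.0) fixes the shared exponent, so the smallest element (0.01) falls below the mantissa resolution and is stored as zero; this is the dynamic-range trade-off inherent in sharing an exponent.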

	Drumond and Yao are both directed towards accelerating neural networks by quantization.  Therefore, Drumond and Yao are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Drumond and Yao by using a block floating-point format. It would be obvious to one of ordinary skill in the art that a major advantage of block floating point is the ability to simultaneously reduce memory usage and maintain floating-point accuracy in quantization.  This is further reinforced by Drumond ([p. 1 §1] "Signal processing platforms have historically resorted to block floating-point (BFP), whose representation is shown in Figure 1, as a way to optimize for both performance and density. The use of BFP has allowed signal processors to convert common algorithms (e.g., FFT) to dense and parallel integer arithmetic hardware. We observe that BFPs are also likely to be effective in neural networks, increasing the arithmetic density of accelerators and improving the dynamic range of fixed-point-like arithmetic taking the first step towards effective training in dense arithmetic").

Regarding claim 21, claim 21 is directed towards a system for implementing the method of claim 2.  Therefore, the rejection applied to claim 2 also applies to claim 21.  Claim 21 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Regarding claim 29, claim 29 is substantially similar to claim 21.  Therefore, the rejection applied to claim 21 also applies to claim 29.

	Claims 3, 5, 6, 22, 24, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Yao and in view of Mellempudi (US 2018/0322607 A1). 

	Regarding claim 3, Yao teaches The method of claim 1.
	However, Yao does not explicitly teach, wherein: the quantized-precision format is a block floating-point format where at least two block floating-point format numbers in a tile, but not all of two columns, or two rows, of the tile, or two tiles share a common exponent.

Mellempudi, in the same field of endeavor, teaches The method of claim 1, wherein: the quantized-precision format is a block floating-point format where at least two block floating-point format numbers in a tile, but not all of two columns, or two rows, of the tile, or two tiles share a common exponent. ([¶0188] “Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block.” [¶0237] "The metadata can maintain the exponent scaling factor for each block, as well as the block size for each block. In one embodiment, partitioning with variable block sizes can be performed along all dimensions of the tensor. A data representation can be generated that has variable size blocks in all dimensions of the tensor" FIG. 21A shows at least two rows but not all of the rows share a common exponent.  Block interpreted as synonymous with tile.).
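
For illustration only, storing a tensor as several smaller blocks, each with its own shared exponent, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Mellempudi's implementation: the function names tile_exponents and quantize_tiled, the fixed square tile size, and the 8-bit signed mantissa width are hypothetical, whereas the cited passage describes variable block sizes.

```python
import numpy as np

def tile_exponents(x, tile=2):
    """Return one shared exponent per tile-by-tile block of a 2-D array."""
    rows, cols = x.shape
    assert rows % tile == 0 and cols % tile == 0
    blocks = x.reshape(rows // tile, tile, cols // tile, tile)
    max_abs = np.abs(blocks).max(axis=(1, 3))
    # Smallest e with max_abs < 2**e; an all-zero tile gets exponent 0.
    safe = np.where(max_abs > 0, max_abs, 0.5)
    return np.floor(np.log2(safe)).astype(int) + 1

def quantize_tiled(x, tile=2, mantissa_bits=8):
    """Quantize each tile against its own exponent instead of one global exponent."""
    exps = tile_exponents(x, tile)
    # Broadcast each tile's exponent back to the positions of its elements.
    per_elem = np.repeat(np.repeat(exps, tile, axis=0), tile, axis=1)
    scale = 2.0 ** (mantissa_bits - 1 - per_elem)
    limit = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x * scale), -limit - 1, limit).astype(np.int32)
    return mant, exps

x = np.array([[1.0, 0.5, 100.0, 64.0],
              [0.25, 0.75, 8.0, 96.0]])
mant, exps = quantize_tiled(x)
```

Here the left tile (values up to 1.0) and the right tile (values up to 100.0) receive different exponents, so the small values in the left tile keep their precision instead of being flushed to zero by the large values in the right tile.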

	Yao and Mellempudi are both directed towards a quantization-enabled neural network accelerator.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the accelerators in Yao and Mellempudi by focusing on throughput optimization. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Mellempudi that training at least in part in a dynamic fixed-point precision yields a performance and efficiency gain and reduces overall training time ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).

	Regarding claim 5, Yao teaches The method of claim 4.
	However, Yao does not explicitly teach converting the input tensor to a set of numbers represented in a quantized precision format, including: identifying a shared exponent for a selected at least two elements of the input tensor; 
	scaling values of the input tensor so that the integer portion of the scaled mantissas has a selected number of bits for the quantized precision format; removing fractional bits from the scaled integer portion of the mantissa; and 
	rounding the mantissa to produce a quantized precision value.  

Mellempudi teaches The method of claim 4, further comprising converting the input tensor to a set of numbers represented in a quantized precision format, including: identifying a shared exponent for a selected at least two elements of the input tensor; ([¶0191] "The dynamic fixed-point representation enables an 8×8 tensor 1415 of 32-bit floating-point values 1414 to be stored in an 8×8 tensor 1425 of 16-bit integer values, each associated with an 8-bit shared exponent.")
	scaling values of the input tensor so that the integer portion of the scaled mantissas has a selected number of bits for the quantized precision format; removing fractional bits from the scaled integer portion of the mantissa; and ([¶0189] " To convert from floating-point to traditional fixed-point, one can multiply the floating-point value by 2fb, where fb is the number of fractional bits for the target fixed-point representation (e.g., 28, for 24.8 fixed-point) and round the result to the nearest integer." [¶0194] "To quantize an exemplary floating-point value 1512 (fx=3.4667968) having an exponent 1514A (Ex) and a mantissa 1514B (Mx), the mantissa 1514B is right shifted by the difference between the exponent 1514A and the absolute max value exponent to create a magnitude integer 1524 (Ix), with the implicit leading bit 1513 (LB) stored as an explicit bit 1523 within the magnitude integer 1524. The sign bit 1520 (Sx) is maintained for the quantized fixed-point value. The scaled exponent scale factor 1522 (SF) is computed as shown in equation (2) above." Right shifting interpreted as synonymous with removing bits.)
	rounding the mantissa to produce a quantized precision value. ([¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." FIG. 17 explicitly shows the rounding occurring in the mantissa.)
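
For illustration only, the two arithmetic operations referenced in the cited passages (multiplying by a power of two and rounding, and right-shifting a mantissa while capturing the discarded bit) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names float_to_fixed and shift_right_round are hypothetical, and the rounding shown is plain round-to-nearest using a single round bit, not the biased rounding of Mellempudi's FIG. 17.

```python
def float_to_fixed(value: float, frac_bits: int) -> int:
    """Scale by 2**frac_bits and round to the nearest integer.
    Note: Python's round() resolves exact ties to the nearest even integer."""
    return int(round(value * 2 ** frac_bits))

def shift_right_round(mantissa: int, shift: int) -> int:
    """Right-shift a non-negative integer mantissa by `shift` bits,
    adding the most significant discarded bit (the round bit) to the result."""
    if shift <= 0:
        return mantissa << -shift
    round_bit = (mantissa >> (shift - 1)) & 1
    return (mantissa >> shift) + round_bit

fixed = float_to_fixed(0.3, 8)            # 0.3 * 256 = 76.8, stored as 77
rounded_up = shift_right_round(0b1011, 2)    # 11 / 4 = 2.75, stored as 3
rounded_down = shift_right_round(0b1001, 2)  # 9 / 4 = 2.25, stored as 2
```

The right shift is what removes the low-order bits, and the round bit is what keeps the result within half a unit of the unshifted value.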

	Yao and Mellempudi are both directed towards a quantization-enabled neural network accelerator.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the accelerators in Yao and Mellempudi by focusing on throughput optimization. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Mellempudi that training at least in part in a dynamic fixed-point precision yields a performance and efficiency gain and reduces overall training time ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).

	Regarding claim 6, Yao teaches The method of claim 4.
	However, Yao does not explicitly teach reshaping the input tensor to allow the converting the input tensor to include independent operations on portions of the input tensor.

Mellempudi, in the same field of endeavor, teaches reshaping the input tensor to allow the converting the input tensor to include independent operations on portions of the input tensor. ([¶0217] "FIG. 17 illustrates floating-point to dynamic fixed-point biased rounding, according to an embodiment. Quantization with biased rounding as illustrated in FIG. 17 is similar to quantization as illustrated in FIG. 15A. Additionally, a round bit 1740 and a bias bit 1742 are used to capture bits that would otherwise be lost during the right shift to generate the integer magnitude value." Reshaping the input tensor interpreted as quantizing the tensor values.  Converting interpreted as synonymous with casting; independent operations on portions of the input tensor interpreted as synonymous with integer-based math or other bit operations such as shifting the fractional bit.).

	Yao and Mellempudi are both directed towards a quantization-enabled neural network accelerator.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the accelerators in Yao and Mellempudi by focusing on throughput optimization. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Mellempudi that training at least in part in a dynamic fixed-point precision yields a performance and efficiency gain and reduces overall training time ([¶0228] “additional operations for the tensor data can be performed in floating-point. Logic 1940 can be used to enable training for a dataset to be performed at least in part in a dynamic fixed-point precision, enabling a performance and efficiency gain during the earlier portion of training, reducing the overall training time for a neural network.”).

Regarding claims 22 and 24-25, claims 22 and 24-25 are directed towards a system for implementing the methods of claims 3 and 5-6, respectively.  Therefore, the rejection applied to claims 3 and 5-6 also applies to claims 22 and 24-25.  Claims 22 and 24-25 also recite additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Claims 10 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Yao and in view of Yang (US20180157940A1). 

	Regarding claim 10, Yao teaches The method of claim 1.
	However, Yao does not explicitly teach, the input tensor has two dimensions X and N, 
	the method further comprising: applying a convolution kernel having three dimensions K, N, and P; and flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; and 
	converting the input tensor into a matrix having two dimensions K×N and X.  

Yang, in the same field of endeavor, teaches the input tensor has two dimensions X and N, ([¶0064] "An input image generally contains a large amount of imagery data. In order to perform image processing operations. The input image 1100 is partitioned into M-pixel by M-pixel blocks 1111-1112 as shown in FIG. 11A." [¶0065] "In another embodiment, the input image is a rectangular shape with dimensions of (2I×M)-pixel and (2J×M)-pixel, where I and J are positive integers." 2I*M is interpreted as synonymous with X, and 2J*M is interpreted as synonymous with N.)
	the method further comprising: applying a convolution kernel having three dimensions K, N, and P; and flattening the convolution kernel into a two-dimensional matrix having two dimensions K×N and P; and ([¶0060] "After 3×3 convolutions for each group of imagery data are performed for predefined number of filter coefficients, convolution operations results Out(m, n) are sent to the first set of memory buffers via another multiplex" [¶0047] "m, n are corresponding row and column numbers for identifying which imagery data (pixel) within the (M+2)-pixel by (M+2)-pixel region the convolution is performed;" 3x3 convolution interpreted as referring to a three-dimensional convolution kernel.  Outputting as two parameters interpreted as flattening the results. See also formula 1 ¶0046)
	converting the input tensor into a matrix having two dimensions K×N and X. ([¶0062] "If a 2×2 pooling operation is required, the M×M output results are reduced to (M/2)×(M/2)" See also FIG. 10). 
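
For illustration only, the dimensional relationship recited in claim 10 (a kernel of dimensions K, N, and P flattened to K×N by P, and an input of dimensions X and N converted to K×N by X) can be sketched as follows. This is a minimal Python sketch of one possible reading of the claim language and is not taken from Yang or from the application: the function names im2col_1d and conv_as_matmul, the trailing zero padding, and the interpretation of X as positions and N as channels are assumptions.

```python
import numpy as np

def im2col_1d(inp, K):
    """inp has shape (X, N). Return a (K*N, X) matrix whose column x holds
    the K-by-N window starting at position x (zero-padded at the end)."""
    X, N = inp.shape
    padded = np.vstack([inp, np.zeros((K - 1, N))])
    cols = [padded[x:x + K].reshape(K * N) for x in range(X)]
    return np.stack(cols, axis=1)

def conv_as_matmul(inp, kernel):
    """kernel has shape (K, N, P). Flatten it to (K*N, P) and compute the
    convolution as a single matrix product of shape (P, X)."""
    K, N, P = kernel.shape
    kernel_2d = kernel.reshape(K * N, P)
    return kernel_2d.T @ im2col_1d(inp, K)

inp = np.arange(6, dtype=float).reshape(3, 2)  # X = 3 positions, N = 2 channels
kernel = np.ones((2, 2, 1))                    # K = 2, N = 2, P = 1
out = conv_as_matmul(inp, kernel)
```

With an all-ones kernel each output entry is simply the sum of the corresponding window, which makes the result easy to check by hand against the input rows.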

	Yao and Yang are both directed towards neural network accelerators.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Yao with the teachings of Yang by implementing convolution specific matrix optimizations. Yang teaches as motivation for combination ([¶0004] “The FCN layer is therefore required to project the high dimensional vector to a relatively low dimensional space, e.g., 4096, 1024, or smaller number (e.g., 128). Disadvantage of such a feature extraction is that the huge number of parameters (e.g. more than 100 million (i.e., 25088×4096) for the FCN layer connecting to convolutional layer). As a result, runtime performance is low due to such a high computation complexity.”).

Regarding claim 27, claim 27 is directed towards a system for implementing the method of claim 10.  Therefore, the rejection applied to claim 10 also applies to claim 27.  Claim 27 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

	Claims 11 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Yao and in view of Jin (US 2018/0089562 A1).   

	Regarding claim 11, Yao teaches The method of claim 1.
	However, Yao does not explicitly teach, the input tensor has three dimensions X, Y, and N 
	the method further comprising: applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, and 
	converting the input tensor into a matrix having two dimensions K×L×N and M.  

Jin, in the same field of endeavor, teaches the input tensor has three dimensions X, Y, and N ([¶0046] "Referring to FIG. 2A, input data 200 on the leftmost side may include a plurality of channels. FIG. 2 shows an example in which the input data 200 includes three channels. The input data 200 may be expressed in width, height and depth.").
	the method further comprising: applying a convolution kernel having four dimensions K, L, N, and P to the input tensor, and (FIG. 2B shows that convolution operation is dependent on four dimensional variables a,b,c, and d. [¶0047] "A convolution layer may perform a convolution operation on two weight sets 220 having a size of C×C×D and each of the input data...A convolution operation may be identically performed on all of depths Di (d=0,1,2)")
	converting the input tensor into a matrix having two dimensions K×L×N and M. (FIG. 2B 250 shows pooling layer outputting 2 dimensional representation of input. Figure 2B explicitly shows that output of one pooling layer can be used for a subsequent pooling layer, see also FIG 1). 

	Yao and Jin are both directed towards neural network accelerators.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the CNN from Jin with the accelerator in Yao by implementing convolution specific optimizations.  Jin teaches as a motivation for combination ([¶0040] “In accordance with an embodiment, the CNN architecture can reduce an operation latency using a drop-out method, e.g., a drop-out method, that is, a regularization method, for improving performance of an algorithm in a fully connected layer.”).

Regarding claim 28, claim 28 is directed towards a system for implementing the method of claim 11.  Therefore, the rejection applied to claim 11 also applies to claim 28.  Claim 28 also recites additional elements which are addressed by Yao including memory; ([¶0030] " the processor 102 includes cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache.")
	one or more processors coupled to the memory; ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.")
	one or more computer readable storage media storing computer-readable instructions that when executed by the general-purpose processors or the quantized hardware accelerator, configure the system to perform at least: ([¶0029] "the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software.").

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zhou (“Incremental network quantization: Towards lossless cnns with low-precision weights”, 2017) is considered relevant as it is directed towards quantizing neural networks based on a full precision network and the quantization loss.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126