DETAILED ACTION
This Office action is in response to Application No. 16273662 filed on 05/10/2022. Claims 2 and 15 have been cancelled; claims 1, 3, 4-14, and 16-31 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.

Response to Arguments
2.	The Applicant agrees on page 13 of the Remarks that primary reference Lin teaches scaling in the current layer by arguing that “The scaling discussed in equation 15 is thus with respect to this floating point to fixed point conversion with quantization in the current layer”.
	However, the Applicant disagrees with the rejection by arguing that “Rather, the Office is relying on a pooling operation of a max pooling layer of Brothers, which is a subsequent layer to the current layer of Lin or Lin NPL”.
	The Applicant’s arguments above are not persuasive because Brothers uses the alternative language “or” when referring to mathematical operations that can be performed either in the activation function stage (in the current convolution layer involving convolution circuit 115 → accumulator circuit 125 → activation circuit 130, see Figures 1 and 2) or in the pooling and sub-sampling (PSS) circuit 135.
	For instance, the portion of Brothers cited by the Applicant on page 14 of the Remarks, third paragraph, reads: “The following operations relating to mask generation may be performed by NN engine 100 as part of the activation function stage or as part of the pooling and sub-sampling stage” (Brothers [0050]). This clearly explains that operations attributed to the pooling and sub-sampling (PSS) circuit 135 also apply to the activation function stage (in the current convolution layer involving convolution circuit 115 → accumulator circuit 125 → activation circuit 130, see Fig. 1 and 2).
	Furthermore, the portion of Brothers cited by the Applicant on page 14, second paragraph, reads: “In another embodiment, PSS circuit 135 applies suppression. For example during pooling,...By performing suppression during pooling, the quantization logic takes into account… ” (Brothers [0043]). However, the suppression operation that Brothers performs in the pooling and sub-sampling (PSS) circuit 135 can also be performed by the activation function stage (in the current convolution layer involving convolution circuit 115 → accumulator circuit 125 → activation circuit 130, see Fig. 1 and 2). For example, Brothers teaches that, in one embodiment, activation circuit 130 is configured to apply suppression by quantizing received inputs [0041], and that suppression may be performed using thresholding by either activation circuit 130 or PSS circuit 135 [0044]. As a result, the Applicant’s arguments are not persuasive because “A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments” (MPEP 2123 (II)).

	The Applicant has also argued on page 14 of the Remarks that “Thus the discussions of any maximum values in Brothers, as relied upon by the Office only involve a potential max pooling operation of a pooling layer distinct from a current convolution layer”.
	The argument above is not persuasive because Brothers clearly refers to the activation function stage (in the current convolution layer involving convolution circuit 115 → accumulator circuit 125 → activation circuit 130, see Fig. 1 and 2) when discussing maximum values. For instance, Brothers teaches that, in another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones, and that, in one exemplary embodiment, activation circuit 130 may implement a piecewise linear approximator implementation of the activation function so that thresholding, or quantization, may be applied with little to no cost (Brothers [0041]).
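	For illustration only, the upper-threshold behavior described in Brothers [0041] can be sketched as follows. This is a minimal sketch and not circuitry disclosed by Brothers: the 8-bit output width, the name suppress_upper, and the pass-through of inputs at or below the threshold are assumptions made for the example.

```python
# Hypothetical sketch of upper-threshold suppression (cf. Brothers [0041]).
# OUTPUT_BITS and the pass-through branch are assumptions, not taken from Brothers.
OUTPUT_BITS = 8
MAX_VALUE = (1 << OUTPUT_BITS) - 1  # "all ones" for the assumed output width


def suppress_upper(activation_input, upper_threshold):
    """Return MAX_VALUE when the input exceeds the upper suppression threshold."""
    if activation_input > upper_threshold:
        return MAX_VALUE
    # Assumed behavior: values at or below the threshold pass through unchanged.
    return activation_input
```

	The point of the sketch is only that the maximum value is produced at the activation stage, before any pooling is applied.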
	The Applicant has also argued on page 14 of the Remarks that “Likewise, the pooling layer of Lin occurs in a separate subsequent layer than the CNN layer of Lin, such as with respect to the disclosure of Lin in paragraphs [0031-0032], [0049], [0051], and [0053]”.
	The argument above is not persuasive because the Office Action relied on the teachings of primary reference Lin regarding the convolutional layer, rather than the pooling layer argued by the Applicant. For instance, the Office Action referred to Lin’s “Figs 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolution network. Fig. 5A shows the activation values for convolutional layers zero to five (conv0, …conv5)”.
	In the last paragraph of page 15, the Applicant also argues, “Accordingly, the pooling discussed in Brothers is not an operation related to either zero mean adjustment (based on the mean and standard deviation of data input to a current layer) using the bias of a convolution layer of Lin, or any quantization performed with conversion of floating point values to fixed point values within the Office relied upon current layer of Lin based on standard deviation and mean of the data of the current layer of Lin”. The Applicant further submits that the pooling of Brothers is not relevant to the features of Lin (or Lin and Lin NPL) relied upon by the Office as corresponding to the claimed “determining a lightweight format for the output maps of the current layer based on a distribution of at least a portion of activation data being processed in the neural network,” of independent claim 1.
	The arguments above are not persuasive. Lin as modified by Brothers reads on the instant claims because a person of ordinary skill in the art would have modified the convolutional layer of Lin, the primary reference, with the activation function stage of Brothers (in the current convolution layer involving convolution circuit 115 → accumulator circuit 125 → activation circuit 130, see Fig. 1 and 2). This is because Brothers teaches activation circuit 130 as an alternative by disclosing “the following operations relating to mask generation may be performed by NN engine 100 as part of the activation function stage or as part of the pooling and sub-sampling stage” (Brothers [0050]).
	On page 16, the Applicant argues “Still further, for the proposed reason for modifying the combination of Lin and Lin NPL, the Office sets forth that "[it] would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of performance of the neural network in terms of processing speed can be increased, while power consumption of the neural network is reduced (Brothers, [0029])."” On page 17, the Applicant argues, “Accordingly, the Office Action relied upon 'benefit' or 'reason' of/from Brothers is not with respect to the relied upon max pooling of Brothers, i.e., the Office Action stated reason for modifying the combination of Lin and Lin NPL to purportedly incorporate the max value consideration during the max pooling of Brothers is not related to any feature of Brothers regarding the max value consideration during the max pooling of Brothers, i.e., Office Action relied upon reason involves the suppression operations of paragraph [0029] of Brothers, and does not relate to the Office Action relied upon max value pooling performed by the pooling layer of Brothers.”
	The above argument is not persuasive because “the test for obviousness is not whether the features of a secondary reference (Brothers) may be bodily incorporated into the structure of the primary reference (Lin). Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art” (MPEP 2145 (III)).
	In the last paragraph of page 17, the Applicant argues “Accordingly, it is both respectfully submitted that the Office Action's proposed modification of the combination of Lin and Lin NPL, based on Brothers, would not result in the combination the Office relies upon to reject features of cancelled dependent claim 2, now incorporated into independent claim 1, and Office suggested modification of Lin and Lin NPL based on the Office Action's proposed features of Brothers to reject cancelled dependent claim 2 would not have been obvious because Lin already has a pooling layer, e.g., a max pooling layer, and because the stated reason for the modification is not related to the max pooling of Brothers.”
	The argument above is not persuasive because “it is not necessary that the prior art (Brothers) suggest the combination to achieve the same advantage or result discovered by the Applicant” (MPEP 2144).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


3.	Claims 1, 3, 4, 6-8, 10-14, 16, 18-20, 27 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”) and further in view of Brothers et al. (US20160358069).

	Regarding claim 1, Lin teaches a processing method using a neural network (Deep convolutional Networks (DCNs) can be trained using supervised learning in which both the input and output targets are known for many exemplars and are used to modify the weights of the network by use of gradient descent methods [0049]; using Deep belief networks (DBNs) which are probabilistic models comprising multiple layers of hidden nodes… to extract a hierarchical representation of training data sets [0048]), comprising: 
	generating output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) …, Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrate distributions of activation values in different layers (being processed) in an exemplary deep convolutional network… FIG. 5A shows the activation values for convolution layers zero to five (conv0. . . conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]); 
	including determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	and lightening activation data corresponding to the output maps of the current layer to have a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]) 
	Lin does not explicitly teach determining a lightweight format for the output maps of the current layer, or that the determining is based on a maximum value of the output maps of the current layer.
	Lin NPL teaches determining a lightweight format for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)
	Brothers teaches based on a maximum value of the output maps of the current layer (For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042], in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).
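
	For illustration only, the 2×2 maximum-value pooling example cited from Brothers [0042] (a 16×16 output feature map sub-sampled to 8×8) can be sketched as follows. The function name max_pool_2x2, the list-of-lists representation, and the sample values are assumptions made for the example; the sketch also assumes even map dimensions.

```python
# Hypothetical sketch of taking the maximum value of each 2x2 portion of a
# feature map (cf. Brothers [0042]). Assumes even height and width.
def max_pool_2x2(feature_map):
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Assumed sample data: a 16x16 map whose entries increase row by row.
feature_map = [[r * 16 + c for c in range(16)] for r in range(16)]
pooled = max_pool_2x2(feature_map)  # 8x8 map of per-window maxima
```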

	Regarding claim 3, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	input maps of a subsequent layer of the neural network corresponding to the output maps of the current layer (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	based on the determined lightweight format (based on modified input distribution 650 having the zero mean (as lightweight format) [0068])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)

	Regarding claim 4, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches, wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	input maps of a subsequent layer of the neural network corresponding to the output maps of the current layer by performing a shift operation on the input maps of the subsequent layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077]) 
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
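
	As an illustrative sketch only, a Q-format conversion of the kind referenced in Lin [0081]-[0082] can be written as below. The helper names to_fixed and from_fixed, the signed reading of Qm.n (m integer bits excluding the sign bit, n fractional bits), and saturation on overflow are assumptions made for the example; the passages of Lin quoted above do not specify them.

```python
# Hypothetical sketch of floating point to fixed point conversion in a Qm.n
# format such as Q 3.18 (cf. Lin [0081]). The signed interpretation and the
# saturating behavior are assumptions.
def to_fixed(value, int_bits, frac_bits):
    scaled = round(value * (1 << frac_bits))      # scale by 2**frac_bits
    hi = (1 << (int_bits + frac_bits)) - 1        # largest representable code
    lo = -(1 << (int_bits + frac_bits))           # smallest representable code
    return max(lo, min(hi, scaled))               # saturate on overflow


def from_fixed(code, frac_bits):
    return code / (1 << frac_bits)                # undo the scaling
```

	Under these assumptions, the scaling by 2**frac_bits corresponds to a left shift by the number of fractional bits, which is how a shift amount can follow from the chosen format.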

	Regarding claim 6, Modified Lin teaches the processing method of claim 1, Lin teaches output maps of a subsequent layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map (which implies input convolves with weight kernel to obtain output feature map) in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels [0051])
	determining a lightweight format for the output maps of the subsequent layer (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	Brothers teaches wherein the maximum value of the output maps of the current layer is a predicted maximum value of output maps of a subsequent layer of the neural network (activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]) and
	based on the predicted maximum value (activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041])
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 7, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	the output maps of the current layer based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])

	Regarding claim 8, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	by performing a shift operation on the output maps of the current layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077])
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]).
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer with a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085]) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).

	Regarding claim 9, Modified Lin teaches the processing method of claim 6, Lin teaches output maps of a subsequent layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map (which implies input convolves with weight kernel to obtain output feature map) in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels [0051])
	Brothers teaches further comprising: updating a register, configured to store a value, (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112]) 
	to be the maximum value of the output maps of the current layer generated by the convolution operation, (Accumulator circuit 125 may receive outputs from convolution circuit 115 and bypass processing circuit 120 [0040]; In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041], Fig. 1)
	wherein a maximum value of output maps of a subsequent layer of the neural network is predicted based on a value stored in the updated register. (In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]; the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112])
	The same motivation to combine dependent claim 8 applies here.

	Regarding claim 10, Modified Lin teaches the processing method of claim 1, further comprising: Lin teaches obtaining a first weight kernel corresponding to a first output channel that is currently being processed in the current layer by referring to a database including weight kernels by each layer and output channel, (A DCN may be trained with supervised learning. During training, a DCN may be presented with an image, such as a cropped image of a speed limit sign, and a “forward pass” may then be computed to produce an output 322 [0044]; FIG. 5B illustrates distributions of weights in different layers of an exemplary deep convolutional network [0021]; Aspects of the present disclosure are directed to improving quantization of the weights, biases, and/or activations in ANNs [0033]; Variables (e.g., neural signals and synaptic weights), system parameters associated with a computational device (e.g., neural network with weights) [0034]; At the top layer, the gradient may correspond directly to the value of a weight connecting an activated neuron in the penultimate layer and a neuron in the output layer. In lower layers, the gradient may depend on the value of the weights and on the computed error gradients of the higher layers [0045])
	wherein the generating of the output maps of the current layer comprises: generating a first output map corresponding to the first output channel by performing a convolution operation between the input maps of the current layer and the first weight kernel, (The outputs of the convolutional connections may be considered to form a feature map layer 318 and 320 [0051], Fig. 3A; FIG. 5B illustrates distributions of weights in different layers of an exemplary deep convolutional network [0021]). The Examiner notes that the first output map is from the convolution operation, which forms a feature map in the subsequent layer.
	wherein the database further includes respective lightweight formats for plural output channels of at least one layer of the neural network. (In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9. [0081]; looking up (e.g., looking up in a table, a database or another data structure) [0087])

	Regarding claim 11, Modified Lin teaches the processing method of claim 10, Lin teaches wherein the first weight kernel is determined independently from a second weight kernel corresponding to a second output channel of the current layer, (FIG. 5B illustrates distributions of weights in different layers of an exemplary deep convolutional network [0021]; FIG. 3B is a block diagram illustrating a deep convolutional network 350. The deep convolutional network 350 may include multiple different types of layers based on connectivity and weight sharing [0053]; FIG. 5B shows the weights 550 for convolution layers one to five (conv1, . . . , conv5) and fully connected layers one and two (fc1, fc2) [0064]) and 
	wherein the determining of the lightweight format includes determining respective lightweight formats for each of multiple output maps of the output maps of the current layer, (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]; a neuron in a first layer may communicate its output to every neuron in a second layer, so that each neuron in the second layer will receive input from every neuron in the first layer [0042])
	includes determining a lightweight format for the first output map independently of a determining of a lightweight format for a second output map of the output maps of the current layer. (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]; the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]; The outputs of the convolutional connections may be considered to form a feature map layer 318 and 320 [0051] Fig. 3A). The Examiner notes that the second output map is from convolution operation C1, which forms a feature map in the subsequent layer 320.

	Regarding claim 12, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the input maps of the current layer and the weight kernels of the current layer (convolution layer may include one or more convolutional filters (also known as kernels), which may be applied to the input data to generate a feature map (as output map of current layer) [0053] and between each layer of the deep convolutional network 350 are weights (not shown) that are to be updated [0055]) 
	have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width),
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer have a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	The same motivation to combine dependent claim 2 applies here.

	Regarding claim 13, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform (a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein [0098])

	Regarding claim 14, Lin teaches a processing apparatus using a neural network, comprising: a processor; and a memory including an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to: (a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein [0098])
	generate output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) …, Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	including determination of the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrate distributions of activation values in different layers (being processed) in an exemplary deep convolutional network… FIG. 5A shows the activation values for convolution layers zero to five (conv0. . . conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean)[0064]);
	and lighten activation data corresponding to the output maps of the current layer to have a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Lin does not explicitly teach determine a lightweight format for the output maps of the current layer.
	Lin NPL teaches determine a lightweight format for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)
	Brothers teaches based on a maximum value of the output maps of the current layer (For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042], in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])
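	For illustration only, the pooling operation quoted above from Brothers [0042] (taking the maximum value of each 2×2 portion of a 16×16 output feature map and sub-sampling down to 8×8) can be sketched as follows. This is a minimal Python sketch and not code from any cited reference; the function name `max_pool_2x2` and the feature map values are hypothetical, and non-overlapping 2×2 windows are assumed.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 block of a 2-D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("sketch assumes even dimensions")
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 16x16 output feature map filled with made-up values.
fmap = [[(r * 16 + c) % 23 for c in range(16)] for r in range(16)]
pooled = max_pool_2x2(fmap)  # 8x8 result
```

	Because every element of the input falls in exactly one 2×2 window, the largest value of the pooled map equals the largest value of the original map in this sketch.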

	Regarding claim 16, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: obtain input maps of a subsequent layer of the neural network based on the output maps of the current layer, (a neuron in a first layer may communicate its output to every neuron in a second layer, so that each neuron in the second layer will receive input from every neuron in the first layer [0042])
	and lighten input maps of a subsequent layer to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]; The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	based on the determined lightweight format (based on modified input distribution 650 having the zero mean (as lightweight format) [0068])

	Regarding claim 18, Modified Lin teaches the processing apparatus of claim 14, Brothers teaches wherein the maximum value of the output maps of the current layer is considered a predicted maximum value of output maps of a subsequent layer of the neural network, (Accumulator circuit 125 may receive outputs from convolution circuit 115 [0040]; In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]; For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map [0042]. The Examiner notes that the output from convolution circuit, the current layer, is the predicted output of the PSS circuit which is the subsequent layer) and 
	wherein, the processor is configured to: determine a lightweight format for the output maps of the subsequent layer based on the predicted maximum value (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]; For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map [0042]; in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032]. The Examiner notes that the subsequent layer is from PSS circuit)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])

	Regarding claim 19, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: lighten the output maps of the current layer to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
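	For illustration only, the Q number formats cited above from Lin [0081] (Q 3.18 for weights and Q 6.9 for biases) can be sketched as a fixed-point conversion. This is a minimal Python sketch and not code from Lin; it assumes the common convention that Q m.n denotes m integer bits and n fractional bits plus a separate sign bit, and the function names and sample values are hypothetical.

```python
def to_q_format(value, int_bits, frac_bits):
    """Quantize a float to a signed Q(int_bits).(frac_bits) integer code."""
    scale = 1 << frac_bits                      # step size is 1 / scale
    code = round(value * scale)
    hi = (1 << (int_bits + frac_bits)) - 1      # largest representable code
    lo = -(1 << (int_bits + frac_bits))         # smallest representable code
    return max(lo, min(hi, code))               # saturate on overflow


def from_q_format(code, frac_bits):
    """Map an integer code back to the real value it represents."""
    return code / (1 << frac_bits)


weight_code = to_q_format(0.5, 3, 18)    # Q 3.18, as in the weight example
bias_code = to_q_format(-2.25, 6, 9)     # Q 6.9, as in the bias example
```

	Under the stated assumption, a larger fractional part gives a finer step size, while a larger integer part widens the representable range before saturation.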

	Regarding claim 20, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: lighten, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]) 
	by performing a shift operation on the output maps of the current layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077])	
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]).
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer with a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085]) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])

	Regarding claim 27, Lin teaches the processing method of claim 1, Lin teaches wherein the determining of the lightweight format comprises determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	for a first output map of the output maps independent of a determining of a second lightweight format for a second output map of the output maps, (The outputs of the convolutional connections may be considered to form a feature map layer 318 and 320 [0051] Fig. 3A. The Examiner notes that the first output map is from the convolution operation, which forms a feature map in the subsequent layer) and 
	wherein the lightening of the activation data comprises: lightening of first activation data, corresponding to the first output map, (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); FIG. 5A shows the activation values for convolution layers zero to five (conv0, . . . , conv5),…. Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]; The Examiner notes that fig. 5A, conv1 shows the first activation data) 
 	based on the first lightweight format; (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]. The Examiner notes that Q 3.18 is the first lightweight format) and 
	lightening of second activation data, corresponding to the second output map, based on the second lightweight format. (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); FIG. 5A shows the activation values for convolution layers zero to five (conv0, . . . , conv5) [0064]; Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]; The outputs of the convolutional connections may be considered to form a feature map layer 318 and 320 [0051] Fig. 3A. The Examiner notes that the second output map is from convolution operation C1, which forms a feature map in the subsequent layer 320; also, Q 6.9 is the second lightweight format and fig. 5A, conv2 shows the second activation data)
	Regarding claim 28, Lin teaches the processing method of claim 1, Lin teaches wherein the lightening comprises lightening a first activation data corresponding to a first output map of the multiple output maps to have a first resolution of the lightened first activation data (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); FIG. 5A shows the activation values for convolution layers zero to five (conv0, . . . , conv5),…. Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]; The Examiner notes that fig. 5A, conv1 shows the first activation data) 
	that is determined independent of a determination of a resolution of lightened second activation data corresponding to a second output map of the multiple output maps. (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); FIG. 5A shows the activation values for convolution layers zero to five (conv0, . . . , conv5) [0064]; Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]; The outputs of the convolutional connections may be considered to form a feature map layer 318 and 320 [0051] Fig. 3A. The Examiner notes that the second output map is from convolution operation C1, which forms a feature map in the subsequent layer 320; also, Q 6.9 is the second lightweight format and fig. 5A, conv2 shows the second activation data)

4.	Claims 5, 17 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Brothers et al. (US20160358069) and further in view of Du et al. (US20180232629)

	Regarding claim 5, Modified Lin teaches the processing method of claim 1, Brothers teaches further comprising: selectively updating a value stored in a register, corresponding to a previous maximum value of output maps of a previous layer of the neural network, (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112]; In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]; In block 335, the NN engine may output results for the current region to the activation circuit(s) and the PSS circuit(s). In block 340, the NN engine may determine whether the last region for the current feature map in layer N has been generated. If so, method 300 may end. If not, method 300 continues to block 345 where the NN engine moves to the next region for the current feature map in layer N [0072]) 
	wherein the determining of the lightweight format includes determining the lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]).
	 Modified Lin does not explicitly teach based on a value stored in the register after the selective updating and based on a comparing of the maximum value to the value stored in the register to the maximum value of the output maps of the current layer, 
	Brothers teaches based on a value stored in the register after the selective updating (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])
	Du teaches based on a comparing of the maximum value to the value stored in the register to the maximum value of the output maps of the current layer, (In this embodiment, the sum buffer unit 5 includes a plurality of pooling units 52, and each pooling unit 52 includes a register set REG, a comparator COMP, and an output switch, … The register set REG has four registers, which can output the stored data to the comparator COMP. Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060]; The convolutional neural network may include a plurality of convolution layers and a plurality of pooling layers. The output of each layer can be the input of another layer or a consecutive layer. For example, the output of the Nth convolution layer can be the input of the Nth pooling layer or another consecutive layer, the output of the Nth pooling layer can be the input of the (N+1)th pooling layer or another consecutive layer [0036]) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Lin to incorporate the method of Du for the benefit of calculating the convolution multiplication of the next convolution layer (Du, [0037])
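	For illustration only, the register and comparator arrangement quoted above from Du [0060] (three registers holding newly read data, and a residual register holding the maximum value output by the comparator in the previous clock) can be sketched as follows. This is a minimal Python sketch and not code from Du; the class name `PoolingUnit`, the method name `clock`, and the input values are hypothetical.

```python
class PoolingUnit:
    """Hypothetical model: three input registers plus one residual register."""

    def __init__(self):
        self.residual = None  # holds the maximum from the previous clock

    def clock(self, a, b, c):
        """Compare three new inputs with the stored maximum and store the result."""
        candidates = [a, b, c]
        if self.residual is not None:
            candidates.append(self.residual)
        self.residual = max(candidates)
        return self.residual


unit = PoolingUnit()
outputs = [unit.clock(*triple) for triple in [(1, 5, 2), (3, 4, 0), (7, 1, 1)]]
```

	In this sketch the stored value changes only when a newly read value exceeds the previously stored maximum, which is one way to read the "selective updating" language mapped above.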

	Regarding claim 17, Modified Lin teaches the processing apparatus of claim 14, Lin teaches wherein the determination of the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]; Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width);) 
	wherein the determination of the lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Brothers teaches based on the maximum value of the output maps of the current layer is a consideration of the maximum value of the output maps of the current layer, (In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]; In block 335, the NN engine may output results for the current region to the activation circuit(s) and the PSS circuit(s). In block 340, the NN engine may determine whether the last region for the current feature map in layer N has been generated. If so, method 300 may end. If not, method 300 continues to block 345 where the NN engine moves to the next region for the current feature map in layer N [0072])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])
	Modified Lin does not explicitly teach based on a result of the comparison of the value stored in the register to the maximum value of the output maps of the current layer, including a comparison of a value stored in a register, corresponding to a maximum value of output maps of a previous layer of the neural network, to the maximum value of the output maps of the current layer, 
	Du teaches based on a result of the comparison of the value stored in the register to the maximum value of the output maps of the current layer, including a comparison of a value stored in a register, corresponding to a maximum value of output maps of a previous layer of the neural network, to the maximum value of the output maps of the current layer (In this embodiment, the sum buffer unit 5 includes a plurality of pooling units 52, and each pooling unit 52 includes a register set REG, a comparator COMP, and an output switch, … The register set REG has four registers, which can output the stored data to the comparator COMP. Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060]; The convolutional neural network may include a plurality of convolution layers and a plurality of pooling layers. The output of each layer can be the input of another layer or a consecutive layer. For example, the output of the Nth convolution layer can be the input of the Nth pooling layer or another consecutive layer, the output of the Nth pooling layer can be the input of the (N+1)th pooling layer or another consecutive layer [0036])
	The same motivation to combine as set forth for dependent claim 5 applies here.

	Regarding claim 29, Modified Lin teaches the processing apparatus of claim 17, Lin teaches performed during an application of an activation function to the output maps to generate the activation data. (In one configuration, the bias values are modified to specify a zero-mean throughout the network for the distributions of activations. In addition, a non-linear activation function is modified when the mean value is incorporated into the network bias [0073]; The resulting network has zero mean activations: [0074]; For some layers, for example at the output of the ANN, non-zero-mean output activations may be specified [0075])
	Modified Lin does not explicitly teach wherein the comparison of the value stored in the register to the maximum value of the output maps of the current layer 
	Du teaches wherein the comparison of the value stored in the register to the maximum value of the output maps of the current layer (In this embodiment, the sum buffer unit 5 includes a plurality of pooling units 52, and each pooling unit 52 includes a register set REG, a comparator COMP, and an output switch, … The register set REG has four registers, which can output the stored data to the comparator COMP. Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060]; The convolutional neural network may include a plurality of convolution layers and a plurality of pooling layers. The output of each layer can be the input of another layer or a consecutive layer. For example, the output of the Nth convolution layer can be the input of the Nth pooling layer or another consecutive layer, the output of the Nth pooling layer can be the input of the (N+1)th pooling layer or another consecutive layer [0036])
	The same motivation to combine dependent claim 17 applies here.

5.	Claims 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”) and further in view of Du et al. (US20180232629)

	Regarding claim 21, Lin teaches a processing method, comprising: initiating a neural network including a plurality of layers; (The deep convolutional network 350 may include multiple different types of layers based on connectivity and weight sharing …, the deep convolutional network 350 includes multiple convolution blocks ….The convolution layers may include one or more convolutional filters, which may be applied to the input data to generate a feature map [0053])
	generating output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	determining a lightweight format for the output maps of the current layer based on the selectively set maximum value (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Lin does not explicitly teach based on the selectively set maximum value and selectively setting a maximum value for the output maps of the current layer to be one of a maximum value of one or more output maps of a previous layer of the neural network and a determined maximum value of the output maps of the current layer; the lightweight format which is not determined before the neural network is initiated;
	Lin NPL teaches determine a lightweight format for the output maps of the current layer, the lightweight format which is not determined before the neural network is initiated; (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)
	Du teaches based on the selectively set maximum value and selectively setting a maximum value for the output maps of the current layer to be one of a maximum value of one or more output maps of a previous layer of the neural network and a determined maximum value of the output maps of the current layer; (In this embodiment, the sum buffer unit 5 includes a plurality of pooling units 52, and each pooling unit 52 includes a register set REG, a comparator COMP, and an output switch, … The register set REG has four registers, which can output the stored data to the comparator COMP. Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060]; The convolutional neural network may include a plurality of convolution layers and a plurality of pooling layers. The output of each layer can be the input of another layer or a consecutive layer. For example, the output of the Nth convolution layer can be the input of the Nth pooling layer or another consecutive layer, the output of the Nth pooling layer can be the input of the (N+1)th pooling layer or another consecutive layer [0036])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Lin to incorporate the method of Du for the benefit of calculating the convolution multiplication of the next convolution layer (Du, [0037])

	Regarding claim 22, Modified Lin teaches the processing method of claim 21, Lin teaches wherein the initiating of the neural network comprises: inputting input data to the neural network for inference on the input data. (convolution layers may include one or more convolutional filters, which may be applied to the input data to generate a feature map [0053])

	Regarding claim 23, Modified Lin teaches the processing method of claim 21, Lin teaches wherein the determining a lightweight format comprises: determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrates distributions of activation values in different layers (being processed) in an exemplary deep convolutional network … FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean) [0064])
	Lin NPL teaches for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)  
	The same motivation to combine independent claim 21 applies here.

	Regarding claim 24, Modified Lin teaches the processing method of claim 21, Lin teaches based on at least one lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Lin NPL teaches wherein output maps of a subsequent layer of the neural network have a respective high bit width, (From Equation 9, it is seen that trade-offs can be made between quantizers of different layers to produce the same γoutput. That is to say, we can choose to use smaller bit-widths for some layers by increasing bit-widths for other layers, pg. 5-6, left col, first para.; since the number of parameters of fc0 is very small in this experiment, we will set the bit-width to a large value to eliminate the impact of quantizing fc0 from the analysis, pg. 7, left col, last para. Table 1) and 
	the processor is further configured to lighten the output maps of the subsequent layer to have respective low bit widths (Some layers may require a large number of computations (multiply-accumulate operations). Reducing the bit-widths for these layers would reduce the overall network computation load, pg. 6, left col, first bullet point; For fully-connected layers we first keep the network as floating point and quantize the weights of fully-connected layers only. We then reduce bit-width of fully-connected layers until the classification accuracy starts to degrade, pg. 7, right col, first para.) 
	Du teaches dependent on a result of the selectively set maximum value. (Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060])
	The same motivation to combine independent claim 21 applies here.
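	For illustration only, the Q-number formats cited from Lin [0081] (Q 3.18 for weights, Q 6.9 for biases) can be sketched as a rounding of a floating point value onto a fixed point grid. The sketch is not taken from any cited reference. The function name quantize_q, the reading of Q m.n as m integer bits plus n fractional bits plus a sign bit, and the saturation behavior are all assumptions made for the example.

```python
def quantize_q(value, int_bits, frac_bits):
    """Round value onto a signed fixed-point grid and saturate at the range ends.

    Assumption: Q m.n means m integer bits, n fractional bits, and one sign
    bit. The cited reference may use a different convention.
    """
    scale = 1 << frac_bits                      # codes per unit, 2**n
    lo = -(1 << (int_bits + frac_bits))         # most negative code
    hi = (1 << (int_bits + frac_bits)) - 1      # most positive code
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


weight = quantize_q(0.123456789, 3, 18)   # Q 3.18, fine step, small range
bias = quantize_q(40.3, 6, 9)             # Q 6.9, coarser step, larger range
saturated = quantize_q(100.0, 6, 9)       # outside the Q 6.9 range, clamps
```

	Under these assumptions the rounding error of an in-range value is at most half a step, that is, 2^-(n+1) for n fractional bits.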

6.	Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”) in view of Du et al. (US20180232629) and further in view of Brothers et al. (US20160358069) 

	Regarding claim 25, Modified Lin teaches the processing method of claim 21, Brothers teaches wherein the selective setting of the maximum value for the output maps of the current layer comprises: maintaining the maximum value for the output maps of the current layer, (In another example, whenever the input to the activation function is above an upper suppression threshold, activation circuit 130 causes the output of the activation function to be a maximum value (“MAX_VALUE”) such as all ones [0041]) 
	the maximum value of the one or more output maps of a previous layer of the neural network when the maximum value of the output maps of the current layer is determined to not meet the maximum value of the one or more output maps of the previous layer; (NN engine 100 may be configured to bypass convolution operations by detecting high feature strength and clamping (or quantizing) to MAX_VALUE or by quantizing the feature map values to create areas of flatness, … An area of flatness further may refer to an area where the values are not all maximum values and are not all zero values [0045])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).
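	For illustration only, the upper-threshold suppression cited from Brothers [0041] can be sketched as a clamp that forces the activation output to MAX_VALUE when the input exceeds the threshold. The sketch is not taken from Brothers. The 8-bit width of MAX_VALUE, the function name suppress, and the pass-through of values at or below the threshold are assumptions made for the example.

```python
MAX_VALUE = 0xFF  # assumption: 8-bit output, so "all ones" is 255


def suppress(activation_input, upper_threshold):
    """Return MAX_VALUE when the input is above the upper suppression threshold.

    Inputs at or below the threshold pass through unchanged in this sketch.
    """
    if activation_input > upper_threshold:
        return MAX_VALUE
    return activation_input


clamped = suppress(300, 200)     # above the threshold, forced to MAX_VALUE
unchanged = suppress(150, 200)   # not above the threshold, passed through
```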

7.	Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Brothers et al. (US20160358069)

	Regarding claim 26, Lin teaches a processing method, comprising: performing an operation between input data of a current layer of a neural network and a weight kernel of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) …, Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean) [0064])
	to generate first output maps of the current layer (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	the input data and the weight kernel (convolution layer may include one or more convolutional filters (also known as kernels), which may be applied to the input data to generate a feature map (as output map of current layer) [0053])
	having a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	generating second output maps of the current layer, (each element of the feature map (e.g., 320 (as second output map)) receiving input from a range of neurons in the previous layer [0051]) 
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	of an input map of a subsequent layer of the neural network based on the maximum value, (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	and lightening the input map to have the low bit width based on the lightweight format. (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	Lin does not explicitly teach to generate first output maps of the current layer having a high bit width, by applying the first output maps to an activation function; outputting a maximum value of the second output maps; and the input map having the high bit width;
	Brothers teaches to generate first output maps of the current layer having a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	by applying the first output maps to an activation function; (Activation circuit 130 is configured to receive the summed results from accumulator circuit 125 and apply the activation function to the summed result. [0040])
	outputting a maximum value of the second output maps; (For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042] in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	the input map having the high bit width; (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).
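	For illustration only, the pooling example cited from Brothers [0042] (taking the maximum value at each 2×2 portion of a 16×16 output feature map to obtain an 8×8 output feature map) can be sketched as follows. The sketch is not taken from Brothers. The function name max_pool_2x2, the list-of-lists representation, and the sample values are assumptions made for the example.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 portion of a 2-D map.

    Assumption: the map has an even number of rows and columns.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 16x16 map whose entries are simply 0..255 in row-major order.
feature_map = [[r * 16 + c for c in range(16)] for r in range(16)]
pooled = max_pool_2x2(feature_map)   # 8x8 map of per-portion maxima
```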

8.	Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Brothers et al. (US20160358069) and further in view of Du et al. (US20180232629)

	Regarding claim 30, Modified Lin teaches the processing method of claim 26, Lin teaches determining a lightweight format of an input map of a subsequent layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	Du teaches further comprising: determining whether the maximum value of the second output maps of the current layer meets a maximum value of output maps of a previous layer of the neural network; (In this embodiment, the sum buffer unit 5 includes a plurality of pooling units 52, and each pooling unit 52 includes a register set REG, a comparator COMP, and an output switch, … The register set REG has four registers, which can output the stored data to the comparator COMP. Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060]; The convolutional neural network may include a plurality of convolution layers and a plurality of pooling layers. The output of each layer can be the input of another layer or a consecutive layer. For example, the output of the Nth convolution layer can be the input of the Nth pooling layer or another consecutive layer, the output of the Nth pooling layer can be the input of the (N+1)th pooling layer or another consecutive layer [0036])
	based on a result of the determining of whether the maximum value of the second output maps of the current layer meets the maximum value of the output maps of the previous layer. (Three of the registers can receive and store the data read from the convolution operation module 3 or the memory 1, and the residual register can receive the output of the comparator COMP and store the maximum value of the outputs of the comparator COMP. The comparator COMP can compare the three inputted data and the maximum value of the previous comparison so as to output the maximum value. In other words, the maximum value outputted by the comparator COMP in the previous clock is registered in the register, [0060])
	The same motivation to combine independent claim 26 applies here.
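	For illustration only, the comparator arrangement cited from Du [0060] (three newly read values compared together with the maximum value held from the previous comparison) can be sketched as a running maximum. The sketch is not taken from Du. The function name running_max, the grouping of inputs three at a time per clock, and the use of None for an empty register are assumptions made for the example.

```python
def running_max(samples, group=3):
    """Compare each group of inputs with the previously held maximum.

    The variable `held` stands in for the register that stores the comparator
    output of the previous clock. It is None until the first comparison.
    """
    held = None
    for start in range(0, len(samples), group):
        candidates = list(samples[start:start + group])
        if held is not None:
            candidates.append(held)   # feed back the previous maximum
        held = max(candidates)
    return held


overall = running_max([1, 5, 2, 9, 0, 3, 4, 4, 4])   # three clocks of three inputs
```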

9.	Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”)

	Regarding claim 31, Lin teaches a processing method using a neural network, (Deep convolutional Networks (DCNs) can be trained using supervised learning in which both the input and output targets are known for many exemplars and are used to modify the weights of the network by use of gradient descent methods [0049]; using Deep belief networks (DBNs) which are probabilistic models comprising multiple layers of hidden nodes… to extract a hierarchical representation of training data sets [0048]) comprising: 
	generating output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer; (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) …, Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	based on a distribution of at least a portion of activation data being processed in the neural network; (FIG. 5A illustrates distributions of activation values in different layers (being processed) in an exemplary deep convolutional network … FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	lightening activation data corresponding to the output maps of the current layer including respectively lightening activation data corresponding to the multiple output maps based on the determined respective lightweight formats. (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]; The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. [0051])
	Lin does not explicitly teach determining respective lightweight formats for each of multiple output maps of the output maps of the current layer 
	Lin NPL teaches determining respective lightweight formats for each of multiple output maps of the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, Section 3, Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.).

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/M.G./Examiner, Art Unit 2121

/DANIEL T PELLETT/Primary Examiner, Art Unit 2121