DETAILED ACTION
1.	This office action is in response to the Application No.  filed on 2/14/2018. Claims 1-26 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
				
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



3.	Claims 1, 3, 4, 6, 7, 10, 13-16, 18, 19 and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”).

	Regarding claim 1, Lin teaches a processing method using a neural network (Deep convolutional Networks (DCNs) can be trained using supervised learning in which both the input and output targets are known for many exemplars and are used to modify the weights of the network by use of gradient descent methods [0049]; using Deep belief networks (DBNs) which are probabilistic models comprising multiple layers of hidden nodes… to extract a hierarchical representation of training data sets [0048]), comprising: 
	generating output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrates distributions of activation values in different layers (being processed) in an exemplary deep convolutional network… FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	and lightening activation data corresponding to the output maps of the current layer to have a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]) 
	Lin does not explicitly teach determining a lightweight format for the output maps of the current layer.

	Lin NPL teaches determining a lightweight format for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)
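	For illustration only, the per-layer floating point to fixed point conversion relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Lin or Lin NPL: the function names choose_frac_bits and quantize, the signed 8-bit word, the use of the largest absolute activation as the summary of the distribution, and the sample values are all hypothetical.

```python
import math


def choose_frac_bits(max_abs, bit_width):
    """Pick the fractional bit count so max_abs fits a signed bit_width-bit word."""
    if max_abs <= 0:
        return bit_width - 1
    # Integer bits needed for the magnitude (the sign bit is counted separately).
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return bit_width - 1 - int_bits


def quantize(values, frac_bits, bit_width):
    """Scale by 2**frac_bits, round, and clip to the signed integer range."""
    lo = -(1 << (bit_width - 1))
    hi = (1 << (bit_width - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


# Hypothetical activation data for the output map of one layer.
output_map = [0.12, -1.5, 3.2, 0.0]
frac_bits = choose_frac_bits(max(abs(v) for v in output_map), 8)
codes = quantize(output_map, frac_bits, 8)
```

	In this sketch the format is chosen per layer from the observed activations, so a layer with a larger dynamic range receives fewer fractional bits for the same bit width.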

	Regarding claim 3, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	input maps of a subsequent layer of the neural network corresponding to the output maps of the current layer (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	based on the determined lightweight format (based on modified input distribution 650 having the zero mean (as lightweight format) [0068])
	
	Regarding claim 4, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	input maps of a subsequent layer of the neural network corresponding to the output maps of the current layer by performing a shift operation on the input maps of the subsequent layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077]) 
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
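	For illustration only, the shift described in the passage quoted from Lin [0077] can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Lin: the function names, the choice of ReLU as the non-linear function, and the sample activations are hypothetical, and the sketch assumes one reading of the passage in which moving the coordinate of the non-linear function by the same amount compensates for the shift.

```python
def shift_to_zero_mean(activations):
    """Subtract the mean so the shifted activations are centered on zero."""
    mu = sum(activations) / len(activations)
    return [a - mu for a in activations], mu


def relu(x):
    return max(0.0, x)


def relu_shifted_coordinate(x, mu):
    # Assumed reading of [0077]: the coordinate of the non-linear function
    # is moved by the same amount mu that was removed from the activations.
    return relu(x + mu)


activations = [0.5, 2.0, 3.5]  # hypothetical activations of one layer
shifted, mu = shift_to_zero_mean(activations)
reference = [relu(a) for a in activations]
compensated = [relu_shifted_coordinate(s, mu) for s in shifted]
```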

	Regarding claim 6, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein determining the lightweight format (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) comprises:
	predicting a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network; (For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax], [0063], with maximum value, Xmax, of 3Δ, Fig. 4)
	and determining the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])

	Regarding claim 7, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	the output maps of the current layer based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])

	Regarding claim 10, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches further comprising: obtaining a first weight kernel corresponding to a first output channel (The outputs of the convolutional connections may be considered to form a feature map (which implies input convolves with weight kernel to obtain output feature map) in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels [0051])
	by referring to a database ( “determining” may include looking up (e.g., looking up in a table, a database [0087])
	including weight kernels by each layer and output channel (The deep convolutional network 350 may include multiple different types of layers based on connectivity and weight sharing [0053], and receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels [0051])
	wherein generating the output maps of the current layer comprises: generating a first output map corresponding to the first output channel (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between the input maps of the current layer and the first weight kernel (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064])

	Regarding claim 13, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform (a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein [0098])

	Regarding claim 14, Lin teaches a processing apparatus using a neural network, comprising: a processor; and a memory including an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to: (a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein [0098])
	generate output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	determine a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrates distributions of activation values in different layers (being processed) in an exemplary deep convolutional network… FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064]);
	and lighten activation data corresponding to the output maps of the current layer to have a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Lin does not explicitly teach determine a lightweight format for the output maps of the current layer.

	Lin NPL teaches determine a lightweight format for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)

	Regarding claim 15, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: determine the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	(For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax], [0063], with maximum value, Xmax, of 3Δ, Fig. 4)

	Regarding claim 16, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: obtain input maps of a subsequent layer of the neural network based on the output maps of the current layer, ( a neuron in a first layer may communicate its output to every neuron in a second layer, so that each neuron in the second layer will receive input from every neuron in the first layer [0042])
	and lightens input maps of a subsequent layer to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]; The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]),
	based on the determined lightweight format (based on modified input distribution 650 having the zero mean (as lightweight format) [0068])

	Regarding claim 18, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: predict a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network, (For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax], [0063], with maximum value, Xmax, of 3Δ, Fig. 4)
	and determine the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])

	Regarding claim 19, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: lighten the output maps of the current layer to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])

	Regarding claim 21, Lin teaches a processing method, comprising: initiating a neural network including a plurality of layers; (The deep convolutional network 350 may include multiple different types of layers based on connectivity and weight sharing …, the deep convolutional network 350 includes multiple convolution blocks ….The convolution layers may include one or more convolutional filters, which may be applied to the input data to generate a feature map [0053])
	generating output maps of a current layer of the neural network (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	by performing a convolution operation between input maps of the current layer and weight kernels of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064])
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	and lightening activation data corresponding to the output maps of the current layer (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance [0064]) 
	based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Lin does not explicitly teach determining a lightweight format for the output maps of the current layer, the lightweight format which is not determined before the neural network is initiated.
	Lin NPL teaches determining a lightweight format for the output maps of the current layer, the lightweight format which is not determined before the neural network is initiated (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Lin NPL for the benefit of offering an analytical solution for bit-width choice per layer to optimize the SQNR for the network (Lin NPL, pg. 2 right col., first para.)
	
	Regarding claim 22, Lin modified by Lin NPL teaches the processing method of claim 21, Lin teaches wherein initiating the neural network comprises: inputting input data to the neural network for inference on the input data (convolution layers may include one or more convolutional filters, which may be applied to the input data to generate a feature map [0053])

	Regarding claim 23, Lin modified by Lin NPL teaches the processing method of claim 21, Lin teaches wherein determining a lightweight format comprises: determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	based on a distribution of at least a portion of activation data being processed in the neural network (FIG. 5A illustrates distributions of activation values in different layers (being processed) in an exemplary deep convolutional network… FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) and fully connected layers one and two (fc1, fc2) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). [0064])
	Lin NPL teaches for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)  
	The same motivation to combine as set forth for independent claim 21 applies here.
	
	Regarding claim 24, Lin modified by Lin NPL teaches the processing method of claim 21, Lin teaches wherein determining the lightweight format comprises: determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	based on a maximum value of the output maps of the current layer (For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax], [0063], with maximum value, Xmax, of 3Δ, Fig. 4)
	wherein the lightening comprises: lightening, (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	input maps of a subsequent layer of the neural network corresponding to the output maps of the current layer (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	to have a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width),
	based on the determined lightweight format (based on modified input distribution 650 having the zero mean (as lightweight format) [0068])
	Lin NPL teaches for the output maps of the current layer (For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point, pg. 2, 3. Floating point to fixed point conversion)
	The same motivation to combine as set forth for independent claim 21 applies here.

	Regarding claim 25, Lin modified by Lin NPL teaches the processing method of claim 21, Lin teaches wherein determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) comprises:
	predicting a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network; (For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax], [0063], with maximum value, Xmax, of 3Δ, Fig. 4)
	and determining the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer (Furthermore, the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082])
	wherein the lightening comprises: lightening (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] (shows scaling data from a high bit width into data with a low bit width); For example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	the output maps of the current layer to have a low bit width based on the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	
7.	Claims 2, 5, 8, 9, 11, 12, 17, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Lin et al. (Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48, hereinafter “Lin NPL”) and further in view of Brothers et al. (US20160358069)

	Regarding claim 2, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein determining the lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) comprises:
	Modified Lin does not explicitly teach based on a maximum value of the output maps of the current layer.
	Brothers teaches based on a maximum value of the output maps of the current layer (For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042], in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029])
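	For illustration only, the operation described in the passage quoted from Brothers [0042], taking the maximum value of each 2×2 portion of a 16×16 output feature map and sub-sampling to 8×8, can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Brothers: the function name max_pool_2x2 and the sample feature map are hypothetical, and the PSS circuit itself is hardware that this software sketch does not model.

```python
def max_pool_2x2(fmap):
    """Return a map holding the maximum of each non-overlapping 2x2 portion."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 16x16 output feature map with values 0..255 in row-major order.
feature_map = [[r * 16 + c for c in range(16)] for r in range(16)]
pooled = max_pool_2x2(feature_map)
```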

	Regarding claim 5, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches further comprising: loading the output maps of the current layer from a memory (Instructions executed at the general-purpose processor 102 may be loaded from a program memory [0034]); 
	and wherein determining the lightweight format is performed based on a value stored in the register (system parameters associated with a computational device (e.g., neural network with weights), delays, frequency bin information, and task information may be stored in a memory block [0034]). 
	Lin does not explicitly teach updating a register configured to store a maximum value of the output maps of the current layer based on the loaded output maps of the current layer
	Brothers teaches updating a register (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112]) 
	configured to store a maximum value of the output maps of the current layer based on the loaded output maps of the current layer (PSS circuit 135 generates output feature maps 150 and stores the resulting output feature maps 150 in memory 145. For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042])
	The same motivation to combine as set forth for dependent claim 2 applies here.

	Regarding claim 8, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the lightening comprises: lightening, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width; for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	by performing a shift operation on the output maps of the current layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077])
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]).
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer with a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085]) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).
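	For illustration only, one possible reading of the recited limitation (lightening high-bit-width output maps to a low bit width by a shift operation using a value corresponding to a determined format) can be sketched as follows. This is a sketch under stated assumptions and is not asserted to be the method of Applicant or of any cited reference; the function names, the choice of a signed 8-bit target, and the sample values are hypothetical.

```python
def shift_for_bit_width(max_abs_value, low_bits=8):
    """Smallest right-shift so max_abs_value fits in a signed low_bits integer."""
    limit = (1 << (low_bits - 1)) - 1  # 127 for a signed 8-bit target
    shift = 0
    while (max_abs_value >> shift) > limit:
        shift += 1
    return shift


def lighten(values, low_bits=8):
    """Right-shift every value by the amount derived from the maximum magnitude."""
    max_abs = max(abs(v) for v in values)
    shift = shift_for_bit_width(max_abs, low_bits)
    # Python's >> on negative integers is an arithmetic shift (rounds toward -inf).
    return [v >> shift for v in values], shift


# Hypothetical high-bit-width output values of a layer.
high = [70000, -1234, 512, 0]
low, shift = lighten(high)
```

	In this sketch the shift amount is derived solely from the largest magnitude among the values, so a single value determines the format applied to all of them.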

	Regarding claim 9, Lin modified by Lin NPL teaches the processing method of claim 1. Modified Lin does not explicitly teach further comprising: updating a register configured to store a maximum value of the output maps of the current layer based on the output maps of the current layer generated by the convolution operation, wherein a maximum value of output maps of a subsequent layer of the neural network is predicted based on a value stored in the register.
	Brothers teaches updating a register (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112]) 
	configured to store a maximum value of the output maps of the current layer based on the output maps of the current layer generated by the convolution operation (PSS circuit 135 generates output feature maps 150 and stores the resulting output feature maps 150 in memory 145. For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042])
	(For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution [0064]; an input to the quantizer may be uniformly distributed over [Xmin, Xmax] [0063], with maximum value, Xmax, of 3Δ, Fig. 4)
	based on a value stored in the register (the NN engine loads a first 4×4 region of an input feature map from memory into data registers for processing [0112])
	The same motivation to combine as set forth for dependent claim 8 applies here.

	Regarding claim 11, Lin modified by Lin NPL teaches the processing method of claim 10, Lin teaches output channel of the current layer (the input is first decomposed into multiple channels [0051])
	Lin does not explicitly teach the first weight kernel is determined independently from a second weight kernel corresponding to a second output of the current layer.
	Brothers teaches wherein the first weight kernel is determined independently from a second weight kernel corresponding to a second output of the current layer (weight application control circuit may be configured to generate a mask indicating zero (first weight kernel) and non-zero portions (second weight kernel) of at least one of the first region of the input feature map or the first region of the convolution kernel [0178]; NN engine 100 is generally described in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	The same motivation to combine as set forth for dependent claim 2 applies here.

	Regarding claim 12, Lin modified by Lin NPL teaches the processing method of claim 1, Lin teaches wherein the input maps of the current layer and the weight kernels of the current layer (convolution layer may include one or more convolutional filters (also known as kernels), which may be applied to the input data to generate a feature map (as output map of current layer) [0053] and between each layer of the deep convolutional network 350 are weights (not shown) that are to be updated [0055]) 
	have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width),
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer have a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	The same motivation to combine as set forth for dependent claim 2 applies here.
	
	Regarding claim 17, Lin modified by Lin NPL teaches the processing apparatus of claim 14, Lin teaches wherein the processor is configured to: obtain input maps of a subsequent layer of the neural network based on the output maps of the current layer, (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]),
	and lighten the input maps of the subsequent layer (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width; for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]),
	 to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width),
	by performing a shift operation on the input maps of the subsequent layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077]) 
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081])
	Modified Lin does not explicitly teach with a high bit width.
	Brothers teaches with a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).

	Regarding claim 20, Lin modified by Lin NPL teaches the processing apparatus of claim 14, wherein the processor is configured to: lighten, to have the low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width; for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	by performing a shift operation on the output maps of the current layer (When the activations throughout the network are shifted to create a zero-mean distribution for each layer, if a subsequent non-linear function is applied to the activations, the coordinate of the function is shifted by the same amount such that the output is a shift of the original output without a bias modification [0077])	
	using a value corresponding to the determined lightweight format (Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9 (as determined light-weight format) [0081]).
	Modified Lin does not explicitly teach the output maps of the current layer have a high bit width 
	Brothers teaches the output maps of the current layer with a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085]) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).

6.	Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US20160328646) in view of Brothers et al. (US20160358069)

	Regarding claim 26, Lin teaches a processing method, comprising: performing an operation between input data of a current layer of a neural network and a weight kernel of the current layer (FIGS. 5A and 5B illustrate distributions of activation values and weights in different layers of an exemplary deep convolutional network. FIG. 5A shows the activation values for convolution layers zero to five (conv0, …, conv5) … Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean) [0064])
	to generate first output maps of the current layer (The outputs of the convolutional connections may be considered to form a feature map layer 318 (as current layer) [0051] Fig. 3A) 
	the input data and the weight kernel (convolution layer may include one or more convolutional filters (also known as kernels), which may be applied to the input data to generate a feature map (as output map of current layer) [0053])
	having a low bit width (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width; for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068])
	generating second output maps of the current layer, (each element of the feature map (e.g., 320 (as second output map)) receiving input from a range of neurons in the previous layer [0051]) 
	determining a lightweight format (the step size and Q number representations identified in the floating point to fixed point conversion (reads on Applicant’s meaning of lightening data) may carry over to the fine-tuned network [0082]) 
	of an input map of a subsequent layer of the neural network based on the maximum value, (The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) [0051]), 
	and lightening the input map to have the low bit width based on the lightweight format (Cscaling(w) is a scaling constant for bit-width w [0078]; Table 1 [0079] shows scaling data from a high bit width into data with a low bit width; for example, if |μ|<σ, the increase in the specified bit-width is less than 1 extra bit (which implies low bit width) for each quantity being quantized [0068]).
	Lin does not explicitly teach to generate first output maps of the current layer having a high bit width, by applying the first output maps to an activation function; outputting a maximum value of the second output maps; and the input map having the high bit width;
	Brothers teaches generating first output maps of the current layer having a high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	by applying the first output maps to an activation function; (Activation circuit 130 is configured to receive the summed results from accumulator circuit 125 and apply the activation function to the summed result. [0040])
	outputting a maximum value of the second output maps; (For example, given a 16×16 output feature map, PSS circuit 135 may take the maximum value at each 2×2 portion and sub-sample down to an 8×8 output feature map that may be written to memory 145 as an output feature map 150 [0042] in the context of processing input feature maps of a neural network to generate output feature maps to be consumed by a next layer of the neural network [0032])
	and the input map having the high bit width (For example, while a non-zero weight is 8 bits in width (as low bit width), data signal may be 128 bits in width (as high bit width) allowing the weight to be sent to more than one location [0085])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Lin to incorporate the method of Brothers for the benefit of increasing the performance of the neural network in terms of processing speed while reducing the power consumption of the neural network (Brothers, [0029]).

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is 

/M.G./Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121