DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present Application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, filed on 01/20/2022, with respect to 35 U.S.C. 103 have been fully considered but are moot because the arguments do not apply to any of the citations being used in the current rejection. 

Claim Objections
Claim 14 is objected to because of the following informalities: the word "update" should be "updating". Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the Application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
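The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

	A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
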
Claims 1-4, 6, 8-10, 12, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Desappan et al. (US 20190012559 A1, hereinafter Desappan) in view of Sinclair (US 5079720 A), further in view of Discenzo et al. (US 7308322 B1, hereinafter Discenzo).

Regarding claim 14, Desappan teaches: A method comprising: 
performing an inference of an activation layer of a neural network (FIG. 1, FIG. 2 and [0026] e.g., “kernel weights and scaling factors for each layer are stored after training and used during inference convolution.” [0027] e.g., “convolution neural network (CNN)” [0038] “The inference can run on an embedded device like a mobile or an automotive to predict objects such as a person in front of a car.” Examiner notes that the Application discloses “activation layers as hidden or intermediate layers” in [0024]);
	determining a first limit value and a second limit value of the activation layer based on output values from the inference ([0028] e.g., “The min and max values are computed dynamically during inference. The scale factor, min, and max are different for each layer and each weight… Range (min and max) is computed first on the output (signed 32 bit value)” Examiner notes that the Application discloses “the first limit value may correspond to the minimum value and the second limit value may correspond to the maximum value” in [0005]);
([0042] e.g., “According to another exemplary embodiment the predicted Min Value is computed based on a formula applied on the previous Min Values and the predicted Max Value is computed based on a formula applied on the previous max Values.” Examiner notes that the first and second limit values as the min and max values are calculated with respective previously stored first and second limit values as previous min and max values.); 
performing layer-level quantization by applying the scaling factor on a subsequent inference of the activation layer ([0041] e.g., “This method of initialization creates a memory burden on the first image only and subsequent quantization may be performed based on the predicted max value and predicted min value.” [0024] e.g., “During training all the values are evaluated to determine the max value and min value and determine a scaling factor for quantization” [0026] e.g., “kernel weights and scaling factors for each layer are stored after training and used during inference convolution.”);
Desappan does not explicitly teach: update, based on the difference, at least one of the first and second limit values in the data storage system.
However, Sinclair teaches: update, based on the difference, at least one of the first and second limit values in the data storage system ([Col. 5 ln. 21-28] e.g., “If the new sample value lies outside the range defined by the maximum and minimum values stored in the registers, the appropriate register, or registers can be updated, in step 34, by replacing the previous value stored in the register with the new sample value. If the new sample value lies within the range stored in the range register, then the new sample value is discarded and the range is not updated.”).
In view of the teachings of Sinclair it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sinclair to Desappan before the effective filing date of the claimed invention in order to provide fast processing using registers (cf. Sinclair [Col. 7 ln. 26] e.g., “The invention provides for fast, efficient plotting”).
Desappan does not explicitly teach: adjust a [quantization] scaling factor [for the activation layer] based on the first and second limit values.
However, Discenzo teaches: adjusting a [quantization] scaling factor [for the activation layer] based on the first and second limit values ([Col. 25, ln. 13-14] e.g., “maximum and minimum boundaries for calculating scaling factors” [Col. 25, ln. 17-19] e.g., “the maximum and minimum boundaries are expanded to include the present pattern and new scaling factors are obtained.” [Col. 18 ln. 66-67] e.g., “One or more intermediate or hidden layers 220 are provided in the network 80” Examiner notes that a quantization scaling factor is taught by Desappan in [0024].). 
In view of the teachings of Discenzo it would have been obvious for a person of ordinary skill in the art to apply the teachings of Discenzo to Desappan before the effective filing date of the claimed invention in order to improve the quality of common information recognizing that all sensor data may be subject to error and/or noise (cf. 

Regarding claim 17, Desappan in view of Sinclair and Discenzo teaches: The method of claim 14.
Desappan teaches: further comprising determining a minimum value of the activation layer and a maximum value of the activation layer, wherein the first limit value corresponds to the minimum value and the second limit value corresponds to the maximum value ([0028] e.g., “The min and max values are computed dynamically during inference. The scale factor, min, and max are different for each layer and each weight… Range (min and max) is computed first on the output (signed 32 bit value)” Examiner notes that the Application discloses “the first limit value may correspond to the minimum value and the second limit value may correspond to the maximum value” in [0005]).

Regarding claim 1, Desappan teaches: A computing system comprising: a data storage subsystem ([0040] e.g., “The convolution block (604) may also include a processing block (not shown) that computes a min value and a max value of the set of the input values and stores the values in a current min value register and a current max value register respectively.” [0030] “The 32-bit accumulator output (414) is written to an external memory (415) for each layer of the feature map.” Examiner notes that the Application discloses “the phrase "data storage subsystem" generally refers to any type or combination of one or more data storage units, including registers, caches, memory devices, etc.” in [0043]); and 
	a hardware processing unit (Fig. 9 e.g., TDAx processor) programmed to perform operations corresponding to the steps of the method of claim 14. The remaining limitations of claim 1 are similarly analyzed as set forth above with respect to claim 14.

Regarding claim 2, Desappan in view of Sinclair and Discenzo teaches: The computing system of claim 1.
Desappan further teaches: wherein the hardware processing unit comprises an accelerator configured to maintain the first and second limit values and the scaling factor in the data storage subsystem (Fig. 9 e.g., TDAx processor; [0040] e.g., “The convolution block (604) may also include a processing block (not shown) that computes a min value and a max value of the set of the input values and stores the values in a current min value register and a current max value register respectively.” [0026] “It should be noted that kernel weights and scaling factors for each layer are stored after training and used during inference convolution.” Examiner notes that the Application discloses “the phrase "data storage subsystem" generally refers to any type or combination of one or more data storage units, including registers, caches, memory devices, etc.” in [0043].).

Regarding claim 3, Desappan in view of Sinclair and Discenzo teaches: The computing system of claim 1.
	Desappan further teaches: wherein the accelerator is further configured to associate the scaling factor with the activation layer ([0028] e.g., “The scale factor, min, and max are different for each layer and each weight.” Examiner notes that the Application discloses “activation layers as hidden or intermediate layers” in [0024]).

Regarding claim 4, the claim recites a computing system corresponding to the method of claim 17 and is similarly analyzed.

Regarding claim 6, Desappan in view of Sinclair and Discenzo teaches: The computing system of claim 1.
	Desappan in view of Sinclair does not explicitly teach: wherein the hardware processing unit is further configured to dynamically update the scaling factor.
	However, Discenzo further teaches: wherein the hardware processing unit is further configured to dynamically update the scaling factor (Col. 25, ln. 19-26, e.g., “Whenever such a change of maxima or minima boundaries occurs, the neural network paradigm that uses the preprocessed data may be updated to account for the new changes in the scaling factors and also the previously accumulated data base may need to be rescaled using the new scaling factors based on how much the new boundaries have expanded.”).
The motivation to combine Desappan with Discenzo is the same rationale as set forth above with respect to claim 14.

Regarding claim 8, Desappan teaches: An accelerator comprising: a first data storage unit; a second data storage unit ([0040] e.g., “The convolution block (604) may also include a processing block (not shown) that computes a min value and a max value of the set of the input values and stores the values in a current min value register and a current max value register respectively.” [0030] “The 32-bit accumulator output (414) is written to an external memory (415) for each layer of the feature map.” Examiner notes that the Application discloses “the phrase "data storage subsystem" generally refers to any type or combination of one or more data storage units, including registers, caches, memory devices, etc.” in [0043]); and 
a processing unit (Fig. 9 e.g., TDAx processor) configured to perform operations corresponding to the steps of the method of claim 14. The remaining limitations of claim 8 are similarly analyzed as set forth above with respect to claim 14.

Regarding claim 9, Desappan in view of Sinclair and Discenzo teaches: The accelerator of claim 8.
Desappan further teaches: further comprising a storage subsystem, wherein: the processing unit is configured to store the scaling factor in the storage subsystem in a manner that associates the scaling factor with the activation layer; and the storage subsystem comprises the first and second data storage units (Fig. 9 e.g., TDAx processor; [0040] e.g., “The convolution block (604) may also include a processing block (not shown) that computes a min value and a max value of the set of the input values and stores the values in a current min value register and a current max value register respectively.” [0026] “It should be noted that kernel weights and scaling factors for each layer are stored after training and used during inference convolution.” Examiner notes that the Application discloses “the phrase "data storage subsystem" generally refers to any type or combination of one or more data storage units, including registers, caches, memory devices, etc.” in [0043].).

Regarding claim 10, the claim recites an accelerator corresponding to the method of claim 17 and is similarly analyzed.

Regarding claim 12, the claim recites an accelerator corresponding to the computing system of claim 6 and is similarly analyzed.

Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Desappan in view of Sinclair and Discenzo, further in view of Segall (US 20090175338 A1).

Regarding claim 15, Desappan in view of Sinclair and Discenzo teaches: The method of claim 14.
Desappan in view of Sinclair and Discenzo does not explicitly teach: further comprising performing, before or after applying the scaling factor, an offset operation. 
However, Segall teaches: further comprising performing, before or after applying the scaling factor, an offset operation ([0034] e.g., “Scaling and offset operations may be performed”).  
In view of the teachings of Segall it would have been obvious for a person of ordinary skill in the art to apply the teachings of Segall to Desappan before the effective filing date of the claimed invention in order to predict an enhancement layer by scaled 

Regarding claim 16, Desappan in view of Sinclair, Discenzo, and Segall teaches: The method of claim 15.
	Desappan teaches: further comprising associating the scaling factor with the activation layer ([0028] e.g., “The scale factor, min, and max are different for each layer and each weight.”).   

Claims 5, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Desappan in view of Sinclair and Discenzo, further in view of Lin et al. (US 2016/0328647 A1, hereinafter Lin).

Regarding claim 18, Desappan in view of Sinclair and Discenzo teaches: The method of claim 1.
Desappan in view of Sinclair and Discenzo does not explicitly teach: wherein applying the scaling factor reduces a bit width needed for at least one arithmetic operation within the neural network.
However, Lin teaches: wherein applying the scaling factor reduces a bit width needed for at least one arithmetic operation within the neural network ([0031] e.g., “If a particular layer has a high scaling factor (e.g., because there are many neurons), the quantizer may increase the bit width reduction”).


Regarding claim 5, the claim recites a computing system corresponding to the method of claim 18 and is similarly analyzed.

Regarding claim 11, the claim recites an accelerator corresponding to the method of claim 18 and is similarly analyzed.

Claims 7 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Desappan in view of Sinclair and Discenzo, further in view of Meeker (US 4751742 A).

Regarding claim 7, Desappan in view of Sinclair and Discenzo teaches: The computing system of claim 6.
	Desappan further teaches: wherein the hardware processing unit is further programmed to update the scaling factor until the first limit value and the second limit value stabilize[, based on at least the difference,] within a predetermined range (Fig. 9 e.g., TDAx processor; Fig. 3 element 300 and [0024] e.g., “However, since the power of two ranges are used, the complete range is not fully utilized. In the table (300) shown, only 160 steps are used out of total possible 256 steps (8 bits), the rest of the numbers are not used and therefore the entire range is not fully utilized which results in an accuracy loss. In the example, a scaling factor of 4 is utilized. In row (305) 0 is represented as 0, row (303) −32 represented as −128, and in row (304) 32 is represented as 127 In the example, power of 2 uses a scale of 4 with row (306) representing the minimum input value −10 that is quantized as −40, row (307) representing the maximum input value 30 that is quantized as 120 and therefore the number of steps is 160 (−40 to 120).” [0045] e.g., “The inference time during the first 2 seconds for the first frame may be high, but the system may stabilize upon updating the predicted min and predicted max value.” Examiner notes that the TDAx processor (the “hardware processing unit”) is programmed to update the scaling factor (e.g., 4) until the first limit value (e.g., −32) and the second limit value (e.g., 32) stabilize to −128 and 127, respectively, within the predetermined range of “total possible 256 steps (8 bits)”).
Desappan in view of Sinclair and Discenzo does not explicitly teach: to update values based on at least the difference.  
	However, Meeker teaches: to update values based on at least the difference ([Col. 55 ln. 9-16] e.g., “At Band 1 B-function locations where the magnitude of the difference between the previous value and the new value of the Band 1 B-function exceeds a threshold value, TH, then the following operations additionally occur: (1) The previous values of Band 1 B-function are replaced by the corresponding new values” Examiner notes that the Instant Specification describes “In some examples, processing unit 765 may compare the current minimum and maximum values and replace one or both with new respective values if a difference between the respective old and new values is greater than a threshold.” in [0052]).
In view of the teachings of Meeker it would have been obvious for a person of ordinary skill in the art to apply the teachings of Meeker to Desappan before the effective filing date of the claimed invention in order to reduce the quantity of data required to be transmitted (cf. Meeker [Col. 52, ln. 5-8] “The quantity of data required to be transmitted per Band 1 B-function can be reduced from about ten bits to four or five bits on the average using variable length coding.”). 

Regarding claim 13, the claim recites an accelerator corresponding to the computing system of claim 7 and is similarly analyzed.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Desappan in view of Sinclair and Discenzo, further in view of Crawford et al. (US 8204846 B1, hereinafter Crawford).

Regarding claim 19, Desappan in view of Sinclair and Discenzo teaches: The method of claim 14.
	Desappan in view of Sinclair and Discenzo does not explicitly teach: further comprising periodically updating the scaling factor.  
	However, Crawford teaches: further comprising periodically updating the scaling factor (Col. 7, ln. 12-13, e.g., “the scaling factors are periodically refined or updated”).  

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Desappan in view of Sinclair, Discenzo, and Crawford, further in view of Meeker.

Regarding claim 20, Desappan in view of Sinclair, Discenzo, and Crawford teaches: The method of claim 19.
	Desappan further teaches: wherein the hardware processing unit is further programmed to update the scaling factor until the first limit value and the second limit value stabilize[, based on at least the difference,] within a predetermined range (Fig. 9 e.g., TDAx processor; Fig. 3 element 300 and [0024] e.g., “However, since the power of two ranges are used, the complete range is not fully utilized. In the table (300) shown, only 160 steps are used out of total possible 256 steps (8 bits), the rest of the numbers are not used and therefore the entire range is not fully utilized which results in an accuracy loss. In the example, a scaling factor of 4 is utilized. In row (305) 0 is represented as 0, row (303) −32 represented as −128, and in row (304) 32 is represented as 127 In the example, power of 2 uses a scale of 4 with row (306) representing the minimum input value −10 that is quantized as −40, row (307) representing the maximum input value 30 that is quantized as 120 and therefore the number of steps is 160 (−40 to 120).” Examiner notes that the TDAx processor (the “hardware processing unit”) is programmed to update the scaling factor (e.g., 4) until the first limit value (e.g., −32) and the second limit value (e.g., 32) stabilize to −128 and 127, respectively, within the predetermined range of “total possible 256 steps (8 bits)”).
Desappan in view of Sinclair, Discenzo and Crawford does not explicitly teach: to update values based on at least the difference.  
	However, Meeker teaches: to update values based on at least the difference ([Col. 55 ln. 9-16] e.g., “At Band 1 B-function locations where the magnitude of the difference between the previous value and the new value of the Band 1 B-function exceeds a threshold value, TH, then the following operations additionally occur: (1) The previous values of Band 1 B-function are replaced by the corresponding new values” Examiner notes that the Instant Specification describes “In some examples, processing unit 765 may compare the current minimum and maximum values and replace one or both with new respective values if a difference between the respective old and new values is greater than a threshold.” in [0052]).
The motivation to combine Desappan with Meeker is the same rationale as set forth above with respect to claim 7.
 
Prior Art
The prior art made of record and not relied upon, listed below, is considered pertinent to applicant's disclosure:
Lin et al. (“Fixed Point Quantization of Deep Convolutional Networks”, ICML'16: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 June 2016, pp. 2849-2858): teaches about fixed point quantization of Deep Convolutional Networks, converting a floating point to fixed point and computing cross-layer bit-width optimization.
Yan et al. (US 2020/0234130 A1): teaches a method of slimming neural networks by assigning a scaling factor to each channel and removing channels having low relevance.
Deisher et al. (US 2018/0121796 A1): teaches a flexible neural network accelerator including “quantize weight matrix” where the weights are quantized by converting the weights from floating point to integer.

Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAEYONG J PARK whose telephone number is (571) 272-3898. The examiner can normally be reached on M-F 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this Application or proceeding is assigned is 571-273-8300.
Information regarding the status of an Application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published Applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished Applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 

/JAEYONG J PARK/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129