DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	This Office Action is in response to an application filed on 11/16/2020, in which claims 1 – 20 are pending and presented for examination.

Information Disclosure Statement
3.	The Examiner has considered the references listed on the Information Disclosure Statements (IDS) submitted on 11/16/2020 and 03/03/2021 based on the provisions of 37 CFR § 1.97.  


Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.



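5.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: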
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

6.	This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).


7.	Claims 1 – 4, 7, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sainath et al. (US 20170076196 A1), hereinafter “Sainath,” in view of Xie et al. (US 20190370658 A1), hereinafter “Xie,” and further in view of Rippel et al. (US 20180176578 A1), hereinafter “Rippel.”

	In regard to claim 1, Sainath discloses: a method of quantization, adaptive block partitioning and codebook coding for neural network model compression, (See Sainath, Abstract and Par. 0031 – 0037; See also Fig. 1: Neural Network System 100) the method being performed by at least one processor, (See Sainath, Abstract and Par. 0067: Methods, systems, and apparatus, including computer programs encoded on computer storage media) and the method comprising: 
determining a saturated maximum value of a multi-dimensional tensor in a layer of a neural network; (See Sainath, Fig. 5: example process for training a recurrent neural network that includes a saturating LSTM layer; Par. 0040, 0041 and 0051: tensors represented by structured matrices; process 300 described as being performed by a saturating LSTM layer implemented by a system of one or more computers; Par. 0064: training recurrent neural network on the training data to determine trained values of the parameters of the recurrent neural network from initial values of the parameters by optimizing, i.e., either maximizing or minimizing, an objective function)
Sainath is not specific about the feature of the bit depth corresponding to the saturated maximum value, nor about the operations of clipping weight coefficients, quantizing the clipped weight coefficients, and transmitting, to a decoder, a layer header.
However, Xie teaches a method of compressing a pre-trained deep neural network model that suggests: clipping weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value; (See Xie, Fig. 2: pruning, quantization, and batch normalization (Step 230); Par. 0020: weight pruned and quantized model; See also Figs. 1 and 2: Step 150: DNN Weight Pruned and Quantized Model; Step 240: Prune/Quantization/Batch Normalization; - The pruning operation suggests a clipping operation, which is adaptable to process weight coefficients in a tensor and within a range of a saturated maximum value)
quantizing the clipped weight coefficients, based on the bit depth (See again Xie, Figs. 1 and 2: Step 150: DNN Weight Pruned and Quantized Model; Step 240: Prune/Quantization/Batch Normalization; - Xie thus suggests a quantizing operation on clipped weight coefficients). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having both the references of Sainath and Xie before him/her, to modify the method of Sainath, by adding or integrating features of the method of compressing by Xie, in order to achieve a method of compressing a pre-trained deep neural network model that applies: clipping weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value; (See Xie, Figs. 1 and 2) quantizing the clipped weight coefficients, based on the bit depth, as in the instant Application.
Xie is not specific about the feature of transmitting, to a decoder, a layer header comprising the bit depth.
Rippel, however, teaches: transmitting, to a decoder, a layer header comprising the bit depth. (See Rippel, Pars. 0005 and 0006: sender system to encode content for transmission to a receiver system, and the decoder can be deployed by the receiver system to decode the encoded content and reconstruct the original content. The encoder receives content and generates a tensor; See also Par. 0026 and Pars. 0045, 0077 and 0078; - Par. 0068 teaches layers which are inherently associated with a header containing information such as bit depth)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the references of Sainath, Xie and Rippel before him/her, to modify the method of Sainath, by adding or integrating features of the method of compressing by Xie, and the compression system of Rippel, in order to achieve a method of quantization, adaptive block partitioning and codebook coding for neural network model compression, used to encode DNN models to save both storage and computation.
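For orientation only, the sequence of operations recited in claim 1 (clipping to a saturated maximum value, quantizing based on a bit depth, and placing the bit depth in a layer header) can be sketched as below. This is an illustrative sketch and not the method of the Application or of any cited reference: the symmetric uniform quantizer, the `LayerHeader` class, and all function and field names are assumptions introduced here, and no actual transmission to a decoder is shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerHeader:
    """Hypothetical per-layer header; the field names are illustrative only."""
    bit_depth: int
    sat_max: float
    step_size: float


def clip_and_quantize(weights: List[float], sat_max: float,
                      bit_depth: int) -> Tuple[List[int], LayerHeader]:
    # Assumption: a symmetric uniform quantizer with 2**(bit_depth - 1) - 1
    # positive levels. The Office action does not specify the quantizer.
    levels = 2 ** (bit_depth - 1) - 1
    # Clip every coefficient into the range [-sat_max, +sat_max].
    clipped = [max(-sat_max, min(sat_max, w)) for w in weights]
    # Map each clipped coefficient to the nearest integer level.
    quantized = [round(w * levels / sat_max) for w in clipped]
    header = LayerHeader(bit_depth=bit_depth, sat_max=sat_max,
                         step_size=sat_max / levels)
    return quantized, header


if __name__ == "__main__":
    q, hdr = clip_and_quantize([0.4, -2.0, 0.05, 1.7], sat_max=1.0, bit_depth=4)
    print(q, hdr)
```

Under these assumptions a decoder that receives the header could approximate each coefficient as the integer level multiplied by `step_size`.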
	In regard to claim 2, the claim discloses: the method of claim 1, further comprising coding the bit depth, using a variable length coding or a fixed length coding, (See Xie, Par. 0020: applying certain compression coding such as (but not limited to) run-length coding, Huffman coding, or both; - i.e., variable length coding or a fixed length coding are not excluded) wherein the layer header comprises the coded bit depth. (See rationale in above rejection of Claim 1, in regard to a layer header comprising bit depth)
	In regard to claim 3, the claim discloses: the method of claim 1, wherein the layer header further comprises the saturated maximum value. (See rationale in above rejection of Claim 1, in regard to a layer header, and Xie’s teachings of a method of compressing a pre-trained deep neural network model that suggests: clipping weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value, as in Xie, Figs. 1 and 2: Step 150, and Par. 0020: weight pruned and quantized model)
	In regard to claim 4, the claim discloses: the method of claim 1, wherein the saturated maximum value is represented by a floating number. (See rationale in above rejection of Claim 1, given that value represented by a floating number is common knowledge in the art)
	In regard to claim 7, the claim discloses: the method of claim 1, wherein the layer header further comprises a step size of the quantizing the clipped weight coefficients. (See rationale in above rejection of Claim 1, on the basis of Xie, Par. 0024: adjusting model compression parameters involving reducing pruning/quantization step size; See also Rippel, Pars. 0005 and 0006, and Par. 0026 and Pars. 0045, 0077 and 0078; - Par. 0068 teaches layers which are inherently associated with a header containing information such as step size)
	In regard to claim 19, the claim discloses: an apparatus for quantization, adaptive block partitioning and codebook coding for neural network model compression, the apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising: first determining code configured to cause the at least one processor to determine a saturated maximum value of a multi-dimensional tensor in a layer of a neural network, and a bit depth corresponding to the saturated maximum value; clipping code configured to cause the at least one processor to clip weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value; quantizing code configured to cause the at least one processor to quantize the clipped weight coefficients, based on the bit depth; and transmitting code configured to cause the at least one processor to transmit, to a decoder, a layer header comprising the bit depth. (Claim 19 discloses limitations that are similar to those of Claim 1, as it represents an apparatus drawn to the method of Claim 1. Therefore, the rationale applied above for rejection of Claim 1, also applies, mutatis mutandis, to rejection of Claim 19 on the basis of the same references Sainath, Xie, and Rippel)
	In regard to claim 20, the claim discloses: a non-transitory computer-readable medium storing instructions that, when executed by at least one processor for quantization, adaptive block partitioning and codebook coding for neural network model compression, cause the at least one processor to: determine a saturated maximum value of a multi-dimensional tensor in a layer of a neural network, and a bit depth corresponding to the saturated maximum value; clip weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value; quantize the clipped weight coefficients, based on the bit depth; and transmit, to a decoder, a layer header comprising the bit depth. (Claim 20 discloses limitations that are similar to those of Claim 1, as it represents a non-transitory computer-readable medium storing instructions that, when executed, perform analogous functions as the method of Claim 1. Therefore, the rationale applied above for rejection of Claim 1, also applies, mutatis mutandis, to rejection of Claim 20 on the basis of the same references Sainath, Xie, and Rippel)


Allowable Subject Matter
Claims 5, 6, 8 – 16, 17 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
     	The following is a statement of reasons for the indication of allowable subject matter:
The prior art of record, which is the most relevant prior art known, does not disclose or render obvious the limitations of the method of those claims:
	
   	Claim 5 discloses the method of claim 4, further comprising determining an integer representing the saturated maximum value, based on an equation: int_layer_sat_maxw=int(ceil(layer_sat_maxw*(2**N))), where int_layer_sat_maxw indicates the integer of the saturated maximum value, and layer_sat_maxw indicates the saturated maximum value. 
The limitations are not found in the cited prior art, nor in the prior art as a whole. Claim 6 depends on Claim 5.
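As a reading aid, the equation quoted from Claim 5 can be evaluated directly as written. The sketch below is illustrative only: the quoted claim text does not define N, so the code assumes N is a non-negative integer scaling exponent (for example, a number of fractional bits), and the function name is hypothetical.

```python
from math import ceil


def int_layer_sat_maxw(layer_sat_maxw: float, n: int) -> int:
    """Evaluate int(ceil(layer_sat_maxw * (2 ** N))) as quoted from Claim 5.

    Assumption: n is a non-negative integer; the quoted claim does not define N.
    """
    return int(ceil(layer_sat_maxw * (2 ** n)))


if __name__ == "__main__":
    # With n = 4 the scale factor is 16.
    print(int_layer_sat_maxw(0.75, 4))
    print(int_layer_sat_maxw(0.3, 4))
```

Because the equation rounds up, dividing the resulting integer by 2**N gives a value that is never smaller than the original floating-point saturated maximum value.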
   	Claim 8 discloses the method of claim 1, further comprising: reshaping a four-dimensional (4D) parameter tensor of a neural network, among the quantized weight coefficients, into a three-dimensional (3D) parameter tensor of the neural network, the 3D parameter tensor comprising a convolution kernel size, an input feature size and an output feature size; partitioning the 3D parameter tensor along a plane that is formed by the input feature size and the output feature size, into 3D coding tree units (CTU3Ds); and entropy encoding the CTU3Ds.
	The limitations are not found in the cited prior art, nor in the prior art as a whole. Claims 9 – 16 depend on Claim 8.
   	Claim 17 discloses the method of claim 1, further comprising: generating a histogram of the quantized weight coefficients; comparing a rate distortion of each of bins of the generated histogram with a rate distortion of each of entries in a codebook predictor for the quantized weight coefficients; and based on the rate distortion of one of the bins of the histogram being compared to be less than the rate distortion of one of the entries in the codebook predictor, replacing the one of the bins with the one of the entries, to generate a codebook for re-indexing the quantized weight coefficients.
	The limitations are not found in the cited prior art, nor in the prior art as a whole. Claim 18 depends on Claim 8.
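For illustration of the reshaping and partitioning recited in Claim 8 above, the sketch below computes the shape of a 3D parameter tensor from a 4D one and lists the tiles obtained by partitioning along the plane formed by the input feature size and the output feature size. It is a sketch under stated assumptions, not the claimed method: the 4D layout (output, input, kernel height, kernel width), the axis order of the 3D tensor, the square tile size `ctu_size`, and all function names are assumptions introduced here, and the entropy encoding step is not shown.

```python
from typing import List, Tuple


def reshape_4d_to_3d(shape4d: Tuple[int, int, int, int]) -> Tuple[int, int, int]:
    # Assumption: the 4D layout is (output, input, kernel_h, kernel_w).
    out_size, in_size, kernel_h, kernel_w = shape4d
    # The two kernel axes are merged into a single kernel-size axis.
    return (kernel_h * kernel_w, in_size, out_size)


def partition_ctu3d(shape3d: Tuple[int, int, int],
                    ctu_size: int) -> List[Tuple[int, int, int, int]]:
    """Return (in_start, in_end, out_start, out_end) for each tile.

    Each tile spans the full kernel-size axis and at most ctu_size entries
    along the input and output feature axes (end indices are exclusive).
    """
    _, in_size, out_size = shape3d
    tiles = []
    for i0 in range(0, in_size, ctu_size):
        for o0 in range(0, out_size, ctu_size):
            tiles.append((i0, min(i0 + ctu_size, in_size),
                          o0, min(o0 + ctu_size, out_size)))
    return tiles


if __name__ == "__main__":
    shape3d = reshape_4d_to_3d((64, 32, 3, 3))
    print(shape3d)
    print(partition_ctu3d(shape3d, ctu_size=32))
```

Tiles at the boundary are simply truncated in this sketch; how boundary units are handled in the Application is not stated in this Office action.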

References considered but not cited
8.	The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
		Leontaris et al. (US 20200021842 A1) teaches MOTION VECTOR CANDIDATE PRUNING SYSTEMS AND METHODS.
		Chen et al. (US 20190213477 A1) teaches MICRO-PROCESSOR CIRCUIT AND METHOD OF PERFORMING NEURAL NETWORK OPERATION.
		Kim et al. (US 20190095777 A1) teaches METHOD AND APPARATUS FOR QUANTIZING ARTIFICIAL NEURAL NETWORK.
		Aytekin et al. (US 20200311551 A1).
	

Conclusion
 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BERTEAU JOISIL whose telephone number is (571)270-7492.  The examiner can normally be reached on 7:30 am to 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dave Czekaj, can be reached on 571-272-3963.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Berteau Joisil/
Examiner, Art Unit 2487
/Dave Czekaj/
Supervisory Patent Examiner, Art Unit 2487