DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements submitted on May 27, 2020 and September 23, 2021 have been considered by the Examiner and made of record in the application.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 46 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  Claim 46 refers to a “computer-readable medium”.  However, in Applicant’s specification, the computer-readable medium is not clearly defined to exclude non-statutory transitory media such as signals or transmission media.  Therefore, the subject matter claimed in Claim 46 is again deemed non-statutory subject matter.  Examiner suggests replacing “computer-readable medium” with --a non-transitory computer-readable medium--.  Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-8, 10-30, and 32-46 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kluska et al. (hereinafter Kluska) (Non-Patent Literature – “Post-training Quantization Methods for Deep Learning Models”).
Regarding claims 1, 23, 45, and 46, Kluska teaches and discloses a method and system (system; section 1), comprising: a processor; and a memory having instructions stored thereon which, when executed by the processor, performs an operation for adaptively executing machine learning models on a computing device (computing device; section 1), the operation comprising: 
receiving weight information for a model to be executed on a computing device (section 3.3; teaches input of convolutional neural network (CNN) information including weight information); 
reducing the received weight information into quantized weight information having a reduced bit size relative to the received weight information (section 3.3; teaches quantizing the input and output layers' weights and activations of the CNN); 
performing first inferences using the machine learning model and the received weight information (section 3.3; teaches measuring a baseline accuracy, which corresponds to the inferences performed with the original received weight information; section 5.2; teaches an FP32 baseline; Tables 4 and 5); 
performing second inferences using the machine learning model and the quantized weight information (section 3.3: teaches performing inferences using the quantized weight corresponding to the inferences performed for subsequent layers taught by the algorithm); 
comparing results of the first and second inferences; determining that results of the second inferences are within a threshold performance level of results of the first inferences (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm); and 
based on determining that results of the second inferences are within a threshold performance level of results of the first inferences, performing one or more subsequent inferences using the machine learning model and the quantized weight information (section 3.3; teaches once a set of weights is selected for a first layer, further inferences are performed for other layers with the calculated weights of the first layer).
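For illustration only, the sequence of steps recited in claims 1, 23, 45, and 46 can be sketched as follows. This is a minimal sketch and is not taken from Kluska or from Applicant's specification: the toy dot-product model, the symmetric uniform quantizer, and the names quantize, infer, within_threshold, and choose_weights are all assumptions introduced here.

```python
from typing import List, Sequence


def quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization to a signed `bits`-bit grid (hypothetical helper).

    Assumes a non-empty weight list and bits >= 2. Values are returned
    de-quantized so that both weight sets can be fed to the same toy model.
    """
    levels = 2 ** (bits - 1) - 1                    # largest positive integer code
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid dividing by zero
    return [round(w * levels / max_abs) * max_abs / levels for w in weights]


def infer(weights: Sequence[float], x: Sequence[float]) -> float:
    """Toy stand-in for a model: a single dot product (assumption)."""
    return sum(w * xi for w, xi in zip(weights, x))


def within_threshold(weights, q_weights, inputs, threshold: float) -> bool:
    """Compare first inferences (received weights) with second inferences (quantized)."""
    return all(
        abs(infer(weights, x) - infer(q_weights, x)) <= threshold for x in inputs
    )


def choose_weights(weights, inputs, bits: int, threshold: float) -> List[float]:
    """Return the quantized weights for subsequent inferences only if they pass."""
    q_weights = quantize(weights, bits)
    if within_threshold(weights, q_weights, inputs, threshold):
        return q_weights
    return list(weights)  # otherwise keep using the received weights
```

In this sketch the threshold is an absolute difference between outputs; an accuracy delta measured over a validation set would fit the same structure.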

Regarding claims 2 and 24, Kluska further teaches wherein: the received weight information comprises a floating point representation of weights in the machine learning model (section 5.2; teaches the baseline points; Tables 4 and 5); the quantized weight information comprises an integer approximation of the floating point representation of the weights in the machine learning model (section 1; teaches iteratively quantizing each layer in the CNN to the smallest possible integer precision that does not introduce accuracy degradation).

Regarding claims 3 and 25, Kluska further teaches wherein the quantized weight information comprises a plurality of weight sets, each weight set having a different bit size (section 3.3; teaches that the input and output layers of the CNN are quantized and that one layer is compressed at a time, starting from the lowest number of bits from the predefined set).

Regarding claims 4 and 26, Kluska further teaches wherein: performing the second inference comprises performing an inference using the machine learning model and each of the plurality of weight sets (section 3.3; teaches different weights can be selected for different layers), determining that results of the second inference are within the threshold performance level of results of the first inference comprises: identifying a weight set of the plurality of weight sets with a result having a performance closest to the threshold performance level, and returning the result associated with the identified weight set as the result of the second inference (section 3.3: teaches performing inferences using the quantized weight corresponding to the inferences performed for subsequent layers taught by the algorithm).

Regarding claims 5 and 27, Kluska further teaches wherein performing the one or more subsequent inferences using the machine learning model and the quantized weight information comprises performing the one or more subsequent inferences using the identified weight set of the plurality of weight sets (section 3.3: teaches performing inferences using the quantized weight corresponding to the inferences performed for subsequent layers taught by the algorithm).
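For illustration only, a search over a plurality of weight sets of different bit sizes, starting from the lowest number of bits in a predefined set, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Kluska's algorithm: the quantizer, the score callback, the delta parameter, and the function name smallest_acceptable_bits are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantizer (hypothetical); assumes non-empty weights, bits >= 2."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    return [round(w * levels / max_abs) * max_abs / levels for w in weights]


def smallest_acceptable_bits(
    weights: Sequence[float],
    candidate_bits: Iterable[int],
    score: Callable[[Sequence[float]], float],
    delta: float,
) -> Optional[Tuple[int, List[float]]]:
    """Try each candidate bit size, lowest first, and return the first weight
    set whose score drops by no more than `delta` from the baseline score."""
    baseline = score(weights)              # score with the received weights
    for bits in sorted(candidate_bits):    # lowest number of bits first
        q_weights = quantize(weights, bits)
        if baseline - score(q_weights) <= delta:
            return bits, q_weights
    return None                            # no candidate met the delta
```

Here score is any higher-is-better performance measure supplied by the caller, such as accuracy on a held-out set; the identified weight set would then be the one used for subsequent inferences.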

Regarding claims 6 and 28, Kluska further teaches wherein the quantized weight information comprises first quantized weights having a predefined bit size (section 3.3: teaches n bits as taught by the algorithm).

Regarding claims 7 and 29, Kluska further teaches generating second quantized weights having a smaller bit size than the quantized weight information; performing a subset of the one or more subsequent inferences using the second quantized weights; determining that results of the subset of the one or more subsequent inferences using the second quantized weights are within the threshold performance level of results of the subset of the one or more subsequent inferences using the machine learning model and the quantized weight information; and based on the determining, performing additional inferences beyond the one or more subsequent inferences using the machine learning model and the second quantized weights (section 3.3; teaches once a set of weights is selected for a first layer, further inferences are performed for other layers with the calculated weights of the first layer as taught by the algorithm).

Regarding claims 8 and 30, Kluska further teaches while performing the second inference using the machine learning model and the first quantized weights, generating second quantized weights from quantizing the received weight information, the second quantized weights having a larger bit size than the first quantized weights; determining that results of the second inference using the machine learning model and the quantized weight information are not within the threshold performance level of the results of the first inference for a threshold number of inferences; and performing additional inferences using the second quantized weights (section 3.3; teaches that the input and output layers of the CNN are quantized and that one layer is compressed at a time, starting from the lowest number of bits from the predefined set).

Regarding claims 10 and 32, Kluska further teaches adjusting the threshold performance level based on an amount of difference between a current input and a previous input for which inferences are to be performed using the machine learning model (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 11 and 33, Kluska further teaches determining that a difference between a current input and a previous input for which inferences are to be performed using the machine learning model exceeds a threshold amount of change; and performing inferences on the current input and one or more additional inferences using the machine learning model and the received weight information (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 12 and 34, Kluska further teaches performing inferences for a subset of the one or more subsequent inferences using the machine learning model and the received weight information; determining that results of the subset of the one or more subsequent inferences using the machine learning model and the quantized weight information are outside the threshold performance level relative to results of the subset of the one or more subsequent inferences using the machine learning model and the received weight information; and based on the determining, performing additional inferences using the machine learning model and the received weight information (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 13 and 35, Kluska further teaches refining the quantized weight information based on results of the one or more subsequent inferences executed using the machine learning model and the quantized weight information, the refined quantized weight information comprising ranges of values to use in performing inferences using the machine learning model and the refined quantized weight information (section 3.3; teaches the selection of a different bit size is considered a refinement of the quantized weight).

Regarding claims 14 and 36, Kluska further teaches wherein each inference of the second inferences is performed according to a periodicity defining a number of first inferences to be performed prior to performing one of the second inferences (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 15 and 37, Kluska further teaches wherein a subset of the one or more subsequent inferences are also performed using the received weight information (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 16 and 38, Kluska further teaches wherein: the subset of the one or more subsequent inferences comprises a periodic sampling of the one or more subsequent inferences, and a periodicity of the periodic sampling is determined from a performance difference between results of the subset of the one or more subsequent inferences performed using the received weight information and results of the subset of the one or more subsequent inferences performed using the quantized weight information (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 17 and 39, Kluska further teaches saving the quantized weight information prior to halting performance of inferences using the machine learning model; and resuming performance of inferences using the machine learning model and the quantized weight information without regenerating the quantized weight information (section 3.3; teaches comparing the results of the inferences performed and determining the result of the second inference is within a delta or threshold of the first inference as taught by the algorithm).

Regarding claims 18 and 40, Kluska further teaches wherein: the quantized weight information comprises individual quantized weights for each layer in the machine learning model, and quantized weights for a respective layer in the machine learning model has a bit size independent of a bit size of quantized weights for other layers in the machine learning model (section 3.3; teaches that, compared to naïve quantization, the method can utilize variable bit-width compression across the layers, effectively reducing memory usage by the model).
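For illustration only, the effect of independent per-layer bit sizes on memory usage can be sketched with simple arithmetic. The layer names, parameter counts, and bit sizes below are hypothetical and are not drawn from Kluska or from Applicant's specification.

```python
from typing import Dict


def model_size_bytes(param_counts: Dict[str, int], layer_bits: Dict[str, int]) -> int:
    """Storage needed for the weights when each layer uses its own bit size."""
    total_bits = sum(param_counts[name] * layer_bits[name] for name in param_counts)
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical two-layer model.
param_counts = {"conv1": 1000, "fc": 4000}
fp32_bits = {name: 32 for name in param_counts}   # uniform 32-bit baseline
mixed_bits = {"conv1": 8, "fc": 4}                # independent bit size per layer

fp32_size = model_size_bytes(param_counts, fp32_bits)    # 20000 bytes
mixed_size = model_size_bytes(param_counts, mixed_bits)  # 3000 bytes
```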

Regarding claims 19 and 41, Kluska further teaches performing inferences using the received weight information on a first set of processing cores in a multicore processor, and performing inferences using the quantized weight information on a second set of cores in the multicore processor (section 1; teaches iteratively quantizing each layer in the CNN to the smallest possible integer precision that does not introduce accuracy degradation on processing units; section 3.3).

Regarding claims 20 and 42, Kluska further teaches wherein the first inferences and second inferences are performed in parallel across different processing cores in a multicore processor (section 1; teaches iteratively quantizing each layer in the CNN to the smallest possible integer precision that does not introduce accuracy degradation on processing units; section 3.3).

Regarding claims 21 and 43, Kluska further teaches wherein: the first inferences are performed on a first type of processor; and the second inferences are performed on a second type of processor (section 1; teaches iteratively quantizing each layer in the CNN to the smallest possible integer precision that does not introduce accuracy degradation on processing units; section 3.3).

Regarding claims 22 and 44, Kluska further teaches wherein: the first inferences are performed on a first set of processing cores in a heterogeneous multicore processor; and the second inferences are performed on a second set of processing cores in the heterogeneous multicore processor (section 1; teaches iteratively quantizing each layer in the CNN to the smallest possible integer precision that does not introduce accuracy degradation on processing units; section 3.3).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 9 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Kluska et al. (hereinafter Kluska) (Non-Patent Literature – “Post-training Quantization Methods for Deep Learning Models”) in view of KANG et al. (hereinafter Kang) (U.S. Patent Application Publication # 2019/0258932 A1).
Regarding claims 9 and 31, Kluska discloses the claimed invention, but may not expressly disclose wherein the performance level comprises a size of overflow or underflow relative to supported range of values for each layer in the machine learning model, given a bit size of the quantized weight information.
Nonetheless, in the same field of endeavor, Kang further teaches and suggests wherein the performance level comprises a size of overflow or underflow relative to supported range of values for each layer in the machine learning model, given a bit size of the quantized weight information ([0174]; teaches comparing the number of times of overflow in order to decide whether the bit depth accuracy must be changed).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate comparing the number of times of overflow in order to decide whether the bit depth accuracy must be changed as taught by Kang with the system and method as disclosed by Kluska for the purpose of performing deep neural network learning, as suggested by Kang.
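For illustration only, counting overflow and underflow events relative to the range supported by a given bit size, and using that count to decide whether the bit depth must change, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Kang or Kluska: it assumes signed two's-complement integer ranges, treats "underflow" as a value below the minimum representable value, and uses hypothetical names count_out_of_range and next_bit_depth.

```python
from typing import Iterable, Tuple


def count_out_of_range(values: Iterable[int], bits: int) -> Tuple[int, int]:
    """Count values above (overflow) and below (underflow) the signed range
    representable with `bits` bits."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    values = list(values)
    overflow = sum(1 for v in values if v > hi)
    underflow = sum(1 for v in values if v < lo)
    return overflow, underflow


def next_bit_depth(values: Iterable[int], bits: int, max_events: int) -> int:
    """Increase the bit depth by one when out-of-range events exceed `max_events`."""
    overflow, underflow = count_out_of_range(values, bits)
    return bits + 1 if overflow + underflow > max_events else bits
```

The comparison could equally be applied per layer, with a separate count and bit depth maintained for each layer of the model.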
	
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
Any response to this Office Action should be faxed to (571) 273-8300 or mailed to:
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450
Hand-delivered responses should be brought to 
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUK JIN KANG whose telephone number is (571) 270-1771.  The examiner can normally be reached on Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory Sefcheck, can be reached on (571) 272-3098.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist/customer service whose telephone number is (571) 272-2600.

/Suk Jin Kang/
Examiner, Art Unit 2477
July 15, 2022