DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/27/2019. Claims 1-20 are pending and have been examined. Claims 1, 6, 13 and 19 are independent claims.
The present application claims the benefit of provisional applications 62/821,437 (filed on 03/20/2019) and 62/830,269 (filed on 04/05/2019).
Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: 
Claim 6 recites “computer media”, but the Specification does not recite “computer media”. The Specification is therefore objected to because it lacks antecedent basis for the claimed feature.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 
the AI chip, such as 504, 506 in FIG. 5
the first layer in the CNN model corresponds to layer 502 in the AI chip in Fig. 5 
the second layer in the CNN corresponds to layer 504 in the AI chip in Fig. 5 
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 
Fig. 8 Reference number: 800
Fig. 9 Reference number: 902
Fig. 10 Reference number: 1020, 1010, 1015
 Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 4-5 are objected to because of the following informalities:
Claim 4 recites “AI” without defining the abbreviation. In accordance with Specification page 1, “AI” should read “artificial intelligence (AI)”.
Claim 5 depends on claim 4 and does not cure the deficiencies of claim 4; therefore, claim 5 is objected to for the same rationale.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
Claim 4: 
	the CNN is configured to be executed to perform an AI task based on image data stored in the memory and at least the first set of weights and the second set of weights by propagating the image data from the first convolution layer to the second convolution layer, and to present output of the AI task on an output device
Claim 5:
the CNN is configured to perform the AI task by: generating feature descriptors of the image data
Claim 7:
programming instructions configured to update the quantized weights of the AI model so that output of the AI model based at least on the updated weights are within a range of ground truth of the training data set
Claim 8:
programming instructions configured to, repeat in one or more iterations, until a stopping criteria is met
Claim 10:
programming instructions configured to use a gradient descent method, wherein a loss function in the gradient descent method is based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between the quantized output of the AI model for the training instance and a ground truth of the training instance
Claim 11:
the AI chip is configured to: execute the AI task to generate output of the AI task, wherein the quantized weights of the AI model are uploaded into the AI chip; and present the output of the AI task on an output device
Claim 12: 
programming instructions configured to perform, in one or more iterations until a stopping criteria is met
Claim 19:
the CNN is configured to: perform an artificial intelligence (AI) task based on input data and the plurality of weights in the plurality of convolution layers of the CNN
a training system configured to: train weights of the CNN based at least on a training data set, wherein the trained weights of the CNN are stored in floating point
Claim 20:
the embedded CNN is configured to perform the AI task by: generating feature descriptors of input image data based on the plurality of weights
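For illustration only, the loss function recited in the claim 10 limitation listed above can be pictured with the following minimal sketch. It assumes a scalar output per training instance and takes the recited “difference” to be an absolute difference; neither assumption is drawn from the claims or the Specification, and the function names are hypothetical.

```python
def instance_loss(quantized_output, ground_truth):
    # Loss value of one training instance: the difference between the
    # quantized model output and the ground truth.
    # Assumption: "difference" is taken as an absolute difference.
    return abs(quantized_output - ground_truth)


def total_loss(quantized_outputs, ground_truths):
    # Loss function: sum of the per-instance loss values over the
    # training instances in the training data set.
    return sum(
        instance_loss(q, g) for q, g in zip(quantized_outputs, ground_truths)
    )
```

A gradient descent method would repeatedly adjust the weights to reduce `total_loss` until a stopping criterion is met; the update step itself is omitted here because the quoted claim language does not specify it.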
Upon a review of the Specification, each of the bolded generic placeholders in the claims above is described in Drawings Fig. 1 and the following Specification descriptions: 
Pages 4-5 Para [0022] “Examples of an "AI chip" include hardware- or software-based device that is capable of performing functions of an AI logic circuit. An AI chip can be physical or virtual. For example, a physical AI chip may include an embedded cellular neural network, which may contain weights and/or parameters of a convolution neural network (CNN) model. A virtual AI chip may be software-based. For example, a virtual AI chip may include one or more processor simulators to implement functions of a desired AI logic circuit of a physical AI chip” 
Page 34 Para [0080] “the hardware may not need to include a memory, but instead programming instructions are run on one or more virtual machines or one or more containers on a cloud. For example, the various methods illustrated above may be implemented by a server on a cloud that includes multiple virtual machines, each virtual machine having an operating system, a virtual disk, virtual network and applications, and the programming instructions for implementing various functions in the robotic system may be stored on one or more of those virtual machines on the cloud” 
Page 5 Para [0025] “FIG. 1 illustrates a diagram of an example CNN in an AI chip in accordance with various examples described herein. A CNN 100 may include multiple cascaded convolution layers, such as convolution layers, e.g., 102(1), 102(2), 102(3), . . . 102(M)”
Page 8 Para [0031] “Whereas direct quantization of weights (e.g., from 32 bits to 3 bits) may affect the accuracy of the CNN due to loss of precision, a training system may be configured to re-train the AI model which will be explained further in the present disclosure. In some scenarios where the compression ratio is relative low (e.g., for quantization from 32 bits to 8 bits), for which the loss of performance of the CNN due to quantization is minimal, then re-training of weights may not be needed. This is explained in detail with reference to FIG. 2” (emphasis added).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
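For context on the passage from Para [0031] quoted above, the following sketch illustrates why quantizing weights to 3 bits loses more precision than quantizing to 8 bits. It is a minimal illustration assuming uniform symmetric quantization, a scheme chosen here for simplicity; the quoted passage does not specify a quantization scheme, and the weight values and function names are hypothetical.

```python
def quantize(weights, bits):
    """Uniformly quantize floats to signed `bits`-bit levels, then map back to floats."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits, 3 for 3 bits
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def max_error(weights, bits):
    """Largest absolute deviation introduced by quantizing to `bits` bits."""
    return max(abs(w - q) for w, q in zip(weights, quantize(weights, bits)))


# Hypothetical floating-point weights
weights = [0.91, -0.42, 0.07, 0.33, -0.88]
err_8 = max_error(weights, 8)
err_3 = max_error(weights, 3)
print(f"max error at 8 bits: {err_8:.4f}, at 3 bits: {err_3:.4f}")
```

With these example values the 3-bit error is substantially larger than the 8-bit error, which is consistent with the quoted statement that re-training may not be needed at lower compression ratios.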
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as failing to set forth the subject matter which the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the applicant, regards as the invention. 
Each of the claim limitations in claims 4-5, 7-8, 10-12 and 19-20 as identified in section 5 of this Office Action invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. 
Drawings Fig. 1 and the following Specification descriptions: Pages 4-5 Para [0022] “Examples of an "AI chip" include hardware- or software-based device that is capable of performing functions of an AI logic circuit. An AI chip can be physical or virtual. For example, a physical AI chip may include an embedded cellular neural network, which may contain weights and/or parameters of a convolution neural network (CNN) model. A virtual AI chip may be software-based. For example, a virtual AI chip may include one or more processor simulators to implement functions of a desired AI logic circuit of a physical AI chip” 
Page 34 Para [0080] “the hardware may not need to include a memory, but instead programming instructions are run on one or more virtual machines or one or more containers on a cloud. For example, the various methods illustrated above may be implemented by a server on a cloud that includes multiple virtual machines, each virtual machine having an operating system, a virtual disk, virtual network and applications, and the programming instructions for implementing various functions in the robotic system may be stored on one or more of those virtual machines on the cloud” 
Page 5 Para [0025] “FIG. 1 illustrates a diagram of an example CNN in an AI chip in accordance with various examples described herein. A CNN 100 may include multiple cascaded convolution layers, such as convolution layers, e.g., 102(1), 102(2), 102(3), . . . 102(M)”
Page 8 Para [0031] “Whereas direct quantization of weights (e.g., from 32 bits to 3 bits) may affect the accuracy of the CNN due to loss of precision, a training system may be configured to re-train the AI model which will be explained further in the present disclosure. In some scenarios where the compression ratio is relative low (e.g., for quantization from 32 bits to 8 bits), for which the loss of performance of the CNN due to quantization is minimal, then re-training of weights may not be needed. This is explained in detail with reference to FIG. 2”; however, the Specification does not provide the algorithm that performs each of the limitations for which 35 U.S.C. 112(f) is invoked. 
See MPEP 2181 II(B) (“For a computer-implemented 35 U.S.C. 112(f) claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b). See Net MoneyIN, Inc. v. Verisign, Inc., 545 F.3d 1359, 1367 (Fed. Cir. 2008). See also In re Aoyama, 656 F.3d 1293, 1297, 99 USPQ2d 1936, 1939 (Fed. Cir. 2011) ("[W]hen the disclosed structure is a computer programmed to carry out an algorithm, ‘the disclosed structure is not the general purpose computer, but rather that special purpose computer programmed to perform the disclosed algorithm.’")”).
Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. For examination purposes, each function in a limitation that invokes 35 U.S.C. 112(f) and lacks a sufficient description of corresponding structure in the specification has been interpreted as being implemented by a processor.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim 2 recites the limitation “the CNN” in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the embedded CNN”.
Claim 3 recites “VGG neural network” in line 1. It is unclear what “VGG” stands for, since neither the claim nor the Specification describes a “VGG neural network”. For examination purposes, the examiner has interpreted “VGG neural network” as “convolution neural network”.
Claim 4 recites the limitation “the CNN” in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the embedded CNN”.
Claim 6 recites the limitation “the weights” in line 8. This limitation lacks clarity because it is unclear whether it is intended to refer back to “train weights” in claim 6, line 5. For examination purposes, “the weights” has been interpreted as “the trained weights”.
Claim 6 recites the limitation “an AI chip” in line 17. This limitation lacks clarity because it is unclear whether it is intended to refer back to “an AI chip” in claim 6, line 11. For examination purposes, “an AI chip” has been interpreted as “the AI chip”.
Claim 7 recites the limitation “the updated weights” in line 3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the updated the quantized weights”.
Claim 8 recites the limitation “updating the quantized weights” in line 2. This limitation lacks clarity because it is unclear whether it is intended to refer back to “update the quantized weights” in claim 7, line 2. For examination purposes, “updating the quantized weights” has been interpreted as “the updating the quantized weights”.
Claim 8 recites the limitation “the quantized output” in line 8. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “a quantized output”.
Claim 12 recites the limitation “programming instructions” in line 2. This limitation lacks clarity because it is unclear whether it is intended to refer back to “programming instructions” in claim 6, line 3. For examination purposes, “programming instructions” has been interpreted as “the programming instructions”.
Claim 13 recites the limitation “the memory” in line 5. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “a memory”.
Claim 14 recites the limitation “the CNN” in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the embedded CNN”.
Claim 15 recites “VGG neural network” in line 1. It is unclear what “VGG” stands for, since neither the claim nor the Specification describes a “VGG neural network”. For examination purposes, the examiner has interpreted “VGG neural network” as “convolution neural network”.
Claim 18 recites the limitation “the CNN” in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the embedded CNN”.
Claim 18 recites the limitation “the quantized weights” in line 5. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the quantized the trained weights”.
Claim 18 recites the limitation “the updated weights” in line 7. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the updated the trained weights”.
Claim 19 recites the limitation “the CNN” in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the embedded CNN”.
Claim 19 recites the limitation “the quantized weights” in line 13. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted this limitation as “the quantized the trained weights”.
Claim 3 depends on claim 2 and does not cure the deficiencies of claim 2; therefore, claim 3 is rejected for the same rationale.
Claim 5 depends on claim 4 and does not cure the deficiencies of claim 4; therefore, claim 5 is rejected for the same rationale.
Claims 7-12 depend on claim 6 and do not cure the deficiencies of claim 6; therefore, claims 7-12 are rejected for the same rationale.
Claims 14-18 depend on claim 13 and do not cure the deficiencies of claim 13; therefore, claims 14-18 are rejected for the same rationale.
Claim 20 depends on claim 19 and does not cure the deficiencies of claim 19; therefore, claim 20 is rejected for the same rationale.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 13-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 13:
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 13 is directed to a method, which is a process, one of the statutory categories. 
Step 2A Prong One Analysis: The claim is directed to a method for performing an artificial intelligence (AI) task. The claim contains the following limitations: 
Causing … to perform the AI task based on the input data and at least the first set of weights and the second set of weights by propagating the input data
presenting output of the AI task on
As drafted, the claim is a process that, under its broadest reasonable interpretation, covers performance of these limitations in the mind (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion) but for the mere-instruction-to-apply language and the recitation of insignificant extra-solution activity. In the context of this claim, the above limitations encompass: causing … to perform the AI task based on the input data and at least the first set of weights and the second set of weights by propagating the input data (corresponding to evaluation and judgment, because each layer is evaluated using judgment based on the weights); and presenting output of the AI task (corresponding to evaluation and judgment, because providing the output involves evaluation and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of the additional elements “AI semiconductor device”, “embedded convolution neural network (CNN)”, “first convolution layer”, “second convolution layer” and “output device”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “providing input data”, “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to mere data gathering (see MPEP 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are directed to mere instructions to apply the judicial exception. Mere instructions to apply a judicial exception do not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitation of “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing (see MPEP 2106.05(d)(II), “Storing and retrieving information in memory”), and the recitation of “providing input data” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to receiving data (see MPEP 2106.05(d)(II), “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
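For reference, the claim 13 limitations analyzed above (two convolution layers whose weights are stored at different bit-widths, with input data propagated from the first layer to the second) can be pictured with the following sketch. It is illustrative only: the 1-D convolution, the 8-bit and 4-bit widths, the quantization scheme, and all names and values are assumptions made here, not features taken from the claims or the Specification.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization to signed `bits`-bit levels (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def conv1d(signal, kernel):
    """Valid-mode 1-D convolution in correlation form (no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical "memory": each layer's weights are held at its own bit-width.
memory = {
    "layer1": {"bits": 8, "weights": quantize([0.5, -0.25, 0.125], 8)},
    "layer2": {"bits": 4, "weights": quantize([1.0, -1.0], 4)},
}


def run(input_data):
    # Propagate the input from the first convolution layer to the second.
    hidden = conv1d(input_data, memory["layer1"]["weights"])
    return conv1d(hidden, memory["layer2"]["weights"])


output = run([1.0, 2.0, 3.0, 4.0, 5.0])
print(output)  # stands in for presenting the output on an output device
```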
Regarding Claim 14:
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 14 is directed to a method, which is a process, one of the statutory categories. 
Step 2A Prong one Analysis: 
Please see analysis of claim 13.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of the additional elements “AI semiconductor device”, “embedded convolution neural network (CNN)”, “first convolution layer”, “second convolution layer” and “output device”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “providing input data”, “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory”, “and wherein the first bit-width is higher than the second bit-width” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to mere data gathering (see MPEP 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are directed to mere instructions to apply the judicial exception. Mere instructions to apply a judicial exception do not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitation of “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory”, “and wherein the first bit-width is higher than the second bit-width” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing (see MPEP 2106.05(d)(II), “Storing and retrieving information in memory”), and the recitation of “providing input data” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to receiving data (see MPEP 2106.05(d)(II), “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
Regarding Claim 15:
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 15 is directed to a method, which is a process, one of the statutory categories. 
Step 2A Prong one Analysis:
Please see analysis of claim 13
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of the additional elements “AI semiconductor device”, “embedded convolution neural network (CNN)”, “first convolution layer”, “second convolution layer”, “output device” and “VGG neural network”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “providing input data”, “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to mere data gathering (see MPEP 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are directed to mere instructions to apply the judicial exception. Mere instructions to apply a judicial exception do not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitation of “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to storing (see MPEP 2106.05(d)(II), “Storing and retrieving information in memory”), and the recitation of “providing input data” is directed to insignificant extra-solution activity that is well known, routine and conventional because the limitation is directed to receiving data (see MPEP 2106.05(d)(II), “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
Regarding Claim 16:
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 16 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis:
Please see the analysis of claim 13.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements of “AI semiconductor device”, “embedded convolution neural network (CNN)”, “first convolution layer”, “second convolution layer”, “output device” and “VGG neural network”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. Moreover, the recitations of “providing input data”, “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” are directed to Mere Data Gathering (See MPEP 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to mere instructions to apply the judicial exception. Mere instructions to apply a judicial exception do not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitations of “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” are directed to Insignificant Extra-Solution Activity that is well known, routine and conventional because the limitations are directed to storing (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory”), and the recitation of “providing input data” is directed to Insignificant Extra-Solution Activity that is well known, routine and conventional because the limitation is directed to receiving data (See MPEP 2106.05(d)(II), “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
Regarding Claim 17:
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim is directed to a method for performing an artificial intelligence (AI) task. The claim, which contains the following limitations:
wherein performing the AI task comprises
generating feature descriptors of the image data
comparing the feature descriptors of the image data with reference feature descriptors
generating the output of the AI task based on the comparing
as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion) but for the recitation of generic computer components and the recitation of Insignificant Extra-Solution Activity. The above limitations in the context of this claim encompass: wherein performing the AI task comprises (corresponds to evaluation and judgment); generating feature descriptors of the image data (corresponds to evaluation, because the image data is evaluated); comparing the feature descriptors of the image data with reference feature descriptors (corresponds to evaluation and judgment, because the feature descriptors are evaluated using judgment); and generating the output of the AI task based on the comparing (corresponds to evaluation and judgment, because the output is evaluated and judgment is applied in the comparing).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements of “AI semiconductor device”, “embedded convolution neural network (CNN)”, “first convolution layer”, “second convolution layer” and “output device”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. Moreover, the recitations of “providing input data”, “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory”, “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” and “wherein the input data is image data captured from an image capturing device” are directed to Mere Data Gathering (See MPEP 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to mere instructions to apply the judicial exception. Mere instructions to apply a judicial exception do not amount to significantly more. See MPEP 2106.05(f). Furthermore, the recitations of “…including a first set of weights stored in the memory”, “…including a second set of weights stored in the memory” and “wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width” are directed to Insignificant Extra-Solution Activity that is well known, routine and conventional because the limitations are directed to storing (See MPEP 2106.05(d)(II), “Storing and retrieving information in memory”), and the recitations of “providing input data” and “wherein the input data is image data captured from an image capturing device” are directed to Insignificant Extra-Solution Activity that is well known, routine and conventional because the limitations are directed to receiving data (See MPEP 2106.05(d)(II), “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Ovtcharov et al. (US 20200302269 A1) in view of Alippi et al. (“Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case”).
Regarding claim 1. 
Ovtcharov et al. teaches A semiconductor comprising (Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches semiconductor):
a memory (Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches memory); 
and an embedded convolution neural network (CNN) comprising (Page 5 Para [0062] “computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, computing or processing systems embedded in devices (such as wearable computing devices, automobiles, home automation etc.)” and Page 7 Para [0087] “when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)” teach that the computing device is embedded, wherein the computing device implements a CNN).
wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width (Page 7 Para [0079] “executing a quantizing function to quantize weights for a layer of the ANN using a first bit width, and executing the quantizing function to quantize activation values input to the layer of the ANN using a second bit width” and Page 6 Para [0069] “if the computer readable media is implemented as semiconductor-based memory” teach weights for a layer stored in a first bit width and a second bit width, wherein the weights are stored in memory).
Ovtcharov et al. does not teach a first convolution layer including a first set of weights stored in the memory; and a second convolution layer including a second set of weights stored in the memory.
However, Alippi et al. teaches a first convolution layer including a first set of weights stored in the memory (Page 216-217 Section 4.1 Computational Complexity and Memory Occupation “the memory occupation of Φ and Φ~ … where N_Φ and N_Φ~ are the number of weights to be stored in Φ and Φ~ respectively, and b and b~ are the memory word lengths in terms of number of bits required to store each weight in Φ and Φ~ … N_c is the number of convolutional layers in Φ” teaches a convolution layer that includes weights, wherein the weights are stored in memory);
and a second convolution layer including a second set of weights stored in the memory (Page 216-217 Section 4.1 Computational Complexity and Memory Occupation “the memory occupation of Φ and Φ~ … where N_Φ and N_Φ~ are the number of weights to be stored in Φ and Φ~ respectively, and b and b~ are the memory word lengths in terms of number of bits required to store each weight in Φ and Φ~ … The convolution computational load C_Φ^conv of Φ is defined as $C_{\Phi}^{conv} = \sum_{i=1}^{N_c} n_{i-1} \cdot s_i^2 \cdot n_i \cdot m_i^2$ (8) … where N_c is the number of convolutional layers in Φ” teaches convolution layers that include weights, wherein the weights are stored in memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Alippi et al. into the disclosed invention of Ovtcharov et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “A further level of approximation for Φ~ is introduced in Section 4.3 to consider, in the k-th layer, only the convolutional filters useful for the application-specific classification problem; this allows to further reduce computational load and memory occupation of Φ~” (Alippi, Page 216 Section 4 Designing Approximated Convolutional Neural Networks).
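For illustration only, the two quantities quoted from Alippi et al. Section 4.1 (memory occupation as number of weights times word length, and the convolution computational load of equation (8)) can be evaluated as in the sketch below. The layer shapes, the word lengths, and the function names are hypothetical values chosen for this example; they are not taken from either reference or from the application.

```python
# Illustrative sketch only; all shapes and bit widths below are hypothetical.

def memory_bits(num_weights, word_length_bits):
    """Memory occupation: number of stored weights times bits per weight."""
    return num_weights * word_length_bits


def conv_load(layers, n_in):
    """Convolution load: sum over layers of n_{i-1} * s_i^2 * n_i * m_i^2.

    layers: list of (n_i, s_i, m_i) = (filters, filter side, output map side).
    n_in:   n_0, the number of input channels to the first layer.
    """
    total = 0
    prev = n_in
    for n_i, s_i, m_i in layers:
        total += prev * s_i ** 2 * n_i * m_i ** 2
        prev = n_i
    return total


# Two hypothetical convolution layers fed by a 3-channel input.
layers = [(4, 3, 8), (8, 3, 4)]

# Weight counts per layer (biases ignored): n_{i-1} * s_i^2 * n_i.
first_layer_weights = 3 * 3 ** 2 * 4    # 108
second_layer_weights = 4 * 3 ** 2 * 8   # 288

# First layer stored with 16-bit words, second layer with 8-bit words.
first_layer_bits = memory_bits(first_layer_weights, 16)
second_layer_bits = memory_bits(second_layer_weights, 8)
total_load = conv_load(layers, n_in=3)
```

Under these assumed values, the first layer occupies 1728 bits, the second 2304 bits, and the convolution load is 11520 multiply-accumulate terms.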

Regarding claim 2. 
Ovtcharov et al. in view of Alippi et al. teaches The semiconductor of claim 1, 
Ovtcharov et al. further teaches wherein the second convolution layer is succeeding the first convolution layer in the CNN, and wherein the first bit-width is higher than the second bit-width (Page 4 Para [0045] “the quantizing function 306 applies a floor function to round the new first bit width 308AA and the new second bit width 308BB down to an integer value and/or applies a weight decay to the new first bit width 308AA and the new second bit width 308BB……weight decay is applied to the bit width parameters to guide the search process into regions of lower bit width” and Page 2 Para [0013] “various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” teach that training moves from normal precision (corresponding to the first bit-width) to lower precision (corresponding to the second bit-width, the first bit-width being higher) for convolution layers).
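For illustration only, the following sketch shows how a floor function and a weight-decay term, as named in the quoted Para [0045], could drive a bit-width parameter toward lower values. The update rule itself (a plain gradient step with a decay term), the parameter names, and the numeric values are assumptions made for this example; the quoted passage does not specify them.

```python
import math

# Illustrative sketch only; the update rule and all values are hypothetical.

def update_bit_width(bit_width, gradient, learning_rate=1.0,
                     weight_decay=0.1, min_bits=1):
    """Take one assumed gradient step on a bit-width parameter.

    The weight-decay term pulls the parameter toward lower bit widths, and
    the result is rounded down to an integer with a floor function.
    """
    stepped = bit_width - learning_rate * (gradient + weight_decay * bit_width)
    return max(min_bits, math.floor(stepped))


# Two layers starting from the same 8-bit width but with different gradients.
first_layer_bits = update_bit_width(8.0, gradient=-1.0)   # stays at 8
second_layer_bits = update_bit_width(8.0, gradient=0.5)   # drops to 6
```

With these assumed gradients, the first layer keeps a higher bit width than the succeeding layer.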
Regarding claim 3. 
Ovtcharov et al. in view of Alippi et al. teaches The semiconductor of claim 2, 
Alippi et al. further teaches wherein the CNN is a VGG neural network, and wherein the first convolution layer comprises a first plurality of convolution layers in the VGG neural network (Page 218 Section 4.2 The Proposed Methodology “they can be easily imported by pre-defined C libraries storing ready-to-use CNNs (e.g., VGG-16 and AlexNet, etc.)” and Figure 1 and Figure 4 teach that the CNN is a VGG network), and wherein the second convolution layer comprises a second plurality of convolution layers in the VGG neural network (Page 218 Section 4.2 The Proposed Methodology “they can be easily imported by pre-defined C libraries storing ready-to-use CNNs (e.g., VGG-16 and AlexNet, etc.)” teaches that the CNN is a VGG network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Alippi et al. into the disclosed invention of Ovtcharov et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “A further level of approximation for Φ~ is introduced in Section 4.3 to consider, in the k-th layer, only the convolutional filters useful for the application-specific classification problem; this allows to further reduce computational load and memory occupation of Φ~” (Alippi, Page 216 Section 4 Designing Approximated Convolutional Neural Networks).
Regarding claim 4. 
Ovtcharov et al. in view of Alippi et al. teaches The semiconductor of claim 1, 
Ovtcharov et al. further teaches wherein the CNN is configured to be executed to perform an AI task based on image data stored in the memory (Page 5 Para [0063] “a system memory 604… The mass storage device 612 can also be configured to store other types of programs and data” and Page 2-3 Para [0028] “Training 102 of ANNs typically utilizes a training data set 108. The training data set 108 includes samples (e.g. images) for applying to an ANN” and Page 2 Para [0013] “convolutional neural networks (“CNNs”)… other suitable ANNs that can be adapted to use the technologies” teach that the ANN is executed to perform training (corresponding to the task) based on the image samples (corresponding to the image data), wherein the samples are stored in memory and the ANN comprises a CNN) and at least the first set of weights and the second set of weights by propagating the image data from the first convolution layer to the second convolution layer (Page 4-5 Para [0051] “FIG. 3. As shown in FIG. 4, the computed value 404 for the loss function 304 can be used during a backward training pass (i.e. backpropagation) to compute a gradient 402A for the bit width 308A used by the quantizing function 306 when quantizing weights 110” teaches propagation over the weights using data for a layer), and to present output of the AI task on an output device (Page 2-3 Para [0028] “data describing a desired output from the ANN for each respective sample in the training data set 108 (e.g. a set of images that have been labeled with data describing the actual content in the images)” teaches receiving output for the image).
Regarding claim 13. 
Ovtcharov et al. teaches A method for performing an artificial intelligence (AI) task, the method comprising (Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” teaches performing AI task):
providing input data to an AI semiconductor device including an embedded convolution neural network (CNN) comprising at least (Page 5 Para [0062] “computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, computing or processing systems embedded in devices (such as wearable computing devices, automobiles, home automation etc.)” and Page 7 Para [0087] “when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)” and Page 6 Para [0069] “if the computer readable media is implemented as semiconductor-based memory” and Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114” teach that the computing device is embedded, wherein the computing device implements a CNN on a semiconductor device):
causing the AI semiconductor device to perform the AI task based on the input data and at least the first set of weights and the second set of weights by propagating the input data from the first convolution layer to the second convolution layer (Page 2 Para [0025] “ANNs are applied to a number of applications in AI and ML including… AI tasks” and Page 4-5 Para [0051] “FIG. 3. As shown in FIG. 4, the computed value 404 for the loss function 304 can be used during a backward training pass (i.e. backpropagation) to compute a gradient 402A for the bit width 308A used by the quantizing function 306 when quantizing weights 110” teach performing an AI task on data and propagation over the weights using data for a layer);
and presenting output of the AI task on an output device (Page 2 Para [0025] “ANNs are applied to a number of applications in AI and ML including… AI tasks” and Page 2-3 Para [0028] “data describing a desired output from the ANN for each respective sample in the training data set 108 (e.g. a set of images that have been labeled with data describing the actual content in the images)” teach receiving output of the task);
wherein the first set of weights are stored in the memory in a first bit-width and the second set of weights are stored in the memory in a second bit-width different from the first bit-width (Page 7 Para [0079] “executing a quantizing function to quantize weights for a layer of the ANN using a first bit width, and executing the quantizing function to quantize activation values input to the layer of the ANN using a second bit width” and Page 6 Para [0069] “if the computer readable media is implemented as semiconductor-based memory” teach weights for a layer stored in a first bit width and a second bit width, wherein the weights are stored in memory).
Ovtcharov et al. does not teach a first convolution layer including a first set of weights stored in the memory and a second convolution layer including a second set of weights stored in the memory.
However, Alippi et al. teaches a first convolution layer including a first set of weights stored in the memory (Page 216-217 Section 4.1 Computational Complexity and Memory Occupation “the memory occupation of Φ and Φ~ … where N_Φ and N_Φ~ are the number of weights to be stored in Φ and Φ~ respectively, and b and b~ are the memory word lengths in terms of number of bits required to store each weight in Φ and Φ~ … N_c is the number of convolutional layers in Φ” teaches a convolution layer that includes weights, wherein the weights are stored in memory);
and a second convolution layer including a second set of weights stored in the memory (Page 216-217 Section 4.1 Computational Complexity and Memory Occupation “the memory occupation of Φ and Φ~ … where N_Φ and N_Φ~ are the number of weights to be stored in Φ and Φ~ respectively, and b and b~ are the memory word lengths in terms of number of bits required to store each weight in Φ and Φ~ … The convolution computational load C_Φ^conv of Φ is defined as $C_{\Phi}^{conv} = \sum_{i=1}^{N_c} n_{i-1} \cdot s_i^2 \cdot n_i \cdot m_i^2$ (8) … where N_c is the number of convolutional layers in Φ” teaches convolution layers that include weights, wherein the weights are stored in memory).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Alippi et al. into the disclosed invention of Ovtcharov et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “A further level of approximation for Φ~ is introduced in Section 4.3 to consider, in the k-th layer, only the convolutional filters useful for the application-specific classification problem; this allows to further reduce computational load and memory occupation of Φ~” (Alippi, Page 216 Section 4 Designing Approximated Convolutional Neural Networks).
Regarding claim 14. 
Ovtcharov et al. in view of Alippi et al. teaches The method of claim 13, 
Ovtcharov et al. further teaches wherein the second convolution layer is succeeding the first convolution layer in the CNN, and wherein the first bit-width is higher than the second bit-width (Page 4 Para [0045] “the quantizing function 306 applies a floor function to round the new first bit width 308AA and the new second bit width 308BB down to an integer value and/or applies a weight decay to the new first bit width 308AA and the new second bit width 308BB……weight decay is applied to the bit width parameters to guide the search process into regions of lower bit width” and Page 2 Para [0013] “various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” teach that training moves from normal precision (corresponding to the first bit-width) to lower precision (corresponding to the second bit-width, the first bit-width being higher) for convolution layers).
Regarding claim 15. 
Ovtcharov et al. in view of Alippi et al. teaches The method of claim 13, 
Alippi et al. further teaches wherein the CNN is a VGG neural network (Page 218 Section 4.2 The Proposed Methodology “they can be easily imported by pre-defined C libraries storing ready-to-use CNNs (e.g., VGG-16 and AlexNet, etc.)” and Figure 1 and Figure 4 teach that the CNN is a VGG network), and wherein the second convolution layer comprises a second plurality of convolution layers in the VGG neural network (Page 218 Section 4.2 The Proposed Methodology “they can be easily imported by pre-defined C libraries storing ready-to-use CNNs (e.g., VGG-16 and AlexNet, etc.)” teaches that the CNN is a VGG network).
The same motivation to combine as independent claim 13 applies here.
Regarding claim 16. 
Ovtcharov et al. in view of Alippi et al. teaches The method of claim 15, 
Alippi et al. further teaches wherein the first convolution layer comprises a first plurality of convolution layers in the VGG neural network, and wherein the second convolution layer comprises a second plurality of convolution layers in the VGG neural network (Page 218 Section 4.2 The Proposed Methodology “they can be easily imported by pre-defined C libraries storing ready-to-use CNNs (e.g., VGG-16 and AlexNet, etc.)” and Figure 1 and Figure 4 teach that the CNN is a VGG network).
The same motivation to combine as dependent claim 15 applies here.
Claims 6-12 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ovtcharov et al. (US 20200302269 A1) in view of Alippi et al.  (“Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case”) further in view of Tung et al.  (“CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization”). 
Regarding claim 6. 
Ovtcharov et al. teaches A system comprising: a processor; and non-transitory computer media containing programming instructions that, when executed, cause the processor to (Page 7 Para [0087] “A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to” teaches processor and computer storage media contain instruction):
train weights of an artificial intelligence (AI) model based at least on a training data set, wherein the trained weights of the AI model are stored in floating point (Page 3 Para [0038] “Performance, energy usage, and storage requirements of ANNs can be improved through the use of quantized-precision floating-point formats during training and/or inference. In particular, weights 110 and activation values 208 (shown in FIGS. 2A and 2B, respectively) can be represented in a lower-precision quantized-precision floating-point format, which typically results in some amount of error being introduced” and Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114, typically samples that were not used during training” and Page 4 Para [0048] “which shows a forward training pass for a portion of an example ANN that includes a convolution layer 302” and Fig. 1 teach training the weights of an ANN from the training data set, wherein the trained weights of the ANN are represented in floating point and the floating-point representations are stored, and wherein the ANN comprises a convolution neural network),
and wherein the AI model comprises at least a first convolution layer and a second convolution layer (Page 4 Para [0044] “The values for one or more of the ANN layers can be expressed in a quantized format that has lower precision than normal-precision floating-point formats” and Page 2 Para [0013] “as convolutional neural networks…other suitable ANNs that can be adapted” teaches model comprising convolutional layers);
and include at least a first set of weights for the first convolution layer and a second set of weights for the second convolution layer (Page 7 Para [0087] “compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient, and compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient; and quantize weights and activation values for the ANN at inference time using the new first bit width and the new second bit width” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” teaches weights for convolution layers)
and upload the quantized weights to an AI chip capable of executing an AI task (Page 2 Para [0027] “An ANN generally consists of a sequence of layers of different types (e.g. convolution, ReLU, fully connected, and pooling layers)” and Page 3 Para [0031] “the ANN, each layer of the ANN computes the error for the previous layer and the gradients, or updates, to the weights 110 of the layer that move the ANN's prediction toward the desired output” and Page 5 Para [0060] “At operation 514, the ANN training module 106 determines whether training is complete….. repeated until optimal bit widths for quantizing weights 110 and activation values 208 are learned” and Page 5 Para [0063] “FIG. 6 includes one or more central processing units 602 (“CPU”)” teach that the quantized weights are uploaded to the ANN for the output (corresponding to the task), wherein the ANN is implemented using a computer (corresponding to the AI chip)).
Ovtcharov et al. does not teach wherein the quantized weights are stored in fixed point.
However, Alippi et al. teaches wherein the quantized weights are stored in fixed point (Page 215 Section 3.0.2 Precision Scaling “precision scaling aims at reducing the memory occupation associated with the weights of Φ by considering approximated versions θ1~ of θ1….Precision scaling, which is computed through the rounding of weights θ_i's, aims at employing a fixed-point representations of the weights so as to consider 16-bit or 8-bit data types (e.g., int or short int)” and Page 216 Section 4.1 Computational Complexity and Memory Occupation “The memory reduction when considering Φ~ instead of Φ depends explicitly on k and implicitly on q and can be quantified as $\Delta M = M_{\Phi} - M_{\tilde{\Phi}} = N_{\tilde{\Phi}}(b - \tilde{b}) + (N_{\Phi} - N_{\tilde{\Phi}})\,b$ (6)” teach that the quantized weights are represented in fixed point and that the weights are stored);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Alippi et al. into the disclosed invention of Ovtcharov et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “A further level of approximation for Φ~ is introduced in Section 4.3 to consider, in the k-th layer, only the convolutional filters useful for the application-specific classification problem; this allows to further reduce computational load and memory occupation of Φ~” (Alippi, Page 216 Section 4 Designing Approximated Convolutional Neural Networks).
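For illustration only, storing rounded weights in a fixed-point representation, in the sense of the precision scaling passage quoted from Alippi et al., can be sketched as follows. The 8-bit word with 6 fractional bits, the saturation behavior, and the function names are assumptions made for this example and are not taken from the reference.

```python
# Illustrative sketch only; the bit allocation and names are hypothetical.

def to_fixed_point(weights, total_bits=8, frac_bits=6):
    """Round floating-point weights to signed fixed-point integer codes.

    Each weight is scaled by 2**frac_bits, rounded to the nearest integer,
    and saturated to the signed range of a total_bits-wide word.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return [max(lowest, min(highest, round(w * scale))) for w in weights]


def from_fixed_point(codes, frac_bits=6):
    """Recover the approximate floating-point value of each stored code."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


float_weights = [0.5, -0.25, 1.0, 3.0]
codes = to_fixed_point(float_weights)   # [32, -16, 64, 127]; 3.0 saturates
restored = from_fixed_point(codes)      # [0.5, -0.25, 1.0, 1.984375]
```

The last weight falls outside the representable range of the assumed format, so it saturates at the largest code; the other three are recovered exactly.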
Ovtcharov et al. in view of Alippi et al. does not teach quantize the weights of the AI model to a respective number of quantization levels corresponding to a maximum value of a respective convolution layer of an AI chip … and wherein a number of quantization levels for the first set of weights is different from a number of quantization levels for the second set of weights.
However, Tung et al. teaches quantize the weights of the AI model to a respective number of quantization levels corresponding to a maximum value of a respective convolution layer of an AI chip (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7873 Section 1 Introduction “many practical applications of computer vision require efficient solutions with low memory and energy footprint……Our focus in this paper is on deep network compression, which has the goal of making deep networks more compact” and figure 1 teaches deep neural network of quantized weight to a quantization levels wherein level of quantization positive weight (correspond to maximum value) wherein system use computer which comprised chip),
and wherein a number of quantization levels for the first set of weights is different from a number of quantization levels for the second set of weights (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7873 Section 1 Introduction “many practical applications of computer vision require efficient solutions with low memory and energy footprint……Our focus in this paper is on deep network compression, which has the goal of making deep networks more compact” and figure 1 teaches deep neural networks where quantization level for weights are positive and negative (correspond to different)). 
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Tung et al. to the disclosed invention of Ovtcharov et al. in view of Alippi et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “CLIP-Q improves on the previous best compression rate (37.5 ×) by 35% relative. On GoogLeNet, CLIP-Q obtains a state-of-the-art compressed network size of 2.8 MB, improving on the previous best compression rate (6.4×) by 57% relative. On ResNet-50, CLIP-Q obtains a 139% relative improvement in the state-of-the-art compression rate while obtaining 2.4% higher accuracy” (Tung, Page 7880 Section 4.4. Comparison to State-of-the-Art Methods).
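For illustration only, and not drawn from any of the cited references, the claim 6 limitations discussed above (per-layer quantization to a number of levels tied to the layer's maximum value, with a different number of levels per set of weights and integer fixed-point storage) might be sketched as follows. The helper name `quantize_layer`, the symmetric uniform grid, and the sample weight values are assumptions made for this sketch.

```python
from typing import List, Tuple


def quantize_layer(weights: List[float], num_levels: int) -> Tuple[List[int], float]:
    """Uniformly quantize one layer's weights to signed integer codes.

    Hypothetical helper: `num_levels` must be odd so the grid is symmetric
    around zero; the step size is derived from the layer's maximum |weight|.
    """
    if num_levels < 3 or num_levels % 2 == 0:
        raise ValueError("num_levels must be an odd integer >= 3")
    half = (num_levels - 1) // 2
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude in the layer maps onto the largest integer code.
    scale = max_abs / half if max_abs > 0 else 1.0
    codes = [max(-half, min(half, round(w / scale))) for w in weights]
    return codes, scale


# Two sets of weights (assumed sample values), quantized to different level counts.
layer1 = [0.5, -0.25, 0.1, -0.5]
layer2 = [1.0, -0.3, 0.7, 0.05]
codes1, scale1 = quantize_layer(layer1, num_levels=5)    # coarse grid
codes2, scale2 = quantize_layer(layer2, num_levels=15)   # finer grid

print(codes1, scale1)
print(codes2, scale2)
```

The integer codes together with one scale per layer stand in for fixed-point storage; the real-valued weight is approximately `code * scale`.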
Regarding claim 7.
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 6. 
Ovtcharov et al. further teaches further comprising programming instructions configured to (Page 7 Para [0087] “A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to” teaches processor and computer storage media contain instruction) 
Tung et al. further teaches update the quantized weights of the AI model so that output of the AI model based at least on the updated weights are within a range of ground truth of the training data set (Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7876 Section 3.2. Pruning-Quantization Hyperparameter Prediction “To guide our search for promising pruning-quantization hyperparameters, we optimize min_θ ε(θ) − λ⋅c_i(θ) for each layer i. We obtain a coarse estimate of the quality of θ by testing the compressed network on a subset of the training data; ε(θ) is the resulting top-1 error” and figure 1 teaches that the quantized weights of the deep neural network are updated so as to minimize error).
The same motivation to combine as independent claim 6 applies here.
Regarding claim 8. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 7, 
Ovtcharov et al. further teaches wherein the programming instructions for updating the quantized weights of the AI model further comprise programming instructions configured to (Page 7 Para [0087] “A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” teaches processor and computer storage media contain instruction),
repeat in one or more iterations, until a stopping criteria is met, operations comprising (Page 5 Para [0060] “At operation 514, the ANN training module 106 determines whether training is complete. If not, the routine 500 proceeds back to operation 502, where the process described above can be repeated until optimal bit widths for quantizing weights 110 and activation values 208 are learned” and Figure 5 teaches operation, repeat iteration until optimal bit width for quantizing weights are learned (correspond to stopping criteria)):
determining second output of the AI model based on the quantized weights of the AI model and the training data set (Page 2-3 Para [0028] “The training data set 108 includes samples (e.g. images) for applying to an ANN and data describing a desired output from the ANN for each respective sample in the training data set 108” and Page 7 Para [0087] “compute a gradient for the first bit width and a gradient for the second bit width, compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient, and compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient” and figure 5 teaches receive output of the ANN based on the quantized weight and training data set);
quantizing the second output of the AI model (Page 7 Para [0087] “compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient” and Figure 5 teaches quantizing the activation values input to the layer (correspond to output) of the ANN);
determining a change of weights based on the quantized output of the AI model (Page 7 Para [0087] “compute a gradient for the first bit width and a gradient for the second bit width, compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient, and compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient” and Figure 5 teaches determine gradient (correspond to change) of weight based on quantized weight (correspond to output) of the ANN ); 
and updating the quantized weights of the AI model based on the change of weights (Page 3 Para [0031] “each layer of the ANN computes the error for the previous layer and the gradients, or updates, to the weights 110 of the layer that move the ANN's prediction toward the desired output” and Page 7 Para [0087] “compute a gradient for the first bit width and a gradient for the second bit width, compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient” teaches update quantized weights of the ANN based on the gradient (correspond to change) of weight).
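For illustration only, the iterative operations recited in claim 8 (determine output from quantized weights, quantize the output, determine a change of weights, update, and repeat until a stopping criterion is met) might be sketched as below. This is a minimal sketch under stated assumptions, not the method of any cited reference: the one-weight linear model, the squared-error loss summed over training instances, the straight-through gradient, and all names and constants are hypothetical.

```python
from typing import List, Tuple


def quantize(value: float, step: float) -> float:
    """Round a value to the nearest multiple of `step` (uniform grid)."""
    return round(value / step) * step


def train_quantized(xs: List[float], ys: List[float], step: float = 0.25,
                    lr: float = 0.02, tol: float = 1e-9,
                    max_iters: int = 100) -> Tuple[float, float]:
    """Fit y = w * x with a quantized weight and quantized outputs."""
    w = 0.0                 # latent full-precision weight (assumed start value)
    loss = float("inf")
    for _ in range(max_iters):
        qw = quantize(w, step)                           # quantized weight
        outputs = [quantize(qw * x, step) for x in xs]   # quantized output
        # Loss: sum over training instances of squared (output - ground truth).
        loss = sum((o - y) ** 2 for o, y in zip(outputs, ys))
        if loss <= tol:                                  # stopping criterion
            break
        # Change of weights: gradient taken as if quantization were identity.
        grad = sum(2.0 * (o - y) * x for o, y, x in zip(outputs, ys, xs))
        w -= lr * grad                                   # gradient descent update
    return quantize(w, step), loss


final_w, final_loss = train_quantized([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
print(final_w, final_loss)
```

With these assumed inputs the loop stops on its third loss evaluation, once the quantized weight reaches the value that reproduces the ground truth exactly.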
Regarding claim 9. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 8, 
Tung et al. further teaches wherein the operation of updating the quantized weights of the AI model comprises operations comprising updating the first set of weights and not updating the second set of weights (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+, and (p×100)% of the negative weights are greater than or equal to c−. All the weights between c− and c+ are set to zero in the next forward pass. This removes the corresponding connections from the network when processing the next minibatch. Note that this pruning decision is impermanent: in the next iteration, we apply the rule again on updated weights” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” teaches quantized weights wherein the operation comprises updating the weights between c− and c+ (corresponding to the first set of weights) for the minibatch and, for the next minibatch, setting weights to zero (corresponding to not updating)),
wherein the number of quantization levels for the first convolution layer is lower than the number of quantization levels for the second convolution layer (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and figure 1 teaches quantization of weight for the layer is c- (correspond to lower)).
The same motivation to combine as dependent claim 8 applies here.
Regarding claim 10. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 9, 
Ovtcharov et al. further teaches wherein the programming instructions for determining the change of weights of the AI model further comprise programming instructions configured to use a gradient descent method (Page 4-5 Para [0051] “with reference to the simplified topology of the example ANN shown in FIG. 3. As shown in FIG. 4, the computed value 404 for the loss function 304 can be used during a backward training pass (i.e. backpropagation) to compute a gradient 402A for the bit width 308A used by the quantizing function 306 when quantizing weights 110” and Page 7 Para [0087] “A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” teaches instruction of ANN for quantization (correspond to change) weight to compute using gradient), wherein a loss function in the gradient descent method is based on a sum of loss values over a plurality of training instances in the training data set (Page 4 Para [0051] “with reference to the simplified topology of the example ANN shown in FIG. 3. As shown in FIG. 4, the computed value 404 for the loss function 304 can be used during a backward training pass (i.e. backpropagation) to compute a gradient 402A for the bit width 308A used by the quantizing function 306 when quantizing weights 110” and figure 3 teaches gradient method comprising loss function and loss function is loss over the training data),
Ovtcharov et al. further teaches wherein the loss value of each of the plurality of training instances is a difference between the quantized output of the AI model for the training instance and a ground truth of the training instance (Page 3 Para [0030] “Based on the label predicted by the ANN and the label associated with each instance of training data in the training data set 108, the output layer computes a “loss,” or error function” and Page 4 Para [0050] “The quantizing function 306 can also be executed during training to quantize activation values 208 for the layer of the ANN using the bit width 308B. The value of a loss function 304 can be computed at the end of each forward training pass of the ANN” and Fig. 3 teaches that the loss value is computed as the difference between the output produced with the quantized weights and the training data set (corresponding to the ground truth)).
The same motivation to combine as dependent claim 9 applies here.
Regarding claim 11. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 6, 
Ovtcharov et al. further teaches wherein the AI chip is configured to (Page 5 Para [0063] “FIG. 6 includes one or more central processing units 602 (“CPU”)” teaches processing unit CPU wherein CPU comprising chip): 
execute the AI task to generate output of the AI task, wherein the quantized weights of the AI model are uploaded into the AI chip (Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” and Page 6 Para [0067] “the input/output controller 618 can provide output to a display screen or other type of output device” and Page 7 Para [0079] “executing a quantizing function to quantize weights for a layer of the ANN using a first bit width, and executing the quantizing function to quantize activation values input to the layer of the ANN using a second bit width”  teaches provide output to the system wherein system perform AI task wherein task performed quantized weight); 
and present the output of the AI task on an output device (Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” and Page 6 Para [0067] “the input/output controller 618 can provide output to a display screen or other type of output device” teaches present output of the task in output device).
Regarding claim 12. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches The system of claim 6, 
Ovtcharov et al. further teaches wherein the programming instructions for quantizing the weights of the AI model further comprise programming instructions configured to (Page 7 Para [0087] “A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” teaches processor and computer storage media contain instruction)
perform, in one or more iterations until a stopping criteria is met, operations comprising (Page 5 Para [0060] “At operation 514, the ANN training module 106 determines whether training is complete. If not, the routine 500 proceeds back to operation 502, where the process described above can be repeated until optimal bit widths for quantizing weights 110 and activation values 208 are learned” and Figure 5 teaches operation, repeat iteration until optimal bit width for quantizing weights are learned (correspond to stopping criteria)):
quantizing weights of one or more convolution layers of the AI model (Page 5 Para [0060] “At operation 514, the ANN training module 106 determines whether training is complete….. repeated until optimal bit widths for quantizing weights 110 and activation values 208 are learned” teaches quantizing weights of the ANN);
determining output of the one or more convolution layers of the AI model based on the quantized weights of the AI model and the training data set (Page 2-3 Para [0028] “The training data set 108 includes samples (e.g. images) for applying to an ANN and data describing a desired output from the ANN for each respective sample in the training data set 108” and Page 7 Para [0087] “compute a gradient for the first bit width and a gradient for the second bit width, compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient, and compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient” and figure 5 teaches receive output of the ANN based on the quantized weight and training data set);
determining a change of weights based on the output of the one or more convolution layers of the AI model (Page 7 Para [0087] “compute a gradient for the first bit width and a gradient for the second bit width, compute a new first bit width for quantizing the weights for the layer of the ANN based on the first gradient, and compute a new second bit width for quantizing the activation values input to the layer of the ANN based on the second gradient” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” and Figure 5 teaches determine gradient (correspond to change) of weight based on quantized weight (correspond to output) of the ANN wherein ANN comprising convolution layer); 
Tung et al. further teaches and updating the weights of the one or more convolution layers of the AI model based on the change of weights, wherein updating the weights comprises at least updating the first set of weights and not updating the second set of weights (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+, and (p×100)% of the negative weights are greater than or equal to c−. All the weights between c− and c+ are set to zero in the next forward pass. This removes the corresponding connections from the network when processing the next minibatch. Note that this pruning decision is impermanent: in the next iteration, we apply the rule again on updated weights” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” teaches quantized weights wherein the operation comprises updating the weights between c− and c+ (corresponding to the first set of weights) for the minibatch and, for the next minibatch, setting weights to zero (corresponding to not updating)),
wherein the number of quantization levels for the first convolution layer is lower than the number of quantization levels for the second convolution layer (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and figure 1 teaches quantization of weight for the layer is c- (correspond to lower)).
The same motivation to combine as independent claim 6 applies here.
Regarding claim 18. 
Ovtcharov et al. in view of Alippi et al. teaches The method of claim 13 further comprising:
Ovtcharov et al. further teaches training weights of the CNN based at least on a training data set, wherein the trained weights of the CNN are stored in floating point (Page 3 Para [0038] “Performance, energy usage, and storage requirements of ANNs can be improved through the use of quantized-precision floating-point formats during training and/or inference. In particular, weights 110 and activation values 208 (shown in FIGS. 2A and 2B, respectively) can be represented in a lower-precision quantized-precision floating-point format, which typically results in some amount of error being introduced” and Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114, typically samples that were not used during training” and Page 4 Para [0048] “which shows a forward training pass for a portion of an example ANN that includes a convolution layer 302” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” and Fig. 1  teaches training weight of ANN from the training data set and training weight of the ANN are representation of floating point wherein representation of floating point are store, ANN comprising convolution neural network); 
and uploading the updated weights of the CNN to the semiconductor device for performing the AI task (Page 2 Para [0027] “An ANN generally consists of a sequence of layers of different types (e.g. convolution, ReLU, fully connected, and pooling layers)” and Page 3 Para [0031] “the ANN, each layer of the ANN computes the error for the previous layer and the gradients, or updates, to the weights 110 of the layer that move the ANN's prediction toward the desired output” and Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches weight updated and upload convolution layer of the ANN where in ANN is CNN of semiconductor).
Alippi et al. further teaches wherein the quantized weights are stored in fixed point (Page 215 Section 3 3.0.2 Precision Scaling “precision scaling aims at reducing the memory occupation associated with the weights of Φ by considering approximated versions θ1~ of θ1….Precision scaling, which is computed through the rounding of weights θis, aims at employing a fixed-point representations of the weights so as to consider 16-bit or 8-bit data types (e.g., int or short int)” and Page 216 Section 4.1 Computational Complexity and Memory Occupation “The memory reduction when considering Φ instead of Φ depends explicitly on k and implicitly on q and can be quantified as $\Delta M = M_{\Phi} - M_{\bar{\Phi}} = N_{\tilde{\Phi}}(b - \tilde{b}) + (N_{\Phi} - N_{\tilde{\Phi}})\,b$ (6)” teaches that the quantized weights are represented in fixed point and that the weights are stored);
The same motivation to combine as independent claim 13 applies here.
Ovtcharov et al. in view of Alippi et al. does not teach quantizing the trained weights of the CNN to a respective number of quantization levels corresponding to a maximum value of a convolution layer of an AI chip… updating the quantized weights of the CNN so that output of the CNN based on the updated weights is within a range of ground truth of the training data set.
Tung et al. further teaches quantizing the trained weights of the CNN to a respective number of quantization levels corresponding to a maximum value of a convolution layer of an AI chip (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7873 Section 1 Introduction “many practical applications of computer vision require efficient solutions with low memory and energy footprint……Our focus in this paper is on deep network compression, which has the goal of making deep networks more compact” and figure 1 teaches deep neural network of quantized weight to a quantization levels wherein level of quantization positive weight (correspond to maximum value) wherein system use computer which comprised chip),
updating the quantized weights of the CNN so that output of the CNN based on the updated weights is within a range of ground truth of the training data set (Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7876 Section 3.2. Pruning-Quantization Hyperparameter Prediction “To guide our search for promising pruning-quantization hyperparameters, we optimize min_θ ε(θ) − λ⋅c_i(θ) for each layer i. We obtain a coarse estimate of the quality of θ by testing the compressed network on a subset of the training data; ε(θ) is the resulting top-1 error” and figure 1 teaches that the quantized weights of the deep neural network are updated so as to minimize error);
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Tung et al. to the disclosed invention of Ovtcharov et al. in view of Alippi et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “CLIP-Q improves on the previous best compression rate (37.5 ×) by 35% relative. On GoogLeNet, CLIP-Q obtains a state-of-the-art compressed network size of 2.8 MB, improving on the previous best compression rate (6.4×) by 57% relative. On ResNet-50, CLIP-Q obtains a 139% relative improvement in the state-of-the-art compression rate while obtaining 2.4% higher accuracy” (Tung, Page 7880 Section 4.4. Comparison to State-of-the-Art Methods).
Regarding claim 19. 
Ovtcharov et al. teaches A semiconductor comprising (Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches semiconductor):
a memory (Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches memory); 
and an embedded convolution neural network (CNN) comprising a plurality of weights in a plurality of convolution layers, the CNN is configured to (Page 5 Para [0062] “computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, computing or processing systems embedded in devices (such as wearable computing devices, automobiles, home automation etc.)” and Page 7 Para [0087] “when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN), execute a quantizing function to quantize weights for a layer of the ANN using a first bit width” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)” teaches that the computing device is embedded, wherein the computing device implements a CNN and executes on weights for the neural network)
perform an artificial intelligence (AI) task based on input data and the plurality of weights in the plurality of convolution layers of the CNN (Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114” Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)… or other suitable ANNs that can be adapted to use the technologies disclosed herein” teaches artificial intelligence task based on input data and weight in the ANNs wherein ANN is CNN ); 
and provide output of the AI task (Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” and Page 6 Para [0067] “the input/output controller 618 can provide output to a display screen or other type of output device” teaches provide output to the system wherein system perform AI task);
wherein at least a portion of the plurality of weights are obtained in a training system configured to (Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114, typically samples that were not used during training” teaches weights are obtained during training):
train weights of the CNN based at least on a training data set, wherein the trained weights of the CNN are stored in floating point (Page 3 Para [0038] “Performance, energy usage, and storage requirements of ANNs can be improved through the use of quantized-precision floating-point formats during training and/or inference. In particular, weights 110 and activation values 208 (shown in FIGS. 2A and 2B, respectively) can be represented in a lower-precision quantized-precision floating-point format, which typically results in some amount of error being introduced” and Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114, typically samples that were not used during training” and Page 4 Para [0048] “which shows a forward training pass for a portion of an example ANN that includes a convolution layer 302” and Fig. 1  teaches training weight of ANN from the training data set and training weight of the ANN are representation of floating point wherein representation of floating point are store, ANN comprising convolution neural network); 
and upload the updated weights to the plurality of convolution layers of the embedded CNN of the semiconductor (Page 2 Para [0027] “An ANN generally consists of a sequence of layers of different types (e.g. convolution, ReLU, fully connected, and pooling layers)” and Page 3 Para [0031] “the ANN, each layer of the ANN computes the error for the previous layer and the gradients, or updates, to the weights 110 of the layer that move the ANN's prediction toward the desired output” and Page 6 Para [0069] “the computer readable media is implemented as semiconductor-based memory” teaches weight updated and upload convolution layer of the ANN where in ANN is CNN of semiconductor).
Ovtcharov et al. does not teach wherein the quantized weights are stored in fixed point.
However, Alippi et al. teaches wherein the quantized weights are stored in fixed point (Page 215 Section 3 3.0.2 Precision Scaling “precision scaling aims at reducing the memory occupation associated with the weights of Φ by considering approximated versions θ1~ of θ1….Precision scaling, which is computed through the rounding of weights θis, aims at employing a fixed-point representations of the weights so as to consider 16-bit or 8-bit data types (e.g., int or short int)” and Page 216 Section 4.1 Computational Complexity and Memory Occupation “The memory reduction when considering Φ instead of Φ depends explicitly on k and implicitly on q and can be quantified as $\Delta M = M_{\Phi} - M_{\bar{\Phi}} = N_{\tilde{\Phi}}(b - \tilde{b}) + (N_{\Phi} - N_{\tilde{\Phi}})\,b$ (6)” teaches that the quantized weights are represented in fixed point and that the weights are stored);
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Alippi et al. to the disclosed invention of Ovtcharov et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “A further level of approximation for Φ~ is introduced in Section 4.3 to consider, in the k-th layer, only the convolutional filters useful for the application-specific classification problem; this allows to further reduce computational load and memory occupation of Φ~” (Alippi, Page 216 Section 4 Designing Approximated Convolutional Neural Networks).
Ovtcharov et al. in view of Alippi et al. does not teach quantize the trained weights of the CNN to a respective number of quantization levels corresponding to a maximum value of a convolution layer of the CNN…… update the quantized weights of the CNN so that output of the CNN based on the updated weights is within a range of ground truth of the training data set.
However, Tung et al. teaches quantize the trained weights of the CNN to a respective number of quantization levels corresponding to a maximum value of a convolution layer of the CNN (Page 7875 Section 3.1. In-parallel pruning-quantization “We place two “clips”, scalars c− and c+, such that (p × 100)% of the positive weights in the layer are less than or equal to c+” and Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and figure 1 teaches deep neural network of quantized weight to a quantization levels wherein level of quantization positive weight (correspond to maximum value)), 
update the quantized weights of the CNN so that output of the CNN based on the updated weights is within a range of ground truth of the training data set (Page 7876 Section 3.1. In-parallel pruning-quantization “We then quantize the weights by setting them to the new quantization levels in the next forward pass” and Page 7876 Section 3.2. Pruning-Quantization Hyperparameter Prediction “To guide our search for promising pruning-quantization hyperparameters, we optimize minθε(θ)−λ⋅ci(θ) for each layer i. We obtain a coarse estimate of the quality of θ by testing the compressed network on a subset of the training data; ε(θ) is the resulting top-1 error” figure 1 teaches quantized weight of deep neural network which updated accordingly minimizing error)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Tung et al., into the disclosed invention of Ovtcharov et al. in view of Alippi et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “CLIP-Q improves on the previous best compression rate (37.5 ×) by 35% relative. On GoogLeNet, CLIP-Q obtains a state-of-the-art compressed network size of 2.8 MB, improving on the previous best compression rate (6.4×) by 57% relative. On ResNet-50, CLIP-Q obtains a 139% relative improvement in the state-of-the-art compression rate while obtaining 2.4% higher accuracy” (Tung, Page 7880 Section 4.4. Comparison to State-of-the-Art Methods).
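For illustration only, the clip placement and quantization quoted from Tung can be sketched as below. This is a simplified sketch, not Tung's implementation: the passage quoted above states only where the clips are placed and that weights are set to quantization levels, so the zeroing of weights between the clips, the uniform bins, the use of bin centers as levels, and all function names are assumptions of this sketch.

```python
def clip_thresholds(weights, p):
    """Return (c_minus, c_plus) so that a fraction p of the positive weights
    is <= c_plus and a fraction p of the negative weights is >= c_minus."""
    pos = sorted(w for w in weights if w > 0)
    neg = sorted(-w for w in weights if w < 0)  # magnitudes of the negatives

    def pick(mags):
        k = int(p * len(mags))
        return mags[k - 1] if k > 0 else 0.0

    return -pick(neg), pick(pos)


def _snap(w, lo, hi, levels):
    """Map w in (lo, hi] to the center of one of `levels` uniform bins."""
    if hi == lo:
        return hi
    width = (hi - lo) / levels
    idx = min(levels - 1, int((w - lo) / width))
    return lo + (idx + 0.5) * width


def clip_and_quantize(weights, p, levels):
    c_neg, c_pos = clip_thresholds(weights, p)
    max_pos = max((w for w in weights if w > 0), default=c_pos)
    max_neg = max((-w for w in weights if w < 0), default=-c_neg)
    out = []
    for w in weights:
        if c_neg <= w <= c_pos:
            out.append(0.0)  # assumed: weights between the clips are zeroed
        elif w > 0:
            out.append(_snap(w, c_pos, max_pos, levels))
        else:
            out.append(-_snap(-w, -c_neg, max_neg, levels))
    return out


layer = [-0.8, -0.4, -0.1, 0.05, 0.2, 0.6, 1.0]
compressed = clip_and_quantize(layer, p=0.5, levels=2)
```

In this sketch the largest positive and negative magnitudes of the layer bound the quantization range, so the levels depend on the layer's maximum value.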
Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ovtcharov et al. (US 20200302269 A1) in view of Alippi et al. (“Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case”) further in view of Hsieh et al. (US 20180144214 A1).
Regarding claim 5. 
Ovtcharov et al. in view of Alippi et al. teaches the semiconductor of claim 4, 
Ovtcharov et al. teaches wherein the CNN is configured to perform the AI task by (Page 2 Para [0025] “ANNs are applied to a number of applications in AI and ML including… AI tasks” and Page 2 Para [0013] “convolutional neural networks (“CNNs”)… other suitable ANNs that can be adapted to use the technologies” teach that an ANN performs the AI task, wherein the ANN comprises a CNN):
Ovtcharov et al. in view of Alippi et al. does not teach generating feature descriptors of the image data; comparing the feature descriptors of the image data with reference feature descriptors; and generating the output of the AI task based on the comparing. 
However, Hsieh et al. teaches generating feature descriptors of the image data (Page 18-19 Para [0207] “The classifier 2150 provides weighted features that can be used to generate a known image quality index 2160” teaches generating features of the image);
comparing the feature descriptors of the image data with reference feature descriptors (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs” teaches comparing the output (corresponding to the feature descriptors of the image data) with the reference output (corresponding to the reference feature descriptors)); 
and generating the output of the AI task based on the comparing (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs…..At block 826, the output is evaluated to determine whether the network has successfully modeled the expected output. If the network has not, then the training process continues at block 822. If the network has successfully modeled the output, then, at block 828, a deep learning model-based device is generated” teaches that the output is evaluated based on the comparing to determine whether the network has successfully modeled the output (corresponding to generating the output)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitations, as taught by Hsieh et al., into the disclosed invention of Ovtcharov et al. in view of Alippi et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reasons: “The diagnosis engine 1450 can employ a deep learning network, such as a CNN, RNN, etc., to help improve consistency, standardization, and accuracy of diagnoses. Additional data, such as non-image data, can be included in the deep learning network by the diagnosis engine 1450 to provide a holistic approach to patient diagnosis” and “output of the network is compared against known/reference output for those inputs. As the network makes connections and learns, the accuracy of the network model improves.” (Hsieh Pg. 10 Para [0123], Pg. 7 [0095]).
Regarding claim 17. 
Ovtcharov et al. in view of Alippi et al. teaches the method of claim 13, 
Ovtcharov et al. further teaches wherein the input data is image data captured from an image capturing device, and wherein performing the AI task comprises (Page 2-3 Para [0028] “Training 102 of ANNs typically utilizes a training data set 108. The training data set 108 includes samples (e.g. images) for applying to an ANN” and Page 2 Para [0025] “FIGS. 1-2B. As described briefly above, ANNs are applied to a number of applications in AI and ML including….classification and AI tasks” teach that the samples are captured image data and that the AI task is performed):
Ovtcharov et al. in view of Alippi et al. does not teach generating feature descriptors of the image data; comparing the feature descriptors of the image data with reference feature descriptors; and generating the output of the AI task based on the comparing.
However, Hsieh et al. teaches generating feature descriptors of the image data (Page 18-19 Para [0207] “The classifier 2150 provides weighted features that can be used to generate a known image quality index 2160” teaches generating features of the image);
comparing the feature descriptors of the image data with reference feature descriptors (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs” teaches comparing the output (corresponding to the feature descriptors of the image data) with the reference output (corresponding to the reference feature descriptors)); 
generating the output of the AI task based on the comparing (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs…..At block 826, the output is evaluated to determine whether the network has successfully modeled the expected output. If the network has not, then the training process continues at block 822. If the network has successfully modeled the output, then, at block 828, a deep learning model-based device is generated” teaches that the output is evaluated based on the comparing to determine whether the network has successfully modeled the output (corresponding to generating the output)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitations, as taught by Hsieh et al., into the disclosed invention of Ovtcharov et al. in view of Alippi et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reasons: “The diagnosis engine 1450 can employ a deep learning network, such as a CNN, RNN, etc., to help improve consistency, standardization, and accuracy of diagnoses. Additional data, such as non-image data, can be included in the deep learning network by the diagnosis engine 1450 to provide a holistic approach to patient diagnosis” and “output of the network is compared against known/reference output for those inputs. As the network makes connections and learns, the accuracy of the network model improves.” (Hsieh Pg. 10 Para [0123], Pg. 7 [0095]).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ovtcharov et al. (US 20200302269 A1) in view of Alippi et al. (“Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case”) further in view of Tung et al. (“CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization”) and further in view of Hsieh et al. (US 20180144214 A1).
Regarding claim 20. 
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. teaches the semiconductor of claim 19, 
Ovtcharov et al. further teaches wherein the embedded CNN is configured to perform the AI task by (Page 5 Para [0062] “computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, computing or processing systems embedded in devices (such as wearable computing devices, automobiles, home automation etc.)” and Page 2 Para [0087] “when executed by the one or more processors, will cause the computing device to: during a forward training pass of an artificial neural network (ANN)” and Page 2 Para [0013] “It should be noted that applications of the disclosed herein can be used with various types of neural networks, such as convolutional neural networks (“CNNs”)” and Page 6 Para [0069] “if the computer readable media is implemented as semiconductor-based memory” and Page 3 Para [0032] “The ANN then uses the weights 110 (and biases) obtained during training 102 to perform classification, recognition, or other types of tasks on samples in an input data set 114” teach that the computing device is embedded, wherein the computing device implements a CNN to perform classification, recognition, or other types of tasks (corresponding to the AI task)):
Ovtcharov et al. in view of Alippi et al. further in view of Tung et al. does not teach generating feature descriptors of input image data based on the plurality of weights; comparing the feature descriptors of the image data with reference feature descriptors; and generating the output of the AI task based on the comparing.
However, Hsieh et al. teaches generating feature descriptors of input image data based on the plurality of weights (Page 18-19 Para [0207] “The classifier 2150 provides weighted features that can be used to generate a known image quality index 2160” teaches generating features of the image based on weights);
comparing the feature descriptors of the image data with reference feature descriptors (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs” teaches comparing the output (corresponding to the feature descriptors of the image data) with the reference output (corresponding to the reference feature descriptors)); 
and generating the output of the AI task based on the comparing (Page 7 Para [0095] “a plurality of inputs are provided to the network, and output is generated. At block 824, the deep learning network model is evaluated. For example, output of the network is compared against known/reference output for those inputs…..At block 826, the output is evaluated to determine whether the network has successfully modeled the expected output. If the network has not, then the training process continues at block 822. If the network has successfully modeled the output, then, at block 828, a deep learning model-based device is generated” teaches that the output is evaluated based on the comparing to determine whether the network has successfully modeled the output (corresponding to generating the output)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitations, as taught by Hsieh et al., into the disclosed invention of Ovtcharov et al. in view of Alippi et al. further in view of Tung et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reasons: “The diagnosis engine 1450 can employ a deep learning network, such as a CNN, RNN, etc., to help improve consistency, standardization, and accuracy of diagnoses. Additional data, such as non-image data, can be included in the deep learning network by the diagnosis engine 1450 to provide a holistic approach to patient diagnosis” and “output of the network is compared against known/reference output for those inputs. As the network makes connections and learns, the accuracy of the network model improves.” (Hsieh Pg. 10 Para [0123], Pg. 7 [0095]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOKESHA G PATEL whose telephone number is (571)272-6267. The examiner can normally be reached Monday-Friday 8am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Afshar, Kamran can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/LOKESHA G PATEL/Examiner, Art Unit 2125       


/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125