DETAILED ACTION
This action is responsive to the Application filed on 09/15/2022. Claims 1-35 are pending in the case. Claims 1, 9, 17, 25, 27, 29, and 33 are independent claims. Claims 1, 9, 17, 25, 27, 29, 33, and 35 are amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 02/01/2018 have been fully considered but they are not persuasive.
Regarding 35 U.S.C. 101
	Applicant makes many arguments relating the present case to court cases and decisions. However, the instant application’s eligibility is not based on the application’s similarity to other applications. Examiner notes that the July 2015 Update and the May 2016 Memo have been superseded by the 2019 PEG, which is reflected in the current version of the MPEP, and that under the current guidance the claims have been shown to be ineligible.
	Further, Applicant argues that the claims do not recite any abstract ideas. Applicant argues that the “claimed features are not ... practically performed in the human mind ... since the claims specifically recite that the processes are a neural network processor implemented method”. Whether or not a claim step is implemented on a “neural network processor” does not indicate whether that step is an abstract idea. Under the eligibility analysis, a limitation that can be performed in the human mind (i.e., extracting at least one bit) is first identified. Then, in Step 2A Prong 2 and Step 2B, the additional claim limitations are evaluated to determine whether they amount to a practical application or to significantly more. Implementing an abstract idea on a generic processor does not demonstrate either. Therefore, the rejection is maintained.
	Applicant argues that the claims recite an improvement to the field. Examiner disagrees. The claimed steps demonstrate, at most, an improvement to the abstract idea itself, and the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. The claims do not demonstrate an improvement to neural network technology.
	Applicant argues that the claims amount to significantly more by comparison to other court cases. Again, as stated above, the eligibility of the claims is based on their own merits. Applicant asserts that the Office has not provided a basis as to how the claims do not provide “significantly more” than the recited judicial exception. However, Examiner has provided the required evidence by pointing out which elements are abstract ideas and by explaining that the additional elements present in the claims correspond to elements that MPEP 2106 identifies as indicative of neither integration into a practical application nor an inventive concept.

Regarding 35 U.S.C. 102 rejection of claim 1
	Applicant argues that Narang does not extract at least one bit; rather, Narang merely teaches that weights, activations, and gradients are stored in half-precision floating-point format. Examiner disagrees. Weights, for example, are not merely stored; they are extracted from high-precision master weights by reducing each weight to half-precision format. These extracted weights are then used in forward activation to generate results, as described in Narang and in the portion cited in the rejection. In particular, producing an FP16 copy from an FP32 weight set is understood to mean extracting a lower-precision copy from a high-precision weight.
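For illustration only, the FP32-to-FP16 relationship relied on above can be sketched in Python. This is a minimal sketch, assuming a single scalar weight and Python's standard struct module, whose "f" and "e" format characters pack IEEE 754 single and half precision respectively; the function names are hypothetical and this is not code from Narang.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_copy(master_weight: float) -> float:
    """Return a half-precision (binary16) copy of an FP32 master weight.

    Packing with the "e" format rounds to the nearest binary16 value.
    """
    return struct.unpack("<e", struct.pack("<e", master_weight))[0]


master = to_fp32(0.1)     # FP32 master weight, retained for weight updates
half = fp16_copy(master)  # FP16 copy used in the forward pass

# The FP16 copy differs from the master weight only by the precision discarded.
print(f"FP32 master: {master!r}  FP16 copy: {half!r}")
```

Every binary16 value is exactly representable in binary32, so the copy is a coarser version of the master weight rather than an independent value.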
Regarding 35 U.S.C. 103 rejection of claims 33 and 35
	Applicant argues that Kim does not teach the claim elements, stating that the cited portions of Kim merely teach that a data set consists of training, validation, and test data. Examiner disagrees. The cited portions of Kim explicitly describe steps of a neural network in relation to Figure 1 of Kim. Figure 1 of Kim depicts at least two separate neural networks for solving different tasks. The rejection makes this point clearly: on page 31 of the Non-Final Office Action, the examiner noted and described the roles of the two separate neural networks that correspond to the claim.
Regarding 35 U.S.C. 103 rejection of claims 17, 18, 23, and 29-31
	Applicant argues that the cited art does not teach the claim elements. Specifically, Applicant asserts that Narang does not teach the limitation “quantizing weights of a high bit-width….and applying input data to the first layer…”. Examiner notes that, in Narang, weights are applied to the first layer of a neural network in forward propagation. As highlighted in the 103 combination, the weight gradient used to update weights in a neural network is based on the loss values of the neural network. Further, reducing the precision of 32-bit master weights to half precision amounts to quantizing as claimed.

Applicant’s other arguments filed 02/01/2018 have been fully considered and are persuasive. Consequently, the previous 112(b) rejection has been withdrawn; however, the amendments have resulted in different 112(b) rejections.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

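Claims 1-35 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.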
Regarding Claim 1
Under Step 1, the claim is directed to a neural network processor-implemented method, which is a process, one of the statutory categories. The claim recites the following limitation, which is considered an abstract idea: “extracting at least one bit, corresponding to a determined bit-width, from each of first weights of a first layer of a source model corresponding to a first layer of a neural network.”
Under Step 2A Prong 1, the cited abstract idea corresponds to an evaluation performed in the human mind. Extracting at least one bit, corresponding to a determined bit-width, from each of first weights of a first layer of a source model corresponding to a first layer of a neural network is an evaluation that can be performed in the human mind; for example, observing a weight value of 150.5 and extracting a rounded-down version of that value, 100, can be performed by mental evaluation. The fact that the extraction is performed on values from a source model does not render the ‘extracting’ step non-abstract.
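For illustration only, the following minimal Python sketch mirrors the example above and a binary counterpart of it. It assumes unsigned integer weight codes and treats extracting bits corresponding to a bit-width as keeping the most significant bits; the function name and the 8-bit and 4-bit widths are hypothetical and are not taken from the claims or the specification.

```python
import math


def extract_upper_bits(weight: int, source_bits: int, bit_width: int) -> int:
    """Keep the `bit_width` most significant bits of an unsigned `source_bits`-bit weight."""
    if not 0 < bit_width <= source_bits:
        raise ValueError("bit_width must be between 1 and source_bits")
    if not 0 <= weight < (1 << source_bits):
        raise ValueError("weight does not fit in source_bits")
    # Shifting right discards the low-order bits, like dropping the tens and units digits.
    return weight >> (source_bits - bit_width)


# Decimal analogue of the example above: keep only the hundreds place of 150.5.
decimal_example = math.floor(150.5 / 100) * 100

# Binary case: hypothetical 8-bit first weights reduced to a 4-bit bit-width.
first_weights = [0b10110110, 0b01001111, 0b11110000]
second_weights = [extract_upper_bits(w, 8, 4) for w in first_weights]
print(decimal_example, second_weights)
```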
Furthermore, under Step 2A Prong 2 and 2B, the additional element “generating results of the first layer of the neural network, having second weights based on the extracting, by providing input data to the first layer of the neural network having the second weights” amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Simply generating results using a neural network is an operation performed by a generic processor. There is no requirement of a specific processor or of any improvement to the technological environment. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 2
The claim is directed to a process. The claim recites the following limitations “wherein the first weights are configured to have a higher bit-precision than the second weights.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 3
The claim is directed to a process. The claim recites the following limitations “wherein the second weights are nested in the first weights.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 4
The claim is directed to a process. The claim recites the following limitations “wherein the bit-width for the first layer of the neural network is determined based on a processing characteristic corresponding to the first layer of the neural network, and wherein the processing characteristic comprises at least one of a required processing speed, a required processing accuracy, a processing difficulty, or a terminal performance.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 5
The claim is directed to a process. The claim recites the following limitations: “determining a bit-width for a second layer of the neural network; obtaining third weights for a second layer of a source model corresponding to the second layer of the neural network; obtaining fourth weights for the second layer of the neural network by extracting at least one bit corresponding to the determined bit-width for the second layer of the neural network from each of the third weights for the second layer of the source model corresponding to the second layer of the neural network;” Under Step 2A Prong 1, these limitations correspond to an evaluation performed in the human mind for the same reasons described in the rejection of claim 1.
Furthermore, under Step 2A Prong 2 and 2B, the claim recites the additional element “and processing input data of the second layer of the neural network by executing the second layer of the neural network based on the obtained fourth weights,” which amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 6
The claim is directed to a process. The claim recites the following limitations “wherein the third weights have a higher bit-precision than the fourth weights.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 7
The claim is directed to a process. The claim recites the following limitations “wherein the fourth weights are nested in the third weights.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 8
The claim is directed to a process. The claim does not recite additional abstract ideas beyond those addressed in the independent claim. Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore, under Step 2A Prong 2 and 2B, the claim recites the additional element “wherein the first layer of the neural network executed based on the second weights is configured to process a first task based on the input data of the first layer, and the second layer of the neural network executed based on the fourth weights is configured to process a second task different from the first task based on the input data of the second layer,” which amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.

Regarding Claim 9
Under Step 1, the claim is directed to a neural network processor-implemented method, which is a process, one of the statutory categories. The claim recites the following limitation, which is considered an abstract idea: “extracting at least one bit, corresponding to a determined bit-width of a first neural network, from each of first weights of a source model;”
Under Step 2A Prong 1, the cited abstract idea corresponds to an evaluation performed in the human mind. Extracting values from existing values is an evaluation that can be performed in the human mind; for example, observing a weight value of 150.5 and extracting a rounded-down version of that value, 100, can be performed by mental evaluation.
Furthermore, under Step 2A Prong 2 and 2B, the additional element “generating results of the first neural network, having second weights based on the extracting, by providing input data to the neural network having the second weights” amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 10
	Claim 10 is rejected for the same reason as claim 2 in connection with claim 9
Regarding Claim 11
	Claim 11 is rejected for the same reason as claim 3 in connection with claim 9
Regarding Claim 12
	Claim 12 is rejected for the same reason as claim 5 in connection with claim 9
Regarding Claim 13
	Claim 13 is rejected for the same reason as claim 6 in connection with claim 9
Regarding Claim 14
	Claim 14 is rejected for the same reason as claim 7 in connection with claim 9
Regarding Claim 15
	Claim 15 is rejected for the same reason as claim 8 in connection with claim 9
Regarding Claim 16
	Claim 16 is rejected for the same reason as claim 8 in connection with claim 9
Regarding Claim 17
Under Step 1, the claim is directed to a processor-implemented training method, which is a process, one of the statutory categories. The claim recites the following limitations, which are considered abstract ideas: “quantizing weights of a high bit-width corresponding to a first layer of a neural network based on weights of a low bit-width corresponding to the first layer of the neural network… updating the weights of the high bit-width based on the determined loss values.”
Under Step 2A Prong 1, the cited abstract ideas correspond to an evaluation performed in the human mind or, in part, to a mathematical computation. Quantizing weights simply amounts to rounding values, which can be done in the mind or as a mathematical operation. Updating the weights of the high bit-width based on the determined loss values may simply be a decision made in the human mind to increment or decrement all weight values.
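For illustration only, the quantize, determine-loss, and update sequence recited in claim 17 can be sketched as a short Python training loop. This is a minimal sketch under assumptions that come from neither the claims nor the cited art: a single scalar weight, a symmetric uniform quantizer, a squared-error loss, and a straight-through update in which the gradient computed for the low bit-width weight is applied directly to the high bit-width weight. All names are hypothetical.

```python
def quantize(w: float, bits: int, max_abs: float = 1.0) -> float:
    """Round w onto a symmetric uniform grid with 2**(bits - 1) - 1 positive levels."""
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels
    clipped = max(-max_abs, min(max_abs, w))
    return round(clipped / step) * step  # Python's round() uses round-half-to-even


def train_step(w_high: float, x: float, y: float, bits: int, lr: float):
    """One update of the high bit-width weight using a loss computed with its quantized copy."""
    w_low = quantize(w_high, bits)  # low bit-width weight used in the forward pass
    pred = w_low * x                # single-weight "layer" applied to the input
    loss = (pred - y) ** 2          # loss value corresponding to the low bit-width weight
    grad = 2 * (pred - y) * x       # dLoss/dw_low
    # Straight-through assumption: the gradient w.r.t. w_low updates w_high directly.
    return w_high - lr * grad, loss


w = 0.1
first_loss = None
for _ in range(50):
    w, loss = train_step(w, x=1.0, y=0.5, bits=4, lr=0.05)
    if first_loss is None:
        first_loss = loss
print(w, loss)
```

Because the update is applied to the high bit-width weight, small gradient steps accumulate there even when a single step is too small to change the 4-bit copy.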
Furthermore, under Step 2A Prong 2 and 2B, the additional element “applying input data to the first layer by determining loss values corresponding to the weights of the low bit-width” amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 18
The claim is directed to a process. The claim recites the following limitation: “determining weight sets of the low bit-width corresponding to the first layer by quantizing the weights of the high bit-width, after training associated with the weights of the high bit-width is completed.” Under Step 2A Prong 1, this limitation corresponds to an evaluation performed in the human mind and therefore recites an additional abstract idea beyond those recited in claim 17. Determining weight sets by quantizing weights is simply an evaluation performed in the human mind or with the aid of pen and paper. Alternatively, determining weight sets by quantizing may be considered a mathematical operation performed on the high bit-width weights. Examiner notes that “training” is not actively recited by the claims; the claims only indicate the time at which the abstract idea occurs.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 19
The claim is directed to a process. The claim recites the following limitations “wherein the weight sets of the low bit-width comprise a weight set of a first bit-width and a weight set of a second bit-width having a lower bit-precision than the weight set of the first bit-width, and wherein the weight set of the second bit-width is nested in the weight set of the first bit-width” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 20
The claim is directed to a process. The claim recites the following limitations “wherein the weights of the low bit-width include first weights of a first bit-width having a lower bit-precision than the weights of the high bit-width, and second weights of a second bit-width having a lower bit-precision than the first weights of the first bit-width.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 21
The claim is directed to a process. The claim recites the following limitations: “determining the first weights of the first bit-width by quantizing the weights of the high bit-width; and determining the second weights of the second bit-width by extracting at least one bit from each of the determined first weights of the first bit-width.” Under Step 2A Prong 1, these limitations correspond to an evaluation performed in the human mind for the same reasons described in the rejection of claim 1.
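For illustration only, the two steps of claim 21 (quantizing to a first bit-width, then extracting bits to obtain a second bit-width) can be sketched in Python. This minimal sketch assumes high bit-width weights normalized to [0, 1], an unsigned uniform quantizer, a first bit-width of 8, and a second bit-width of 4; these choices and all function names are hypothetical.

```python
def quantize_unsigned(w: float, bits: int) -> int:
    """Map a weight in [0, 1] to an unsigned integer code of the given bit-width."""
    w = min(max(w, 0.0), 1.0)
    return round(w * (2 ** bits - 1))


def extract_upper_bits(code: int, source_bits: int, bit_width: int) -> int:
    """Keep the `bit_width` most significant bits of a `source_bits`-bit code."""
    return code >> (source_bits - bit_width)


high_weights = [0.05, 0.5, 0.93]                         # high bit-width (float) weights
first = [quantize_unsigned(w, 8) for w in high_weights]  # first bit-width: 8
second = [extract_upper_bits(c, 8, 4) for c in first]    # second bit-width: 4
print(first, second)
```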
Furthermore under step 2A Prong 2 and 2B the claims do not recite any additional elements. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 22
The claim is directed to a process. The claim recites the following limitations: “determining the second weights of the second bit-width by quantizing the weights of the high bit-width; determining the determined second weights of the second bit-width to be an upper bit group of the first weights of the first bit-width; and determining a lower bit group of the first weights of the first bit-width by quantizing the weights of the high bit-width.” Under Step 2A Prong 1, these limitations correspond to an evaluation performed in the human mind for the same reasons described in the rejection of claim 1.
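For illustration only, the ordering recited in claim 22 (quantize to the second bit-width first, place the result in the upper bit group, then fill the lower bit group by quantizing the high bit-width weight) can be sketched in Python. This is a minimal sketch, assuming weights normalized to [0, 1], 4-bit upper and lower groups, and floor-based (truncating) quantization so that the upper group always equals the top bits of the combined weight; the claim does not specify a quantizer, and all names are hypothetical.

```python
import math


def nested_weights(w: float, upper_bits: int = 4, lower_bits: int = 4):
    """Return (second_weight, first_weight); second_weight is the upper bit group of first_weight."""
    w = min(max(w, 0.0), 1.0)
    upper_max = 2 ** upper_bits - 1
    lower_max = 2 ** lower_bits - 1
    # Quantize the high bit-width weight to the second (lower) bit-width.
    upper = min(math.floor(w * 2 ** upper_bits), upper_max)
    # The part of w not captured by the upper group, scaled to [0, 1].
    residual = w * 2 ** upper_bits - upper
    # Quantize that residual into the lower bit group.
    lower = min(math.floor(residual * 2 ** lower_bits), lower_max)
    first = (upper << lower_bits) | lower
    return upper, first


for w in (0.0, 0.3, 0.999, 1.0):
    second, first = nested_weights(w)
    print(w, second, first)
```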
Furthermore under step 2A Prong 2 and 2B the claims do not recite any additional elements. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 23
The claim is directed to a process. The claim recites the following limitations “wherein the updating of the weights of the high bit-width comprises: updating the weights of the high bit-width based on statistical information of loss gradients corresponding to the determined loss values.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 24
The claim is directed to a process. The claim recites the following limitations “wherein the updating of the weights of the high bit-width further comprises: calculating the statistical information by assigning a high weighted value to a loss gradient corresponding to a weight for which a high priority is set among the weights of the low bit-width.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
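For illustration only, statistical information that assigns a higher weighting value to the loss gradient of a high-priority weight can be sketched as a weighted mean. This minimal sketch assumes one loss gradient per low bit-width weight set, a boolean priority flag per set, and fixed weighting values of 2.0 and 1.0; the claim does not specify the statistic, and these values and names are hypothetical.

```python
def weighted_gradient(grads, priorities, high_weight=2.0, low_weight=1.0):
    """Weighted mean of loss gradients; high-priority entries receive a larger weighting value."""
    if not grads or len(grads) != len(priorities):
        raise ValueError("need one priority flag per gradient")
    coeffs = [high_weight if p else low_weight for p in priorities]
    return sum(c * g for c, g in zip(coeffs, grads)) / sum(coeffs)


# Hypothetical gradients from, e.g., 8-bit, 4-bit, and 2-bit weight sets; the 8-bit set has priority.
grads = [0.2, 0.4, -0.1]
stat = weighted_gradient(grads, [True, False, False])
w_high = 0.5 - 0.1 * stat  # update the high bit-width weight with a learning rate of 0.1
print(stat, w_high)
```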
Furthermore under step 2A Prong 2 and 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 25
Under Step 1, the claim is directed to a neural network apparatus, which is a machine, one of the statutory categories. The claim recites the following limitation, which is considered an abstract idea: “extract at least one bits corresponding to a determined bit-width, from each of first weights of a first layer of a source model corresponding to a first layer of a neural network”
Under Step 2A Prong 1, the cited abstract idea corresponds to an evaluation performed in the human mind. Extracting values from existing values is an evaluation that can be performed in the human mind; for example, observing a weight value of 150.5 and extracting a rounded-down version of that value, 100, can be performed by mental evaluation.
Furthermore, under Step 2A Prong 2 and 2B, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea (“a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to”). See MPEP 2106.05(f). The additional element “and generate results of the first layer of the neural network, having second weights based on the extracting, by providing input data to the first layer of the neural network having the second weights” amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 26
	Claim 26 is rejected for the same reason as claim 5 in connection with claim 25
Regarding Claim 27
Under Step 1, the claim is directed to a neural network processing apparatus, which is a machine, one of the statutory categories. The claim recites the following limitation, which is considered an abstract idea: “extract at least one bits corresponding to a determined bit-width of a first neural network, from each of first weights of a source model”
Under Step 2A Prong 1, the cited abstract idea corresponds to an evaluation performed in the human mind. Extracting values from existing values is an evaluation that can be performed in the human mind; for example, observing a weight value of 150.5 and extracting a rounded-down version of that value, 100, can be performed by mental evaluation.
Furthermore, under Step 2A Prong 2 and 2B, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea (“a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to”). See MPEP 2106.05(f). Additionally, the additional element “generate results of the first neural network, having second weights based on the extracting, by providing input data to the neural network having the second weights” amounts to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 28
	Claim 28 is rejected for the same reason as claim 5 in connection with claim 27
Regarding Claim 29
Under Step 1, the claim is directed to a neural network training apparatus, which is a machine, one of the statutory categories. The claim recites the following limitation, which is considered an abstract idea: “quantize weights of a high bit-width corresponding to a first layer of the neural network based on weights of a low bit-width corresponding to the first layer of the neural network.”
Under Step 2A Prong 1, the cited abstract idea corresponds to an evaluation performed in the human mind or, in part, to a mathematical computation. Quantizing weight values simply amounts to rounding values, which can be done in the mind. Alternatively, quantizing may be considered a mathematical computation.
Furthermore, under Step 2A Prong 2 and 2B, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea (“a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to”). See MPEP 2106.05(f). The additional elements “apply input data to the first layer by determining loss values corresponding to the weights of the low bit-width; and update the weights of the high bit-width based on the determined loss values” amount to generally linking the abstract idea to a particular technological environment, in this case neural network execution. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor do they amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Claim 30
	Claim 30 is rejected for the same reason as claim 20 in connection with claim 29
Regarding Claim 31
	Claim 31 is rejected for the same reason as claim 21 in connection with claim 29
Regarding Claim 32
	Claim 32 is rejected for the same reason as claim 22 in connection with claim 29
Regarding Claim 33
Under Step 1, the claim is directed to a processor-implemented method, which is a process, one of the statutory categories. The claim recites the following limitations, which are considered abstract ideas: “executing a first neural network based on first weights that are trained to process a first task based on the received multilevel input data; executing a second neural network based on second weights that are trained to process a second task based on the received multi level input data”
Under Step 2A Prong 1, the cited abstract ideas correspond to a mathematical computation. Examiner notes that simply “executing” a neural network that has been trained amounts to performing a series of mathematical matrix operations on a generic computer. The fact that the claims passively recite that the neural network has been trained does not change this determination.
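For illustration only, the point that executing an already-trained network reduces to matrix operations can be sketched in Python. This minimal sketch uses hypothetical fixed weights for a two-layer network with a ReLU activation; it does not represent the networks recited in claim 33, only the kind of arithmetic that executing them involves.

```python
def matvec(W, x):
    """Matrix-vector product: one dot product per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Elementwise ReLU activation."""
    return [max(0.0, a) for a in v]


def forward(layers, x):
    """Execute a trained network: affine transform per layer, ReLU between layers."""
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        x = relu([a + bi for a, bi in zip(matvec(W, x), b)])
    return [a + bi for a, bi in zip(matvec(W_out, x), b_out)]


# Hypothetical "trained" weights and biases; nothing here is learned at execution time.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.5]),
]
output = forward(layers, [2.0, 1.0])
print(output)
```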
Furthermore, under Step 2A Prong 2, the claim recites the additional elements “receiving multilevel input data; outputting the received multilevel input data based on the processed first task and the processed second task,” which amount to insignificant extra-solution activity (see MPEP 2106.05(g)). Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Further, the additional elements of “receiving multilevel input data; outputting the received multilevel input data based on the processed first task and the processed second task” are insignificant extra-solution activities that are considered well-understood, routine, conventional activities. Examiner notes that receiving and outputting data amounts to receiving or transmitting data over a network (MPEP 2106.05(d)(II)(i)). According to MPEP 2106.05(d)(II)(i), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner”. As such, the insignificant extra-solution activities are considered well-understood, routine, conventional activities. Therefore, the claim is not patent eligible.
Regarding Claim 34
The claim is directed to a process. The claim recites the following limitations “wherein the first weights are configured to have a first bit-width and the second weights are configured to have a second bit-width different from the first bit-width.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore, under Step 2A Prong 2 and Step 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor, for the same reason, do they amount to significantly more than the judicial exception.
Regarding Claim 35
The claim is directed to a process. The claim recites the following limitations: “wherein the received multilevel input data is one or more of multilevel image data and multilevel voice data.” Under Step 2A Prong 1, these limitations only serve to describe the abstract idea addressed in the independent claim.
Furthermore, under Step 2A Prong 2 and Step 2B, the claim does not recite additional elements to consider other than those considered in the independent claim. Accordingly, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, nor, for the same reason, do they amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 21 and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 21 and 22 both recite “wherein the determining of the weight of the low bit width comprises”; however, independent claim 17 does not recite determining a weight of the low bit-width. Rather, amended claim 17 recites quantizing weights of a high bit-width based on weights of a low bit-width. Accordingly, the limitation lacks antecedent basis, and it is unclear which step of claim 17 is being further limited.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-16 and 25-28 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Narang et al., “Mixed Precision Training” (hereinafter Narang).

Regarding claim 1
Narang teaches, A neural network processor-implemented method, comprising: (Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training. Figure 1 illustrates this mixed precision training process.” Section 3.1 ¶02 “While the need for FP32 master weights is not universal, there are two possible reasons why a number of networks require it. One explanation is that updates (weight gradients multiplied by the learning rate) become too small to be represented in FP16 - any value whose magnitude is smaller than 2^−24 becomes zero in FP16” Figure 1 demonstrates training for a single layer. The master weights (the first weights) are determined to be necessary, as explained in ¶02.) extracting at least one bit, corresponding to a determined bit-width, from each of first weights of a first layer of a source model corresponding to a first layer of a neural network; and generating results of the first layer of the neural network, having second weights based on the extracting, by providing input data to the first layer of the neural network having the second weights. (Figure 1
[Reproduced image: Narang, Figure 1]
As shown in the figure, the master weights are converted into half-precision weights having 16 bits. This corresponds to extracting 16 bits from the first weights to form the second weights. Figure 1 demonstrates executing the neural network using the lower-precision second weights.)
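The float2half step of Narang's Figure 1 can be sketched as follows, for illustration only. This minimal Python example assumes that rounding with the standard library `struct` format `'e'` (IEEE 754 binary16, round to nearest) is representative of the conversion; Narang does not specify an implementation, and the function names and weight values are hypothetical. Each 32-bit master weight (1 sign, 8 exponent, 23 fraction bits) becomes a 16-bit half-precision weight (1 sign, 5 exponent, 10 fraction bits).

```python
import struct


def float2half(w):
    """Round a value to the nearest IEEE 754 binary16 (half-precision) number."""
    return struct.unpack("<e", struct.pack("<e", w))[0]


def to_fp32(w):
    """Round a value to the nearest IEEE 754 binary32 (single-precision) number."""
    return struct.unpack("<f", struct.pack("<f", w))[0]


def fp16_bits(w):
    """Return the 16-bit pattern that stores w in half precision."""
    return struct.unpack("<H", struct.pack("<e", w))[0]


# Hypothetical FP32 master weights (first weights) for one layer.
master_weights = [to_fp32(v) for v in (0.1, -1.5, 3.14159)]
# FP16 working copy (second weights) produced by the float2half step.
half_weights = [float2half(w) for w in master_weights]
```

Values that are exactly representable in half precision (such as -1.5) pass through unchanged, while others (such as 0.1) are rounded to the nearest 16-bit value.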
Regarding claim 2
	Narang teaches claim 1
Further Narang teaches, wherein the first weights are configured to have a higher bit-precision than the second weights. (Figure 1: the master weights have 32-bit precision and thus a higher precision than the 16-bit second weights.)
Regarding claim 3
	Narang teaches claim 1
Further Narang teaches, wherein the second weights are nested in the first weights. (Abstract: “Firstly, we recommend maintaining a single-precision copy of weights that accumulates the gradients after each optimizer step (this copy is rounded to half-precision for the forward- and back-propagation).” In Figure 1, the “float2half” cell demonstrates the operation of rounding the single-precision weights to half precision; thus, the second weights are derived from, or nested in, the first weights.)
Regarding claim 4
	Narang teaches claim 1
Further Narang teaches, wherein the bit-width for the first layer of the neural network is determined based on a processing characteristic corresponding to the first layer of the neural network, and wherein the processing characteristic comprises at least one of a required processing speed, a required processing accuracy, a processing difficulty, or a terminal performance. (Section 3.1 “While the need for FP32 master weights is not universal, there are two possible reasons why a number of networks require it. One explanation is that updates… become too small to be represented in FP16… These small valued gradients would become zero in the optimizer when multiplied with the learning rate and adversely affect the model accuracy. Using a single-precision copy for the updates allows us to overcome this problem and recover the accuracy.” The weights are maintained at 32-bit precision because a lower precision, such as 16 bits, would adversely affect the accuracy. The determination to maintain the precision at 32 bits is therefore based on accuracy, which is a processing characteristic.)
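The underflow problem described in the quoted passage can be checked numerically. In the following hedged Python sketch, the learning rate and gradient values are hypothetical, and the conversions use the standard library `struct` formats `'e'` (binary16) and `'f'` (binary32). An update of about 1e-8 rounds to zero in half precision but remains nonzero in single precision, consistent with Narang's statement that a single-precision copy for the updates overcomes this problem.

```python
import struct


def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


learning_rate = 0.01                            # hypothetical
weight_gradient = 1e-6                          # hypothetical small gradient
update = learning_rate * weight_gradient        # about 1e-8

update_fp16 = to_fp16(update)   # rounds to zero: smaller than half of 2**-24
update_fp32 = to_fp32(update)   # still representable in single precision
```

2**-24 is the smallest positive (subnormal) half-precision value, so magnitudes well below it, such as 2**-26 or 1e-8, are flushed to zero.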
Regarding claim 5
	Narang teaches claim 1
Further Narang teaches, determining a bit-width for a second layer of the neural network; obtaining third weights for a second layer of a source model corresponding to the second layer of the neural network; obtaining fourth weights for the second layer of the neural network by extracting at least one bit corresponding to the determined bit-width for the second layer of the neural network from each of the third weights for the second layer of the source model corresponding to the second layer of the neural network; and processing input data of the second layer of the neural network by executing the second layer of the neural network based on the obtained fourth weights. (Figure 1 
[Reproduced image: Narang, Figure 1]
As described in the rejection of claim 1, a bit width is determined for the master weights of each layer, which corresponds to determining a bit width for a second layer of the neural network. The master weights of the second layer are the third weights, and the fourth weights are extracted from them by the float2half bit-reduction block. The fourth weights are used in the forward and backward passes to provide weight updates for the master weights.)
Regarding claim 6
	Narang teaches claim 5
Further Narang teaches, wherein the third weights have a higher bit-precision than the fourth weights. (Figure 1: the master weights have 32-bit precision and thus a higher precision than the 16-bit fourth weights.)
Regarding claim 7
	Narang teaches claim 5
Further Narang teaches, wherein the fourth weights are nested in the third weights. (Abstract: “Firstly, we recommend maintaining a single-precision copy of weights that accumulates the gradients after each optimizer step (this copy is rounded to half-precision for the forward- and back-propagation).” In Figure 1, the “float2half” cell demonstrates the operation of rounding the single-precision weights to half precision; thus, the fourth weights are derived from, or nested in, the third weights.)
Regarding claim 8
	Narang teaches claim 5
Further Narang teaches, wherein the first layer of the neural network executed based on the second weights is configured to process a first task based on the input data of the first layer, and the second layer of the neural network executed based on the fourth weights is configured to process a second task different from the first task based on the input data of the second layer. (Section 4 ¶01 “Mixed Precision experiments were conducted on Volta V100 that accumulates FP16 products into FP32. The mixed precision speech recognition experiments (Section 4.3)” Section 4.3 ¶01 “We explore mixed precision training for speech data using the DeepSpeech 2 model for both English and Mandarin datasets. The model used for training on the English dataset consists of two 2D convolution layers, three recurrent layers with GRU cells, 1 row convolution layer and Connectionist temporal classification (CTC) cost layer” An English dataset is used as input data to a multi-layer neural network. A first layer having the second weights performs a convolution task, while a second layer having the fourth weights performs a recurrent-layer task. The output of these layers is based on the input data.)
Regarding claim 9
Narang teaches, A neural network processor-implemented method, comprising (Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training. Figure 1 illustrates this mixed precision training process.” Section 3.1 ¶02 “While the need for FP32 master weights is not universal, there are two possible reasons why a number of networks require it. One explanation is that updates (weight gradients multiplied by the learning rate) become too small to be represented in FP16 - any value whose magnitude is smaller than 2^−24 becomes zero in FP16” Figure 1 demonstrates training for a single layer. The master weights (the first weights) are determined to be necessary, as explained in ¶02.) extracting at least one bit, corresponding to a determined bit-width of a first neural network, from each of (Figure 1
[Reproduced image: Narang, Figure 1]
As shown in the figure, the master weights are converted into half-precision weights having 16 bits. This corresponds to extracting 16 bits from the first weights to form the second weights. Figure 1 demonstrates executing the neural network using the lower-precision second weights.)
Regarding claim 10
	Claim 10 is rejected for the reasons set forth in the rejection of claim 2, in connection with claim 9.
Regarding claim 11
	Claim 11 is rejected for the reasons set forth in the rejection of claim 3, in connection with claim 9.
Regarding claim 12
	Claim 12 is rejected for the reasons set forth in the rejection of claim 5, in connection with claim 9.
Regarding claim 13
	Claim 13 is rejected for the reasons set forth in the rejection of claim 6, in connection with claim 9.
Regarding claim 14
	Claim 14 is rejected for the reasons set forth in the rejection of claim 7, in connection with claim 9.
Regarding claim 15
	Claim 15 is rejected for the reasons set forth in the rejection of claim 8, in connection with claim 9.
Regarding claim 16
	Claim 16 is rejected for the reasons set forth in the rejection of claim 4, in connection with claim 9.
Regarding claim 25
Narang teaches, A neural network apparatus, comprising: a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to: (Section 4 ¶01-¶02 “We have run experiments for a variety of deep learning tasks covering a wide range of deep learning Models…The Baseline experiments were conducted on NVIDIA’s Maxwell or Pascal GPU”) extract at least one bit corresponding to a determined bit-width, from each of first weights of a first layer of a source model corresponding to a first layer of a neural network; and generate results of the first layer of the neural network, having second weights based on the extracting, by providing input data to the first layer of the neural network having the second weights. (Figure 1 and Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training. Figure 1 illustrates this mixed precision training process.” Section 3.1 ¶02 “While the need for FP32 master weights is not universal, there are two possible reasons why a number of networks require it. One explanation is that updates (weight gradients multiplied by the learning rate) become too small to be represented in FP16 - any value whose magnitude is smaller than 2^−24 becomes zero in FP16”
[Reproduced image: Narang, Figure 1]
As shown in the figure, the master weights are converted into half-precision weights having 16 bits. This corresponds to extracting 16 bits from the first weights to form the second weights. Figure 1 demonstrates executing the neural network using the lower-precision second weights.)
Regarding claim 26
	Claim 26 is rejected for the reasons set forth in the rejection of claim 5, in connection with claim 25.
Regarding claim 27
Narang teaches, A neural network processing apparatus, comprising: a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to: (Section 4 ¶01-¶02 “We have run experiments for a variety of deep learning tasks covering a wide range of deep learning Models…The Baseline experiments were conducted on NVIDIA’s Maxwell or Pascal GPU”) extract at least one bit corresponding to a determined bit-width of a first neural network, from each of first weights of a source model; and generate results of the first neural network, having second weights based on the extracting, by providing input data to the neural network having the second weights (Figure 1 and Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training. Figure 1 illustrates this mixed precision training process.” Section 3.1 ¶02 “While the need for FP32 master weights is not universal, there are two possible reasons why a number of networks require it. One explanation is that updates (weight gradients multiplied by the learning rate) become too small to be represented in FP16 - any value whose magnitude is smaller than 2^−24 becomes zero in FP16”
[Reproduced image: Narang, Figure 1]
As shown in the figure, the master weights are converted into half-precision weights having 16 bits. This corresponds to extracting 16 bits from the first weights to form the second weights. Figure 1 demonstrates executing the neural network using the lower-precision second weights.)
Regarding claim 28
	Claim 28 is rejected for the reasons set forth in the rejection of claim 5, in connection with claim 27.

Claims 33 and 35 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim et al., “Multi-modal Emotion Recognition using Semi-supervised Learning and Multiple Neural Networks in the Wild” (hereinafter Kim).

Regarding claim 33
	Kim teaches, A processor-implemented method comprising: (pg 533, Section 7.2 ¶01 “We implemented our network using a tensor-based high-level deep learning library called Keras”) receiving multilevel input data; executing a first neural network based on first weights that are trained to process a first task based on the received multilevel input data; executing a second neural network based on second weights that are trained to process a second task based on the received multilevel input data; (Section 2, pg 530 ¶01 “As shown in Fig. 1, the proposed scheme, which is based on multi-modal data, consists of an image-based network, a landmark-based network, and an audio-based network” pg 533, Section 7.1 ¶01 “The dataset AFEW 6.0 [41] provided by the EmotiW 2017 challenge consists of three parts: the training dataset (773 video clips), the validation dataset (383 video clips), and the test dataset (653 video clips).” The described multi-modal model is trained on training data. The first neural network is trained on the task of processing the images, while the second neural network is trained on a different task of processing the audio data.) and outputting the received multilevel input data based on the processed first task and the processed second task. (pg 532-533, Section 6 ¶01 “We can obtain a prediction score for each of the seven emotions from the seven networks mentioned above. The final step is to determine the final emotion from these scores…. As a result, the final prediction output through the seven networks is determined as shown in Eq. 5” The output of each neural network processing the multi-modal (i.e., multilevel) input data is used as input by an additional module, the emotion adaptive module; thus, the networks output data (predictions) that serve as input to that module and are based on their respective tasks.)
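As a hedged illustration of combining two networks' outputs into one prediction, the following Python sketch uses a simple weighted sum of per-class scores. Kim's Eq. 5 is not reproduced in this action, so the weighted-sum rule, the fusion weights, and the stand-in functions `image_network` and `audio_network` are assumptions for illustration, not Kim's method.

```python
def image_network(sample):
    """Hypothetical stand-in for a trained image-based network (returns class scores)."""
    return sample["image_scores"]


def audio_network(sample):
    """Hypothetical stand-in for a trained audio-based network (returns class scores)."""
    return sample["audio_scores"]


def fuse(score_sets, weights):
    """Weighted late fusion: sum each class's scores across networks, return the best class."""
    num_classes = len(score_sets[0])
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, score_sets))
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: fused[c])


# Hypothetical multi-modal input with precomputed per-class scores.
sample = {"image_scores": [0.1, 0.7, 0.2], "audio_scores": [0.5, 0.3, 0.2]}
score_sets = [image_network(sample), audio_network(sample)]
prediction = fuse(score_sets, weights=[0.5, 0.5])
```

Changing the fusion weights (for example, favoring the audio network) can change the final class, which shows that the final output depends on both processed tasks.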
Regarding claim 35
	Kim teaches claim 33
	Further Kim teaches, wherein the received multilevel input data is one or more of multilevel image data and multilevel voice data. (abstract pg 529 and Figure 1 “The proposed method has the following features. First, the learning performance of the image based network is greatly improved by employing both multi-task learning and semi-supervised learning using the spatio-temporal characteristic of videos… we propose an audio deep learning mechanism robust to the specific emotions” 
[Reproduced image: Kim, Figure 1]
As shown in Figure 1, the multi-level, multi-task neural network processes multiple types of data, including audio and image data.)

Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA  35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 17-18, 23, and 29 are rejected under 35 U.S.C. § 103 as being unpatentable over Narang in view of Jia et al., “Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes” (hereinafter Jia).

Regarding claim 17
Narang teaches, A processor-implemented training method, comprising: (Section 4 ¶01-¶02 “We have run experiments for a variety of deep learning tasks covering a wide range of deep learning Models…The Baseline experiments were conducted on NVIDIA’s Maxwell or Pascal GPU”) quantizing weights of a high bit-width corresponding to a first layer of a neural network based on weights of a low bit-width corresponding to the first layer of the neural network; (Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training.” FP16 weights are derived by quantizing, or reducing the bit width of, higher bit-width weights to a lower precision, in particular from 32 bits to 16 bits.) applying input data to the first layer by determining [weight update values] corresponding to the weights of the low bit-width (Figure 1
[Reproduced image: Narang, Figure 1]
The figure shows a layer operation in which, based on the activations in forward propagation, a weight gradient is determined that corresponds to the weights of the low bit-width.) and updating the weights of the high bit-width based on the determined [weight update values]. (Figure 1
[Reproduced image: Narang, Figure 1]
Finally, in the weight update block, the higher bit-width master weights are updated.)
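A single iteration of the Figure 1 process can be sketched as follows. This is a minimal Python illustration under stated assumptions: one linear neuron, a squared-error loss, and FP16/FP32 rounding emulated with the standard library `struct` module; the function name `mixed_precision_step` and all values are hypothetical. The FP32 master weights are rounded to FP16, the forward and backward passes use the FP16 copy, and the resulting gradients update the FP32 master weights.

```python
import struct


def to_fp16(x):
    """Round x to IEEE 754 half precision (the float2half step)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round x to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_step(master_w, x, target, lr):
    """One hypothetical training step for y = w . x with loss 0.5 * (y - target)**2."""
    # 1. Quantize the FP32 master weights (high bit-width) to an FP16 copy (low bit-width).
    w16 = [to_fp16(w) for w in master_w]
    x16 = [to_fp16(v) for v in x]
    # 2. Forward pass with the FP16 copy; Python sums in double precision and the
    #    result is rounded to FP16, loosely mirroring higher-precision accumulation.
    y = to_fp16(sum(w * v for w, v in zip(w16, x16)))
    # 3. Backward pass: dLoss/dw_i = (y - target) * x_i, kept in FP16.
    err = to_fp16(y - target)
    grads16 = [to_fp16(err * v) for v in x16]
    # 4. Update the FP32 master weights using the FP16 gradients.
    return [to_fp32(w - lr * g) for w, g in zip(master_w, grads16)]


new_master = mixed_precision_step([0.5, -0.25], x=[1.0, 2.0], target=1.0, lr=0.1)
```

In this example the prediction moves from 0.0 toward the target of 1.0 after one step, and only the FP32 copy carries the accumulated update.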
Narang does not explicitly teach, determining loss values; updating weights…based on the determined loss values
Jia, however, when discussing weight updates based on layer loss values, teaches, determining loss values; updating weights…based on the determined loss values (Section 3, pg 3 “The algorithm introduces a local learning rate for each layer (as shown in Equation 1), which is the ratio of the L2-norm of weights and gradients weighted by a LARS coefficient η …
[Reproduced image from Jia]
To cope with this situation, we have proposed a training strategy which uses mixed-precision training with LARS as shown in Figure 2” The gradients of the loss, ∇L(w), correspond to determining loss values. These loss values are used to update the weights of a neural network layer.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network weight update method of Narang to use the weight update method described by Jia, which utilizes loss values. One would have been motivated to make such a combination because both Narang and Jia discuss methods for mixed precision training. Jia notes that, in combination with LARS weight updates, the authors use “the mixed-precision techniques to improve the throughput of a single GPU without losing accuracy.” (Conclusion, pg 8)
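For reference, the layer-wise learning rate described in the quoted Jia passage (the ratio of the L2-norm of the weights to the L2-norm of the gradients, scaled by the LARS coefficient η) can be sketched as follows. This minimal Python version assumes that the local rate multiplies a global learning rate and omits momentum and weight decay; it is not a reproduction of Jia's Equation 1, and the function names and values are hypothetical.

```python
import math


def l2_norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def lars_update(weights, grads, global_lr, eta):
    """One simplified LARS-style step for a single layer's weights."""
    grad_norm = l2_norm(grads)
    if grad_norm == 0.0:
        return list(weights)                # nothing to update for a zero gradient
    # Local learning rate: eta * ||w|| / ||grad L(w)||, computed per layer.
    local_lr = eta * l2_norm(weights) / grad_norm
    return [w - global_lr * local_lr * g for w, g in zip(weights, grads)]


updated = lars_update([3.0, 4.0], [0.6, 0.8], global_lr=1.0, eta=0.001)
```

Here ||w|| = 5 and ||∇L(w)|| = 1, so the local learning rate is 0.005 and the step size scales with the layer's weight norm rather than with the raw gradient magnitude.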
Regarding claim 18
	Narang/Jia teaches claim 17
Further Narang teaches, after training associated with the weights of the high bit-width is completed (pg 8 ¶01 “Adam optimizer was used to train for 100K iterations.” With reference to Figure 1, the process describes a single one of many training iterations.) determining weight sets of the low bit-width corresponding to the first layer by quantizing the weights of the high bit-width, (after a first iteration, corresponding to after training, the previously updated high bit-width weights are quantized in order to determine a plurality of weights, corresponding to weight sets, having a lower bit-width.)
Regarding claim 23
	Narang/Jia teaches claim 17
	Further Jia teaches, updating the weights of the high bit-width based on statistical information of loss gradients corresponding to the determined loss values (Section 4.1 pg 3 “The algorithm introduces a local learning rate for each layer (as shown in Equation 1), which is the ratio of the L2-norm of weights and gradients weighted by a LARS coefficient η… 
[Reproduced image from Jia]
” The weights are updated according to the ratio in Equation 1; this ratio is statistical information because it represents the ratio of the norm of the weights to the norm of the gradients. ∇L(w) corresponds to the determined gradient of the loss values for a given layer.)
Regarding claim 29
Narang teaches, A neural network training apparatus, comprising: a processor; and a memory configured to store an instruction readable by the processor, wherein, when the instruction is executed by the processor, the processor is configured to (Section 4 ¶01-¶02 “We have run experiments for a variety of deep learning tasks covering a wide range of deep learning Models…The Baseline experiments were conducted on NVIDIA’s Maxwell or Pascal GPU”) quantize weights of a high bit-width corresponding to a first layer of the neural network based on weights of a low bit-width corresponding to the first layer of the neural network; (Section 3.1 ¶01 “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training.” FP16 weights are derived by quantizing, or reducing the bit width of, higher bit-width weights to a lower precision, in particular from 32 bits to 16 bits.)
apply input data to the first layer by determining [weight update values] corresponding to the weights of the low bit-width; (Figure 1
[Reproduced image: Narang, Figure 1]
The figure shows a layer operation in which, based on the activations in forward propagation, a weight gradient is determined that corresponds to the weights of the low bit-width.)
and update the weights of the high bit-width based on the determined [weight update values]. (Figure 1 
[Reproduced image: Narang, Figure 1]
Finally, in the weight update block, the higher bit-width master weights are updated.)
Narang does not explicitly teach, determine loss values; update weights…based on the determined loss values
Jia, however, when discussing weight updates based on layer loss values, teaches, determine loss values; update weights…based on the determined loss values (Section 3, pg 3 “The algorithm introduces a local learning rate for each layer (as shown in Equation 1), which is the ratio of the L2-norm of weights and gradients weighted by a LARS coefficient η …
[Reproduced image from Jia]
To cope with this situation, we have proposed a training strategy which uses mixed-precision training with LARS as shown in Figure 2” The gradients of the loss, ∇L(w), correspond to determining loss values. These loss values are used to update the weights of a neural network layer.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network weight update method of Narang to use the weight update method described by Jia, which utilizes loss values. One would have been motivated to make such a combination because both Narang and Jia discuss methods for mixed precision training. Jia notes that, in combination with LARS weight updates, the authors use “the mixed-precision techniques to improve the throughput of a single GPU without losing accuracy.” (Conclusion, pg 8)

Claims 19-21 are rejected under 35 U.S.C. § 103 as being unpatentable over Narang in view of Jia, and further in view of Wen et al., “Training Bit Fully Convolutional Network for Fast Semantic Segmentation” (hereinafter Wen).

Regarding claim 19
	Narang/Jia teaches claim 18
	Narang/Jia does not explicitly teach, wherein the weight sets of the low bit-width comprise a weight set of a first bit-width and a weight set of a second bit-width having a lower bit-precision than the weight set of the first bit-width, and wherein the weight set of the second bit-width is nested in the weight set of the first bit-width. 
	Wen, however, when discussing training a bit-reduced neural network, teaches, wherein the weight sets of the low bit-width comprise a weight set of a first bit-width and a weight set of a second bit-width having a lower bit-precision than the weight set of the first bit-width, (pg 4, right column, ¶02 “to train BFCN. We propose a method called bit-width decay, which cuts off bit-width step by-step… We detail the procedure of bit-width decay method as follow: 1. Pretrain a full-precision network N1. 2. Quantize N1 to produce N2 in 8-bit, which has been proved to be lossless, and fine-tune until its convergence. 3. Initialize N3 with N2. 4. Decrease bit-width of N3, and fine-tune for enough iterations. 5. Repeat step 4 until desired bit-width is reached.” Wen details step-by-step training of a neural network; each time a new bit width is determined for the neural network, a new weight set is determined. First, an 8-bit weight set is determined, corresponding to the first bit width; next, the bit width is decreased to a smaller bit width in step 4, corresponding to the second bit-width weight set.) and wherein the weight set of the second bit-width is nested in the weight set of the first bit-width. (pg 4, right column, steps 3-4 “Initialize N3 with N2…Decrease bit-width of N3, and fine-tune for enough iterations.” Examiner notes that N3 is initialized from N2, a neural network with 8-bit weights. The bit width of N3 is then reduced starting from the weights previously determined in the weight set of the first bit-width. Therefore, the second bit-width weight set is nested, or encoded, in the first weight set.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the quantization scheme disclosed by Narang/Jia with the iterative quantization method discussed in Wen. One would have been motivated to make such a combination because both Narang/Jia and Wen discuss quantization of neural networks for the purpose of improving model speed while maintaining acceptable performance. Wen describes an iterative quantization of neural network parameters in order to train a CNN “to accelerate inference speed and reduce memory footprint. We also propose a novel method to train a low bit-width network, which decreases bit-width step by step to reduce performance loss resulting from quantization”. Wen’s iterative quantization is able to “train efficient low bit width scene-parsing networks without losing much performance” (Conclusion, pg 6, Wen).
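Wen's bit-width decay procedure (steps 2 through 5 quoted above) can be outlined as follows. This Python sketch is hypothetical: it assumes a uniform quantizer over [-1, 1], assumes the bit width drops by one bit per stage, and replaces the fine-tuning that Wen performs at each bit width with a placeholder `fine_tune` function that returns the weights unchanged. It shows only the schedule of progressively lower bit widths, each stage initialized from the previous weight set.

```python
import math


def quantize(weights, bits):
    """Uniformly quantize values in [-1, 1] onto 2**bits levels (dequantized output)."""
    levels = 2 ** bits - 1
    out = []
    for w in weights:
        w = min(1.0, max(-1.0, w))                          # clip to the assumed range
        code = math.floor((w + 1.0) / 2.0 * levels + 0.5)   # round half up to an integer code
        out.append(code / levels * 2.0 - 1.0)               # map the code back to [-1, 1]
    return out


def fine_tune(weights, bits):
    """Hypothetical placeholder: real training would retrain here at `bits` precision."""
    return weights


def bit_width_decay(full_precision_weights, start_bits=8, target_bits=2):
    """Return [(bits, weights), ...] for each stage of a bit-width decay schedule."""
    # Step 2: quantize the pretrained full-precision weights (N1) to 8 bits (N2).
    weights = fine_tune(quantize(full_precision_weights, start_bits), start_bits)
    stages = [(start_bits, weights)]
    # Steps 3-5: initialize from the previous stage, drop one bit, fine-tune, repeat.
    for bits in range(start_bits - 1, target_bits - 1, -1):
        weights = fine_tune(quantize(weights, bits), bits)
        stages.append((bits, weights))
    return stages


stages = bit_width_decay([0.3, -0.7, 0.05])
```

Each stage lies on the grid of its own bit width, and each is computed from the stage before it rather than from the original full-precision weights.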

Regarding claim 20
	Narang/Jia teaches claim 17
	Narang/Jia does not explicitly teach, wherein the weights of the low bit-width include first weights of a first bit-width having a lower bit-precision than the weights of the high bit-width, and second weights of a second bit-width having a lower bit-precision than the first weights of the first bit-width.
	Wen, however, when discussing training a bit-reduced neural network, teaches, wherein the weights of the low bit-width include first weights of a first bit-width having a lower bit-precision than the weights of the high bit-width, and second weights of a second bit-width having a lower bit-precision than the first weights of the first bit-width. (pg 4 right column ¶02 “to train BFCN. We propose a method called bit-width decay, which cuts off bit-width step by-step… We detail the procedure of bit-width decay method as follow: 1. Pretrain a full-precision network N1. 2. Quantize N1 to produce N2 in 8-bit, which has been proved to be lossless, and fine-tune until its convergence. 3. Initialize N3 with N2. 4. Decrease bit-width of N3, and fine-tune for enough iterations. 5. Repeat step 4 until desired bit-width is reached.” As pointed out in the rejection of claim 19, iterative quantization of weight sets corresponds to determining first weights of a first bit-width and second weights of a second bit-width. On each iteration the bit-width is reduced. Thus, each newly determined weight set has a lower precision, or bit-width.)
	For the reasons to combine Narang/Jia with Wen, see the reasons set forth in the rejection of claim 19.
Regarding claim 21
	Narang/Jia/Wen teaches claim 20
	Further, Wen teaches, determining the first weights of the first bit-width by quantizing the weights of the high bit-width; (pg 4 step 1 and step 2 “1. Pretrain a full-precision network N1. 2. Quantize N1 to produce N2 in 8-bit, which has been proved to be lossless, and fine-tune until its convergence.” The full-precision network corresponds to the high bit-width weights. A PHOSITA would understand that a full-precision network can refer to either 32-bit or 64-bit precision. This is also evident in the comparison shown in Figure 3.) and determining the second weights of the second bit-width by extracting at least one bit from each of the determined first weights of the first bit-width (pg 4 right column ¶02 “to train BFCN. We propose a method called bit-width decay, which cuts off bit-width step by-step… 2. Quantize N1 to produce N2 in 8-bit, which has been proved to be lossless, and fine-tune until its convergence. 3. Initialize N3 with N2. 4. Decrease bit-width of N3, and fine-tune for enough iterations. 5. Repeat step 4 until desired bit-width is reached.” Decreasing the bit-width of N3 corresponds to extracting at least one bit from the initialized, lower-precision first weights of the first bit-width.)
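As an illustrative sketch of the two determinations mapped above, and not Wen's code, full-precision (float32) weights can be quantized to 8-bit signed codes, and the second weights can then be obtained by extracting the upper seven bits of each code. The per-tensor symmetric quantizer, the helper name `quantize_symmetric`, and the sample weights are assumptions made for illustration. Wen's actual quantizer may differ.

```python
import numpy as np


def quantize_symmetric(weights: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Uniformly quantize full-precision weights to signed `bits`-bit codes.

    Assumption: one scale per tensor, chosen so the largest-magnitude weight
    maps to the largest positive code.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(weights))) / qmax
    codes = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return codes, scale


# Hypothetical full-precision (high bit-width) weights.
w = np.array([-1.0, -0.3, 0.0, 0.2, 0.8], dtype=np.float32)

# First weights: quantize the high bit-width weights to 8 bits.
first_weights, scale = quantize_symmetric(w, bits=8)
print(first_weights.tolist())   # [-127, -38, 0, 25, 102]

# Second weights: keep the upper seven bits of each first weight by dropping
# the least significant bit, giving 7-bit codes in the range -64..63.
second_weights = first_weights >> 1
print(second_weights.tolist())  # [-64, -19, 0, 12, 51]
```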
Regarding claim 30
	Claim 30 is rejected for the reasons set forth in the rejection of claim 20, in connection with claim 29.
Regarding claim 31
	Claim 31 is rejected for the reasons set forth in the rejection of claim 21, in connection with claim 30.
	
Claims 22 and 32 are rejected under 35 U.S.C. § 103 as being unpatentable over Narang/Jia/Wen, further in view of Sachs et al., “Round Get Around: Why Fixed-Point Right-Shifts Are Just Fine” (hereinafter Sachs).

Regarding claim 22
	Narang/Jia/Wen teaches claim 20
	Further, Wen teaches, determining the second weights of the second bit-width by quantizing the weights of the high bit-width; determining a lower bit group of the first weights of the first bit-width by quantizing the weights of the high bit-width; determining the determined second [values]…[based on the first values] of the first bit-width; (pg 4 right column ¶02 “to train BFCN. We propose a method called bit-width decay, which cuts off bit-width step by-step… We detail the procedure of bit-width decay method as follow: 1. Pretrain a full-precision network N1. 2. Quantize N1 to produce N2 in 8-bit, which has been proved to be lossless, and fine-tune until its convergence. 3. Initialize N3 with N2. 4. Decrease bit-width of N3, and fine-tune for enough iterations. 5. Repeat step 4 until desired bit-width is reached.” First, the high bit-width weights are quantized. Based on this first quantization, the second weights and the first weights are determined, thus corresponding to “by quantizing the weights of the high bit width”. Because the full-precision weights have more bits, the determined 8-bit weights belong to a lower bit group.)
	Narang/Jia/Wen does not explicitly teach, determining the determined second [values] of the second bit-width to be an upper bit group of the first [values] of the first bit-width; 
	However, Sachs, when addressing rounding of values using truncation, teaches, determining the determined second [values] of the second bit-width to be an upper bit group of the first [values] of the first bit-width; (pg 2 last paragraph “Numerical truncation to an integer is throwing away the digits to the right of the decimal point, effectively rounding towards zero. Bitwise truncation is throwing away all the bits to the right of the binary point, which with two’s complement representation always rounds downward. If we had the number -3.34375, this could be represented exactly in two’s complement as 11111100.10101000 which truncates to 11111100 = -4” Numerical truncation is a type of quantization in which a value with a higher bit-width, corresponding to the first values of the first bit-width, is truncated by extracting only the upper bit values.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the quantization scheme disclosed by Narang/Jia/Wen with the truncation-based quantization method discussed in Sachs.  One would have been motivated to make such a combination because both Narang/Jia/Wen and Sachs discuss quantization for the purpose of improving computation speed while maintaining acceptable performance. Sachs identifies truncation as the preferred rounding/quantization scheme when speed is a concern, noting that “Runtime calculations, on the other hand, have a slight increase in cost for rounding compared to bitwise truncation… This costs an extra operation, which can add up on an embedded system, and some specialized DSP instructions on certain architectures”. Sachs further notes that the extra accuracy afforded by using “rounding-to-nearest-integer is not necessary.” (Sachs pg 3-4)
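The quoted Sachs example can be checked with a short sketch. The sketch assumes the 16-bit fixed-point format implied by the quoted bit pattern: 8 integer bits and 8 fractional bits, in two's complement. The helper names `to_q8_8` and `bits16` are hypothetical. Keeping only the upper byte amounts to an arithmetic right shift by 8, which reproduces the truncation to -4. For contrast, a round-to-nearest variant is also shown. It needs one extra addition before the shift.

```python
def to_q8_8(x: float) -> int:
    """Encode x as a signed fixed-point integer with 8 fractional bits."""
    return round(x * 256)


def bits16(v: int) -> str:
    """16-bit two's complement pattern of v, split at the binary point."""
    pattern = format(v & 0xFFFF, "016b")
    return pattern[:8] + "." + pattern[8:]


x = to_q8_8(-3.34375)      # -856; -3.34375 is exactly representable
print(bits16(x))           # 11111100.10101000
upper = x >> 8             # keep only the upper (integer) byte: bitwise truncation
print(upper)               # -4, i.e. rounded downward
nearest = (x + 128) >> 8   # round-to-nearest needs one extra addition
print(nearest)             # -3
```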
Regarding claim 32
	Claim 32 is rejected for the reasons set forth in the rejection of claim 22, in connection with claim 30.

Claim 34 is rejected under 35 U.S.C. § 103 as being unpatentable over Kim, further in view of Lee et al., “UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision” (hereinafter Lee).

Regarding claim 34
	Kim teaches claim 33
	Kim does not explicitly teach, wherein the first weights are configured to have a first bit-width and the second weights are configured to have a second bit-width different from the first bit-width.
	Lee, however, when addressing neural networks with variable precision, teaches, wherein the first weights are configured to have a first bit-width and the second weights are configured to have a second bit-width different from the first bit-width. (pg 174 right column ¶02 “unified DNN core architecture and 2) fully variable weight bit precision. We present an UNPU supporting CLs, RLs, and FCLs with fully variable weight bit precision from 1 to 16 bit. As shown in Fig. 3” Lee presents an optimized neural network that utilizes variable precision across layers. Therefore, two neural networks that implement fully variable precision will have weight sets with bit precisions that differ from each other.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network of Kim to employ variable-precision weights as discussed in Lee.  One would have been motivated to make such a combination in order to obtain the energy-accuracy and performance benefits of employing variable bit precision. Lee notes that “UNPU supports fully variable weight bit precision from 1 to 16 bit to accelerate DNNs on the energy-accuracy optimal point, and it has unified DNN core architecture which makes the UNPU get 1.15× higher computation performance” (Conclusion Lee).

Allowable Subject Matter
Claim 24 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Specifically, none of the references of record, either alone or in combination, fairly discloses or suggests the limitations of claim 24.
The closest prior art of record is Lin et al. (“Towards Accurate Binary Convolutional Neural Network”), which teaches a neural network system whose gradient updates are based on a weighted sum of sub-networks. However, in Lin the weights of the weighted sum are determined through training and are not assigned according to a “high priority is set among the weights of the low bit width” as claimed.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/J.R.G./
Examiner, Art Unit 2122  
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122