Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on April 27, 2022, in which claim 5 is amended. Claims 1-20 are currently pending.

Response to Arguments
The rejections of claims 5-6 under 35 U.S.C. § 112(b) are hereby withdrawn in view of Applicant's amendments and remarks.
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 101 based on the amendment have been considered but are not persuasive.
With respect to Applicant’s arguments that none of the claimed limitations can be performed in the mind, Examiner respectfully disagrees.  A training loss is routinely expressed as a mathematical calculation; however, one could reasonably determine the results of a mathematical calculation through observation, evaluation, and judgement without performing the calculation at all.  Because of the explicit claim language “determining a loss”, Examiner asserts that the claim limitation is directed solely to a mental process.  For example, if a loss were to be displayed through a computer monitor to a user, it would be obvious to one of ordinary skill in the art that said user could determine the loss without performing any mathematical calculations of their own.  Said loss could be transmitted and received in a manner equivalent to merely gathering and outputting data, without necessarily relying on processing or training the neural network directly.  The limitation does not rely on any technique or technology that would make it impractical to perform the determining in the mind.  Similarly, grouping and assigning samples is routinely performed in the mind, and there is no technique or technology mentioned in the claim language that would make it impractical to perform said grouping and assigning in the mind.  Processing the subsets in the neural network as mentioned is seen as generally linking the judicial exception to a particular field or technology.  As described in the Office Action, the processor and memory recited in claims 10 and 20 are seen as generic computer components which do not integrate the judicial exception into a practical application.  For these reasons, the rejection is maintained.

Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 102/103 based on the amendment have been considered but are not persuasive.
With respect to Applicant's arguments that Burger does not fairly teach or suggest grouping based on loss, Examiner respectfully disagrees.  Examiner notes that grouping ‘based’ on losses is a very broad limitation open to broad interpretation.  It is well known to one of ordinary skill in the art that quantization in a neural network produces quantization loss which is propagated throughout the network in training.  In fact, quantization loss is a primary motivation of the invention of Burger as explicitly stated in the abstract ([Abstract] "a normal precision model or a quantized precision model can be retrained by evaluating loss induced by performing operations in the quantized format.").  Burger provides methods of overcoming said quantization loss by sharing an exponent among a block of floating-point numbers so as to reduce the memory footprint.  With regard to the selection of said values, Burger explicitly teaches that loss is a primary consideration ([¶0068] "There are several possible choices for which values in a block floating-point tensor will share an exponent. The simplest choice is for an entire matrix or vector to share an exponent. However, sharing an exponent over a finer granularity can reduce errors because it increases the likelihood of BFP numbers using a shared exponent that is closer to their original normal floating-point format exponent. Thus, loss of precision due to dropping mantissa bits (when shifting the mantissa to correspond to a shared exponent) can be reduced.").  In the same passage [¶0069-0070] Burger teaches the grouping of numbers which share a common exponent which has been explicitly decided based on loss.  The citations in the Office Action at paragraphs 0083-0085 citing grouping by accuracy further support the described interpretation, as one of ordinary skill in the art would recognize that loss and accuracy are highly correlated in neural network training.
For these reasons, Examiner asserts that the interpretation that Burger explicitly teaches the claim limitation of grouping the samples into subsets based on the losses is very reasonable. 
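For illustration only, the effect described in the quoted passage of Burger ¶0068 can be sketched in Python as follows. The sketch is not taken from Burger; the function names `bfp_quantize` and `max_abs_error`, the 5-bit mantissa width, and the sample values are assumptions chosen for the example. It quantizes a small matrix once with a single shared exponent and once with one shared exponent per row, and compares the worst-case error.

```python
import math

def bfp_quantize(values, mantissa_bits):
    """Quantize a block of floats so that all of them share one exponent.

    The shared exponent is taken from the largest magnitude in the block;
    each value keeps a signed integer mantissa of `mantissa_bits` bits.
    (Hypothetical helper, not Burger's implementation.)
    """
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return [0.0 for _ in values]
    # frexp returns (m, e) with max_mag == m * 2**e and 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_mag)
    # Size of one mantissa step for the whole block.
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    quantized = []
    for v in values:
        q = round(v / step)
        q = max(-limit, min(limit, q))  # clamp to the signed mantissa range
        quantized.append(q * step)
    return quantized

def max_abs_error(original, quantized):
    return max(abs(a - b) for a, b in zip(original, quantized))

# Assumed sample data: one row of large values, one row of small values.
rows = [[1024.0, 768.0, 512.0], [0.75, 0.5, 0.3125]]
flat = [v for row in rows for v in row]

# One exponent shared by the entire matrix.
coarse = bfp_quantize(flat, mantissa_bits=5)
# One exponent shared per row (finer granularity).
fine = [q for row in rows for q in bfp_quantize(row, mantissa_bits=5)]

coarse_err = max_abs_error(flat, coarse)
fine_err = max_abs_error(flat, fine)
print(coarse_err, fine_err)
```

With these assumed values, the per-row exponent reproduces every entry exactly, while the single shared exponent rounds the entire second row to zero.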
With respect to Applicant's arguments that Burger does not fairly teach or suggest assigning subsets to operands, Examiner respectfully disagrees.  Burger explicitly teaches performing operations in a dynamic fixed point format ([¶0120] "performing at least one operation with the set of quantized-precision format number"), which one of ordinary skill in the art would recognize as being synonymous with assigning to an operand of said fixed point format.  Burger further teaches that the numbers assigned to said operand are grouped in a shared tensor ([¶0120] "converting an input tensor of normal-precision floating-point numbers to a set of numbers represented in a quantized-precision format").  For these reasons, Examiner asserts that the interpretation that Burger explicitly teaches assigning the subsets to operands in different precisions is very reasonable.  Therefore, the rejections are maintained.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
determining losses of samples within an input volume that is provided to a neural network during a first epoch (observation, evaluation, and judgement),
 grouping the samples into subsets based on the losses (observation, evaluation, and judgement)
assigning the subsets to operands in the neural network that represent the samples at different precisions so that each subset is associated with a different precision (observation, evaluation, and judgement)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “processing the subsets in the neural network at the different precisions during the first epoch”. However, these additional features are seen as merely generally linking the judicial exception to a particular field or technology.  Specifically, they are generic computer functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 10 and 20, which recite an apparatus including additional generic computer components “processor” and “memory” which do not integrate the judicial exception into a practical application.  The rejection applies as well to dependent claims 2-9 and 11-19. The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 2 and 11 recite additional observation, evaluation, and judgement “assigning subsets having higher losses to operands having higher precisions.”
Dependent claims 3 and 12 recite additional observation, evaluation, and judgement “determining sets of model parameters.”
Dependent claims 4 and 13 recite additional computer components recited at a high level of generality: “connection weights”, “gradients”, and “activations of neurons”, as well as additional insignificant extra-solution activity “wherein the sets of model parameters comprise at least one of connection weights for connections between nodes in the neural network, activations of neurons in the neural network, and gradients for steepest descent estimations” which amounts to selection of a data type.
Dependent claims 5 and 15 are seen as a further attempt to generally link the judicial exception to a particular field or technology by the use of generic computer functions cited at a high level of generality “modifying a number of the subsets during a second epoch that is subsequent to the first epoch” and “processing the modified number of the subsets in the neural network at the number of different precisions during the second epoch” (See DDR Holdings, LLC v. Hotels.com, LP, 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014), and buySAFE Inc. v. Google, Inc., 765 F.3d 1350, 1354, 112 USPQ2d 1093, 1095-96 (Fed. Cir. 2014))
Dependent claims 6 and 16 are seen as a further attempt to generally link the judicial exception to a particular field or technology by the use of generic computer functions cited at a high level of generality “modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch” (See DDR Holdings, LLC v. Hotels.com, LP, 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014), and buySAFE Inc. v. Google, Inc., 765 F.3d 1350, 1354, 112 USPQ2d 1093, 1095-96 (Fed. Cir. 2014))
Dependent claims 7 and 17 are seen as a further attempt to generally link the judicial exception to a particular field or technology by the use of generic computer functions cited at a high level of generality “modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch” (See DDR Holdings, LLC v. Hotels.com, LP, 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014), and buySAFE Inc. v. Google, Inc., 765 F.3d 1350, 1354, 112 USPQ2d 1093, 1095-96 (Fed. Cir. 2014))
Dependent claims 8 and 18 are seen as a further attempt to generally link the judicial exception to a particular field or technology by the use of generic computer functions cited at a high level of generality “modifying the different precisions comprises decreasing the at least one of the different precisions during the second epoch, relative to the different precisions used during the first epoch” (See DDR Holdings, LLC v. Hotels.com, LP, 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014), and buySAFE Inc. v. Google, Inc., 765 F.3d 1350, 1354, 112 USPQ2d 1093, 1095-96 (Fed. Cir. 2014))
Dependent claims 9 and 19 recite additional observation, evaluation, and judgement “determining a validation error based on a validation set in response to completing the first epoch” as well as additional generic computer functions recited at a high level of generality to generally link the judicial exception to a particular field or technology “setting the different precisions to a maximum precision for the subsets during a second epoch that is subsequent to the first epoch in response to the validation error increasing relative to a previously determined validation error”
Dependent claim 14 recites additional insignificant extra-solution activity “wherein the memory is configured to store the sets of model parameters at the different precisions associated with the subsets of the samples” which amounts to gathering data (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015))

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 3-5, 10, 12-15, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Burger (US 2019/0340499 A1).

	Regarding claim 1, Burger teaches A method comprising: determining losses of samples within an input volume that is provided to a neural network during a first epoch; ([¶0078] "the learning rate of the network may be reduced when performing fine-tuning as part of the quantized training 650. After at least one round of fine-tuning to produce an updated quantized model 630, an updated quantized model W** 660 is produced. This updated quantized model 660 can be used to evaluate how the trained model will perform" [¶0080] "As shown, a quantized floating-point model 630 is provided for quantized training. Loss is computed at 710 for forward-pass training using the quantized model 630. A gradient of the error is computed with respect to the quantized model" First epoch interpreted as synonymous with first round of training.)
	grouping the samples into subsets based on the losses; ([¶0083] "In some examples, a shared exponent for a tensor or a split tensor can be selected in a number of different ways." [¶0085] "For example, an exponent can be selected such that accuracy of the representation is improved overall" Grouping by accuracy is interpreted as synonymous with grouping based on the losses.)
	assigning the subsets to operands in the neural network that represent the samples at different precisions so that each subset is associated with a different precision; and ([¶0075] "As shown in FIG. 5, a neural network represented as a set of pre-trained model values W 510 is provided...After the neural network has been trained, at least a portion of the neural network trained model W* 520 (comprising one or more input tensor) is converted from its normal-precision floating-point format into a set of quantized precision format numbers." W is taught as a set of fixed precision numbers and W* is a set of block floating point quantized precision numbers.)
	processing the subsets in the neural network at the different precisions during the first epoch. ([¶0077] "As shown, an input neural network model W 610 is represented as a set of pre-trained model parameters, which has been trained using a normal floating-point representation, to produce a trained neural network model W*, in a similar manner to that discussed above regarding FIG. 5. This trained model W* 620 is quantized to produce a quantized model Wq 630" producing a quantized model is interpreted as synonymous with processing the subsets at the different precisions.). 
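For illustration only, the determining, grouping, and assigning steps recited in claim 1 can be sketched in Python as follows. The sketch is a hypothetical reading of the claim language and does not describe Burger's or Applicant's implementation; the function names `group_by_loss` and `assign_precisions`, the loss thresholds, and the bit widths are all assumptions. Pairing the higher-loss subsets with the wider bit widths mirrors the further limitation of claim 2 and is not required by claim 1.

```python
def group_by_loss(losses, thresholds):
    """Split sample indices into len(thresholds) + 1 subsets by loss value.

    `thresholds` must be sorted ascending; subset 0 holds the lowest losses.
    """
    subsets = [[] for _ in range(len(thresholds) + 1)]
    for idx, loss in enumerate(losses):
        # Count how many thresholds this loss meets or exceeds.
        bucket = sum(loss >= t for t in thresholds)
        subsets[bucket].append(idx)
    return subsets

def assign_precisions(subsets, bit_widths):
    """Pair each subset with its own bit width (one distinct width per subset)."""
    if len(subsets) != len(bit_widths):
        raise ValueError("need exactly one bit width per subset")
    if len(set(bit_widths)) != len(bit_widths):
        raise ValueError("bit widths must all differ")
    return list(zip(bit_widths, subsets))

# Assumed per-sample losses from a first epoch.
losses = [0.05, 2.3, 0.9, 0.02, 1.7]
subsets = group_by_loss(losses, thresholds=[0.5, 1.5])
plan = assign_precisions(subsets, bit_widths=[8, 16, 32])

for bits, sample_indices in plan:
    print(f"{bits}-bit operands: samples {sample_indices}")
```

With these assumed values, samples 0 and 3 fall in the lowest-loss subset and samples 1 and 4 in the highest-loss subset.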

	Regarding claim 3, Burger teaches The method of claim 1, wherein processing the subsets comprises determining sets of model parameters for the subsets during at least one of a forward pass and a backward pass through the neural network during the first epoch, wherein each set of model parameters for the subsets is represented at a different corresponding one of the different precisions. ([¶0050] "For example, a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization." [¶0080] "During back propagation, a straight-through estimator is used for the gradients of the quantization operators. In other words, back-propagation is performed using identity operators instead of derivatives of the forward pass operation." [¶0081] "For example, instead of using a single shared exponent for an entire array, multiple shared exponents may be used on a per-row, a per-column, or a per-tile basis. In some examples, it may be determined that certain portions of the neural network should be implemented with normal-precision floating-point using a general-purpose CPU in the final model, and only other portions of the neural network model are quantized and implemented using a hardware accelerator."). 

	Regarding claim 4, Burger teaches The method of claim 3, wherein the sets of model parameters comprise at least one of connection weights for connections between nodes in the neural network, activations of neurons in the neural network, and gradients for steepest descent estimations. ([¶0050] "For example, a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization." Connection weights interpreted as synonymous with edge weights.). 

	Regarding claim 5, Burger teaches The method of claim 1, further comprising: modifying a number of the subsets during a second epoch that is subsequent to the first epoch; and ([¶0078] "parameters used for the fine-tuning training are varied from that used for the floating-point training performed to produce the trained neural network W* 620...the learning rate of the network may be reduced when performing fine-tuning as part of the quantized training 650. After at least one round of fine-tuning to produce an updated quantized model" Round interpreted as synonymous with epoch such that during a second epoch is interpreted as synonymous with after one round.)
	processing the modified number of the subsets in the neural network at the number of different precisions during the second epoch. ([¶0065] " since the illustrated set of numbers have different exponent values in the floating-point format, each number's respective mantissa may be shifted such that the same or a proximate number is represented in the quantized format" [¶0079] "the quantized values are actually stored in memory as block floating-point numbers" [¶0078] "For example, the learning rate of the network may be reduced when performing fine-tuning as part of the quantized training 650. After at least one round of fine-tuning to produce an updated quantized model 630, an updated quantized model W** 660 is produced. This updated quantized model 660 can be used to evaluate how the trained model will perform one provided to a neural network hardware accelerator." updating the quantized model interpreted as synonymous with processing the subsets of different precisions.  Block floating-point number interpreted as synonymous with different precision subset. See also FIG. 3). 

	Claims 10, 12-13, and 15 are directed towards an apparatus for performing the method of claims 1 and 3-5, respectively.  Therefore, the rejections applied to claims 1 and 3-5 also apply to claims 10, 12-13, and 15.  Claim 10 also recites a processor and memory to perform the method, which Burger teaches ([¶0003] “one or more general-purpose processors coupled to the memory”).

Regarding claim 14, Burger teaches the apparatus of claim 12, wherein the memory is configured to store the sets of model parameters at the different precisions associated with the subsets of the samples (Burger [¶0079] “In some examples, the quantized values are actually stored in memory as block floating-point numbers, and a library of neural network operations as provided to operate on the quantized values as if the neural network was implemented using a hardware accelerator.” Block floating-point number interpreted as subset of different precision values.).

	Regarding claim 20, claim 20 is substantially similar to claim 10.  Therefore, the rejection applied to claim 10 also applies to claim 20.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


	Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Ould-Ahmed-Vall (US 2020/0364823 A1).

	Regarding claim 2, Burger teaches The method of claim 1.
	However, Burger does not explicitly teach assigning the subsets to the operands comprises assigning subsets having higher losses to operands having higher precisions.  

Ould-Ahmed-Vall, in the same field of endeavor, teaches assigning the subsets to the operands comprises assigning subsets having higher losses to operands having higher precisions. ([¶0202] "The dynamic floating point unit 1608C will attempt to perform a mixed precision 16-bit/32-bit operation at 16-bits of precision unless significant precision loss or error will occur."). 
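For illustration only, the behavior quoted from Ould-Ahmed-Vall ¶0202 (attempt the operation at 16 bits unless significant precision loss or error would occur) can be approximated in software as follows. Ould-Ahmed-Vall describes a hardware unit; this Python sketch is only an analogy. The function names `to_half` and `multiply_mixed` and the 1% relative tolerance are assumptions, and Python's native float stands in for the higher-precision path.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def multiply_mixed(a, b, rel_tol=1e-2):
    """Multiply at half precision, falling back to full precision when the
    half-precision result overflows or deviates by more than `rel_tol`."""
    exact = a * b
    try:
        half = to_half(to_half(a) * to_half(b))
    except (OverflowError, struct.error):
        # An operand or the product does not fit in half precision.
        return exact, "full"
    if exact != 0.0 and abs(half - exact) > rel_tol * abs(exact):
        # Significant precision loss (for example, underflow to zero).
        return exact, "full"
    return half, "fp16"

print(multiply_mixed(1.5, 2.0))      # small operands stay at half precision
print(multiply_mixed(300.0, 300.0))  # product exceeds the half-precision range
```

The fallback decision here is made per operation, which is one possible reading of "unless significant precision loss or error will occur"; other criteria are equally plausible.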

	Burger and Ould-Ahmed-Vall are both directed towards dynamic precision formats in neural network accelerators.  Therefore, Burger and Ould-Ahmed-Vall are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the block format in Burger with the deterministic mixed precision system of Ould-Ahmed-Vall by performing mixed precision operations. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Burger the performance advantages of using lower precisions in machine learning operations.  Ould-Ahmed-Vall further discloses deterministically assigning values to operands based on the loss.  This is interpreted as being designed to prevent low precision operations done on high precision (larger) numbers which would result in large errors.  One of ordinary skill in the art would understand that the motivation to reduce precision is to increase system performance, but Ould-Ahmed-Vall teaches “Some neural networks may still benefit from the added precision of calculations using N-bit feature maps and N-bit filters. In some implementations, N-bit features and weights for a neural network can be processed at low precision without significant reduction in output error. However, a data scientist implementing a low precision N-bit neural network (e.g., FP16, INT8) generally should be aware of rounding errors or out of bounds data that may arise due to successive calculations at low precision.”  Therefore the advantage to having a flexible system for dynamic precision operations in a machine learning environment is obvious.

Regarding claim 11, claim 11 is directed towards an apparatus for performing the method of claim 2.  Therefore, the rejection applied to claim 2 also applies to claim 11.  Claim 11 also recites a processor and memory to perform the method, which Burger teaches ([¶0003] “one or more general-purpose processors coupled to the memory”).

	Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Chung (US 2018/0341851 A1).

	Regarding claim 6, Burger teaches The method of claim 5.
	However, Burger does not explicitly teach modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch.  

Chung, in the same field of endeavor, teaches modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch. ([¶0034] "During a tuning phase, adjustments can be made in the area of approximate computing by dynamically adjusting the tuning parameters, when the opportunity arises e.g., during a training" [¶0035] "Also within an approximate computing framework, other tuning parameters can be adjusted...For example, adjustments can include: using dropout sparsification to send a quasi-random subset of weights, rolling updates that transmit only a pre-specified subset of weights in a round-robin fashion" Decreasing the number of relative subsets is interpreted as synonymous with drop-out.). 

Burger and Chung are both directed towards dynamic precision formats in neural network accelerators.  Therefore, Burger and Chung are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the block format in Burger with the deterministic mixed precision system of Chung by performing dropout.  Chung teaches that drop-out can be used to increase the sparsity of the network.  Chung further explains ([¶0059] “Some approximate computing techniques affecting computation time include switching from single to double precision to half precision, for example. By doing so the system 100 dynamically updates the training parameters of the training process to modulate the compute time relative to the communication time, and thereby moves toward parsimonious utilization of system resources for accelerated training. The compression in this case could be any of the many techniques known to those skilled in the art, such as random sparsification or thresholded drop-out, and the like.”).  The motivation to use reduced precision to reduce computing time aligns with that disclosed in Burger.

Regarding claim 16, claim 16 is directed towards an apparatus for performing the method of claim 6.  Therefore, the rejection applied to claim 6 also applies to claim 16.  Claim 16 also recites a processor and memory to perform the method, which Burger teaches ([¶0003] “one or more general-purpose processors coupled to the memory”).

	Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Taras (“QUANTIZATION ERROR AS A METRIC FOR DYNAMIC PRECISION SCALING IN NEURAL NET TRAINING”, 2018).

	Regarding claim 7, Burger teaches The method of claim 1.
	However, Burger does not explicitly teach modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch.  

Taras, in the same field of endeavor, teaches modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch. ([p. 3] "we quantize weights, biases, activations, and gradients at the appropriate pass through the network, and update the precision on-the-fly during training on each iteration" Iteration is interpreted as synonymous with epoch.). 

	Burger and Taras are both directed towards dynamic precision formats in neural network accelerators.  Therefore, Burger and Taras are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the block format in Burger with the deterministic mixed precision system of Taras by changing precision in a second epoch.  Taras and Burger both share identical motivations for improving the speed of a neural network through mixed precision format arithmetic, but Taras also notes ([p. 3] “naively reducing the bit-width of weights and activations to a fixed 13-bits with no dynamic precision scaling results in the training process failing to converge.”).  

	Regarding claim 8, the combination of Burger and Taras teaches The method of claim 7, wherein modifying the different precisions comprises decreasing the at least one of the different precisions during the second epoch, relative to the different precisions used during the first epoch. (Taras [p. 3] "Our results reveal that we can achieve accuracy on-par with the baseline, whilst drastically reducing the bit-width used for both weights and activations." See Fig. 2 "Moving average bitwidths during training using DPS." Figure 2 shows significant precision reduction of activation between first and second iterations.).

Regarding claims 17-18, claims 17-18 are directed towards an apparatus for performing the method of claims 7-8.  Therefore, the rejections applied to claims 7-8 also apply to claims 17-18.  Claims 17-18 also recite a processor and memory to perform the method, which Burger teaches ([¶0003] “one or more general-purpose processors coupled to the memory”).

	Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Na (“Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier-Accumulator”, 2016).

	Regarding claim 9, Burger teaches The method of claim 1, further comprising: determining a validation error based on a validation set in response to completing the first epoch ([¶0076] "inference validation can be used to compare an expected output for the neural network with the output that is produced using the quantized model").
	However, Burger does not explicitly teach setting the different precisions to a maximum precision for the subsets during a second epoch that is subsequent to the first epoch in response to the validation error increasing relative to a previously determined validation error.  

Na, in the same field of endeavor, teaches setting the different precisions to a maximum precision for the subsets during a second epoch that is subsequent to the first epoch in response to the validation error increasing relative to a previously determined validation error. ([p. 3 Sec. 3.2] "If the moving average keeps decreasing, no action is taken. If the training becomes numerically unstable, it increases the precision to its maximum value (ml). Since the training might become extremely unstable due to the wrong guess (tl), immediate change to the maximum precision is beneficial." Wrong guess is interpreted as synonymous with validation error.  Stable precision explicitly taught as relative decrease in moving average, therefore unstable precision is interpreted as synonymous with relative increase in moving average.). 
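For illustration only, the control rule recited in claim 9 (setting the precisions to a maximum in response to the validation error increasing) can be sketched in Python as follows. The sketch is a hypothetical simplification and does not reproduce the algorithm of Na, which monitors a moving average and numerical instability; the function name `next_precisions`, the 32-bit maximum, and the sample error values are assumptions.

```python
def next_precisions(current_bits, val_error, prev_val_error, max_bits=32):
    """Return the per-subset bit widths to use in the next epoch.

    If the validation error rose relative to the previous measurement, every
    subset is moved to `max_bits`; otherwise the widths are left unchanged.
    """
    if prev_val_error is not None and val_error > prev_val_error:
        return [max_bits] * len(current_bits)
    return list(current_bits)

# Validation error increased from 0.25 to 0.30: all subsets go to 32 bits.
print(next_precisions([8, 16, 32], val_error=0.30, prev_val_error=0.25))
# Validation error decreased: the existing widths are kept.
print(next_precisions([8, 16, 32], val_error=0.20, prev_val_error=0.25))
```

Passing `prev_val_error=None` covers the first epoch, when no earlier validation error exists for comparison.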

	Burger and Na are both directed towards dynamic precision formats in neural network accelerators.  Therefore, Burger and Na are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the block format in Burger with the deterministic mixed precision system of Na. Na teaches that this dynamic precision scaling (Na [Abstract] “can achieve 5.7x speed-up while consuming 31% energy compared to baseline for modified Alexnet on Flickr image style recognition task.”).  While Burger does not explicitly mention Alexnet, Burger does teach image recognition as one of the primary applications for their quantized format (Burger [¶0061] “Examples of suitable applications for such neural network BFP implementations include, but are not limited to: performing image recognition”).  

Regarding claim 19, claim 19 is directed towards an apparatus for performing the method of claim 9.  Therefore, the rejection applied to claim 9 also applies to claim 19.  Claim 19 also recites a processor and memory to perform the method, which Burger teaches ([¶0003] “one or more general-purpose processors coupled to the memory”).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126