Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 5 and 6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 5, the limitation “processing the modified number of the subsets in the neural network at the number of different precisions during the first epoch” is indefinite, because the number of the subsets is modified during the second epoch, which is temporally subsequent to the first epoch.  One of ordinary skill in the art would not be able to determine, within reason, a method for processing elements before they are available to be processed.  In the interest of further examination, this limitation is interpreted as processing the subsets in the first epoch prior to modification in the second epoch.  Claim 6 depends from claim 5 and is rejected on the same basis.



Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: determining losses of samples within an input volume that is provided to a neural network during a first epoch (observation and evaluation), grouping the samples into subsets based on the losses (observation, evaluation, and judgement), and assigning the subsets to operands in the neural network that represent the samples at different precisions so that each subset is associated with a different precision (observation, evaluation, and judgement).  Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites the additional element “processing the subsets in the neural network at the different precisions during the first epoch”.  However, this additional feature is a generic function performed on generic computer components recited at a high level of generality, such that it amounts to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claims 10 and 20, which recite corresponding features.  Therefore, claims 1, 10 and 20 recite an abstract idea which is a judicial exception.
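
For illustration only, the steps recited in claim 1 (determining per-sample losses, grouping the samples into subsets based on the losses, assigning each subset a different precision, and processing the subsets at those precisions) can be sketched in Python with NumPy.  This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the toy linear model, the squared-error loss, and the choice of float16 and float32 are all hypothetical and are not drawn from the claims or the specification.

```python
import numpy as np

def group_by_loss(losses, num_subsets):
    """Split sample indices into num_subsets groups ordered by ascending loss."""
    order = np.argsort(losses)            # indices from lowest to highest loss
    return np.array_split(order, num_subsets)

def assign_precisions(subsets, dtypes):
    """Pair each subset with its own dtype (one precision per subset)."""
    if len(subsets) != len(dtypes):
        raise ValueError("need exactly one precision per subset")
    return list(zip(subsets, dtypes))

def process(samples, weights, assignment):
    """Run a toy linear layer on each subset at its assigned precision."""
    out = np.empty(len(samples), dtype=np.float64)
    for idx, dtype in assignment:
        x = samples[idx].astype(dtype)    # operands cast to the subset's precision
        w = weights.astype(dtype)
        out[idx] = x @ w                  # arithmetic carried out in `dtype`
    return out

# Hypothetical toy data: four samples, two features each.
samples = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 2.0]])
targets = np.array([1.0, 0.0, 2.0, 5.0])
weights = np.array([0.5, 0.25])

losses = (samples @ weights - targets) ** 2   # per-sample squared error
subsets = group_by_loss(losses, 2)
assignment = assign_precisions(subsets, [np.float16, np.float32])
outputs = process(samples, weights, assignment)
```

In this sketch the pairing of subsets with precisions is an arbitrary choice made for the example; claim 1 as recited only requires that each subset be associated with a different precision.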

Regarding Claim 2:  Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 2 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: wherein assigning the subsets to the operands comprises assigning subsets having higher losses to operands having higher precisions (observation, evaluation, and judgement).  Therefore, claim 2 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 2 recites additional elements introduced in claim 1.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 2 is directed to a judicial exception.
Step 2B Analysis:  Claim 2 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 2 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 11, which recites corresponding features.  Therefore, claims 2 and 11 recite an abstract idea which is a judicial exception.

Regarding Claim 3:  Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 3 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: wherein processing the subsets comprises determining sets of model parameters for the subsets during at least one of a forward pass and a backward pass through the neural network during the first epoch (observation, evaluation, and judgement).  Therefore, claim 3 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 3 recites additional elements introduced in claim 1.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Claim 3 introduces the additional element “wherein each set of model parameters for the subsets is represented at a different corresponding one of the different precisions”, which amounts to selection of a data type.  Selection of a data type is insignificant extra-solution activity and does not integrate the judicial exception into a practical application.  Therefore, claim 3 is directed to a judicial exception.
Step 2B Analysis:  Claim 3 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 3 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 12, which recites corresponding features.  Therefore, claims 3 and 12 recite an abstract idea which is a judicial exception.

Regarding Claim 4:  Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 4 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 4 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 4 recites additional elements introduced in claim 1.  Claim 4 introduces additional elements “connection weights”, “activations of neurons”, and “gradients”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 4 is directed to a judicial exception.
Step 2B Analysis:  Claim 4 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 4 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 13, which recites corresponding features.  Therefore, claims 4 and 13 recite an abstract idea which is a judicial exception.

Regarding Claim 5:  Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 5 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 5 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 5 recites additional elements introduced in claim 1.  Claim 5 introduces additional elements “modifying a number of the subsets during a second epoch that is subsequent to the first epoch”, “processing the modified number of the subsets in the neural network at the number of different precisions during the first epoch”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 5 is directed to a judicial exception.
Step 2B Analysis:  Claim 5 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 5 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 15, which recites corresponding features.  Therefore, claims 5 and 15 recite an abstract idea which is a judicial exception.

Regarding Claim 6:  Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 6 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: wherein modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch (observation, evaluation, and judgement).  Therefore, claim 6 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 6 recites additional elements introduced in claim 5.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 6 is directed to a judicial exception.
Step 2B Analysis:  Claim 6 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 6 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 16, which recites corresponding features.  Therefore, claims 6 and 16 recite an abstract idea which is a judicial exception.

Regarding Claim 7:  Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 7 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 7 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 7 recites additional elements introduced in claim 1.  Claim 7 also introduces the additional element “modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 7 is directed to a judicial exception.
Step 2B Analysis:  Claim 7 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 7 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 17, which recites corresponding features.  Therefore, claims 7 and 17 recite an abstract idea which is a judicial exception.

Regarding Claim 8:  Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 8 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  Therefore, claim 8 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 8 recites additional elements introduced in claim 7.  Claim 8 also introduces the additional element “wherein modifying the different precisions comprises decreasing the at least one of the different precisions during the second epoch, relative to the different precisions used during the first epoch”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 8 is directed to a judicial exception.
Step 2B Analysis:  Claim 8 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 8 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 18, which recites corresponding features.  Therefore, claims 8 and 18 recite an abstract idea which is a judicial exception.

Regarding Claim 9:  Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 9 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: determining a validation error based on a validation set in response to completing the first epoch (observation, evaluation, and judgement), and in response to the validation error increasing relative to a previously determined validation error (observation, evaluation, and judgement).  Therefore, claim 9 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 9 recites additional elements introduced in claim 1.  Claim 9 also introduces the additional element “setting the different precisions to a maximum precision”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 9 is directed to a judicial exception.
Step 2B Analysis:  Claim 9 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 9 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
The above analysis also applies to claim 19, which recites corresponding features.  Therefore, claims 9 and 19 recite an abstract idea which is a judicial exception.
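
For illustration only, the conditional step recited in claim 9 (setting the different precisions to a maximum precision in response to the validation error increasing relative to a previously determined validation error) can be sketched as follows.  This is a minimal sketch under stated assumptions: the function name, the use of NumPy dtypes to stand for precisions, and the choice of float32 as the maximum precision are hypothetical and are not taken from the claims or the specification.

```python
import numpy as np

MAX_PRECISION = np.float32   # assumed maximum precision for this sketch

def update_precisions(precisions, val_error, prev_val_error):
    """Reset every subset to the maximum precision if validation error rose."""
    if prev_val_error is not None and val_error > prev_val_error:
        return [MAX_PRECISION] * len(precisions)
    return list(precisions)   # otherwise keep the current per-subset precisions

# Validation error rose from 0.2 to 0.3, so every subset is reset.
after_increase = update_precisions([np.float16, np.float32], 0.3, 0.2)
# Validation error fell from 0.2 to 0.1, so the precisions are unchanged.
after_decrease = update_precisions([np.float16, np.float32], 0.1, 0.2)
```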

Regarding Claim 14:  Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to an apparatus, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 14 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: wherein the memory is configured to store the sets of model parameters at the different precisions associated with the subsets of the samples (gathering data).  Therefore, claim 14 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 14 recites additional elements introduced in claim 10.  Claim 14 also introduces additional elements “the memory”.  However, these additional features are generic functions performed on generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 14 is directed to a judicial exception.
Step 2B Analysis:  Claim 14 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 14 amount to no more than mere instructions to apply the judicial exception using a generic computer component.


Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-5, 10, 12-15, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Burger (US 2019/0340499 A1).

Regarding claim 1, Burger teaches A method comprising: determining losses of samples within an input volume that is provided to a neural network during a first epoch; ([¶0078] "the learning rate of the network may be reduced when performing fine-tuning as part of the quantized training 650. After at least one round of fine-tuning to produce an updated quantized model 630, an updated quantized model W** 660 is produced. This updated quantized model 660 can be used to evaluate how the trained model will perform" [¶0080] "As shown, a quantized floating-point model 630 is provided for quantized training. Loss is computed at 710 for forward-pass training using the quantized model 630. A gradient of the error is computed with respect to the quantized model" First epoch interpreted as synonymous with first round of training.).
grouping the samples into subsets based on the losses; ([¶0083] "In some examples, a shared exponent for a tensor or a split tensor can be selected in a number of different ways." [¶0085] "For example, an exponent can be selected such that accuracy of the representation is improved overall" Grouping by accuracy is interpreted as synonymous with grouping based on the losses.).
assigning the subsets to operands in the neural network that represent the samples at different precisions so that each subset is associated with a different precision; and ([¶0075] "As shown in FIG. 5, a neural network represented as a set of pre-trained model values W 510 is provided...After the neural network has been trained, at least a portion of the neural network trained model W* 520 (comprising one or more input tensor) is converted from its normal-precision floating-point format into a set of quantized precision format numbers." W is taught as a set of fixed precision numbers and W* is a set of block floating point quantized precision numbers.).
processing the subsets in the neural network at the different precisions during the first epoch. ([¶0077] "As shown, an input neural network model W 610 is represented as a set of pre-trained model parameters, which has been trained using a normal floating-point representation, to produce a trained neural network model W*, in a similar manner to that discussed above regarding FIG. 5. This trained model W* 620 is quantized to produce a quantized model Wq 630" Producing a quantized model is interpreted as synonymous with processing the subsets at the different precisions.).

Regarding claim 3, Burger teaches The method of claim 1, wherein processing the subsets comprises determining sets of model parameters for the subsets during at least one of a forward pass and a backward pass through the neural network during the first epoch, wherein each set of model parameters for the subsets is represented at a different corresponding one of the different precisions. ([¶0050] "For example, a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization." [¶0080] "During back propagation, a straight-through estimator is used for the gradients of the quantization operators. In other words, back-propagation is performed using identity operators instead of derivatives of the forward pass operation." [¶0081] "For example, instead of using a single shared exponent for an entire array, multiple shared exponents may be used on a per-row, a per-column, or a per-tile basis. In some examples, it may be determined that certain portions of the neural network should be implemented with normal-precision floating-point using a general-purpose CPU in the final model, and only other portions of the neural network model are quantized and implemented using a hardware accelerator." ). 

Regarding claim 4, Burger teaches The method of claim 3, wherein the sets of model parameters comprise at least one of connection weights for connections between nodes in the neural network, activations of neurons in the neural network, and gradients for steepest descent estimations. ([¶0050] "For example, a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization." Connection weights interpreted as synonymous with edge weights.). 

Regarding claim 5, Burger teaches The method of claim 1, further comprising: modifying a number of the subsets during a second epoch that is subsequent to the first epoch; and ([¶0078] "parameters used for the fine-tuning training are varied from that used for the floating-point training performed to produce the trained neural network W* 620...the learning rate of the network may be reduced when performing fine-tuning as part of the quantized training 650. After at least one round of fine-tuning to produce an updated quantized model" Round interpreted as synonymous with epoch such that during a second epoch is interpreted as synonymous with after one round. ).
processing the modified number of the subsets in the neural network at the number of different precisions during the first epoch. ([¶0065] "since the illustrated set of numbers have different exponent values in the floating-point format, each number's respective mantissa may be shifted such that the same or a proximate number is represented in the quantized format" [¶0079] "the quantized values are actually stored in memory as block floating-point numbers" Updating the quantized model interpreted as synonymous with processing the subsets of different precisions.  Block floating-point number interpreted as synonymous with different precision subset. See also FIG. 3).
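
For illustration only, the block floating-point representation referred to in the cited passages (mantissas shifted so that a group of values shares one exponent) can be sketched as follows.  This is a minimal sketch of a generic shared-exponent quantization, not Burger's disclosed implementation: the function names, the 8-bit default mantissa width, and the rule of sizing the exponent to the largest magnitude in the block are assumptions made for the example.

```python
import numpy as np

def to_block_floating_point(values, mantissa_bits=8):
    """Quantize `values` to signed integer mantissas that share one exponent."""
    values = np.asarray(values, dtype=np.float64)
    max_abs = np.max(np.abs(values))
    if max_abs == 0.0:
        return np.zeros(values.shape, dtype=np.int64), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; pick the shared
    # exponent so the largest magnitude fits in the signed mantissa range.
    _, e = np.frexp(max_abs)
    shared_exp = int(e) - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    scaled = np.round(values / 2.0 ** shared_exp)
    mantissas = np.clip(scaled, -limit, limit).astype(np.int64)
    return mantissas, shared_exp

def from_block_floating_point(mantissas, shared_exp):
    """Recover approximate real values from mantissas and the shared exponent."""
    return mantissas.astype(np.float64) * 2.0 ** shared_exp

mantissas, shared_exp = to_block_floating_point([1.0, 0.5, -0.25])
restored = from_block_floating_point(mantissas, shared_exp)
```

Values much smaller than the largest magnitude in the block lose low-order bits when shifted to the shared exponent, which is why the block size and exponent selection affect the accuracy of the representation.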

Regarding claim 10, claim 10 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Regarding claim 12, claim 12 effectively mirrors claim 3 and is therefore rejected under a similar interpretation.

Regarding claim 13, claim 13 effectively mirrors claim 4 and is therefore rejected under a similar interpretation.

Regarding claim 14, Burger teaches the apparatus of claim 12, wherein the memory is configured to store the sets of model parameters at the different precisions associated with the subsets of the samples ([¶0079] “In some examples, the quantized values are actually stored in memory as block floating-point numbers, and a library of neural network operations as provided to operate on the quantized values as if the neural network was implemented using a hardware accelerator.” Block floating-point number interpreted as subset of different precision values.).

Regarding claim 15, claim 15 effectively mirrors claim 5 and is therefore rejected under a similar interpretation.

Regarding claim 20, claim 20 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

16.	Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Ould-Ahmed-Vall (US 2020/0364823 A1).

Regarding claim 2, Burger teaches the method of claim 1. However, Burger does not explicitly teach wherein assigning the subsets to the operands comprises assigning subsets having higher losses to operands having higher precisions.

Ould-Ahmed-Vall, who teaches the related art of mixed-precision training of machine learning systems, teaches wherein assigning the subsets to the operands comprises assigning subsets having higher losses to operands having higher precisions. ([¶0202] "The dynamic floating point unit 1608C will attempt to perform a mixed precision 16-bit/32-bit operation at 16-bits of precision unless significant precision loss or error will occur." See also ¶0204 and FIG. 17. The error fork in FIG. 17 is interpreted as assigning a subset to an operand.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the block format in Burger with the deterministic mixed precision system of Ould-Ahmed-Vall. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Burger the performance advantages of using lower precisions in machine learning operations.  Ould-Ahmed-Vall further discloses deterministically assigning values to operands based on the loss.  This is interpreted as being designed to prevent low precision operations done on high precision (larger) numbers which would result in large errors.  One of ordinary skill in the art would understand that the motivation to reduce precision is to increase system performance, but Ould-Ahmed-Vall teaches “Some neural networks may still benefit from the added precision of calculations using N-bit feature maps and N-bit filters. In some implementations, N-bit features and weights for a neural network can be processed at low precision without significant reduction in output error. However, a data scientist implementing a low precision N-bit neural network (e.g., FP16, INT8) generally should be aware of rounding errors or out of bounds data that may arise due to successive calculations at low precision.”  Therefore the advantage to having a flexible system for dynamic precision operations in a machine learning environment is obvious. 
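
For illustration only, the behavior quoted from Ould-Ahmed-Vall ¶0202 (attempt the operation at 16 bits unless significant precision loss or error will occur) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the relative tolerance, and the use of a double-precision product as the reference are hypothetical and do not describe the hardware unit 1608C.

```python
import struct


def round_to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_mixed(a, b, rel_tol=1e-3):
    """Multiply at 16 bits when safe, otherwise fall back to 32 bits.

    Returns (result, bits_used). The tolerance is a hypothetical threshold
    standing in for "significant precision loss or error".
    """
    exact = a * b  # double-precision reference, assumed for illustration
    try:
        low = round_to_fp16(round_to_fp16(a) * round_to_fp16(b))
    except OverflowError:
        low = None  # out-of-bounds data at half precision
    if low is not None and abs(low - exact) <= rel_tol * abs(exact):
        return low, 16
    return round_to_fp32(exact), 32
```

For example, multiply_mixed(1.5, 2.0) stays at 16 bits, whereas multiply_mixed(300.0, 300.0) exceeds the half-precision range and is computed at 32 bits.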

Regarding claim 11, claim 11 effectively mirrors claim 2 and is therefore rejected under a similar interpretation.

17.	Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Chung (US 2018/0341851 A1).

Regarding claim 6, Burger teaches the method of claim 5.  However, Burger does not explicitly teach wherein modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch.

Chung, who teaches the related art of training a mixed-precision machine learning system, teaches the method of claim 5, wherein modifying the number of the subsets comprises decreasing the number of the subsets during the second epoch, relative to the number of the subsets used during the first epoch. ([¶0034] "During a tuning phase, adjustments can be made in the area of approximate computing by dynamically adjusting the tuning parameters, when the opportunity arises e.g., during a training" [¶0035] "Also within an approximate computing framework, other tuning parameters can be adjusted...For example, adjustments can include: using dropout sparsification to send a quasi-random subset of weights, rolling updates that transmit only a pre-specified subset of weights in a round-robin fashion" Decreasing the number of the subsets is interpreted as synonymous with drop-out.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to implement dropout as suggested by Chung in the machine learning system of Burger. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Chung that drop-out can be used to increase the sparsity of the network.  Chung further explains ([¶0059] “Some approximate computing techniques affecting computation time include switching from single to double precision to half precision, for example. By doing so the system 100 dynamically updates the training parameters of the training process to modulate the compute time relative to the communication time, and thereby moves toward parsimonious utilization of system resources for accelerated training. The compression in this case could be any of the many techniques known to those skilled in the art, such as random sparsification or thresholded drop-out, and the like.”).  The motivation to use reduced precision to reduce computing time aligns with that disclosed in Burger.

Regarding claim 16, claim 16 effectively mirrors claim 6 and is therefore rejected under a similar interpretation.

18.	Claims 7, 8, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Taras (“QUANTIZATION ERROR AS A METRIC FOR DYNAMIC PRECISION SCALING IN NEURAL NET TRAINING”, 2018).

Regarding claim 7, Burger teaches the method of claim 1.  However, Burger does not explicitly teach further comprising: modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch.

Taras, who teaches the related art of mixed precision training in a machine learning system, teaches the method of claim 1, further comprising: modifying at least one of the different precisions during a second epoch that is subsequent to the first epoch. ([p. 3] "we quantize weights, biases, activations, and gradients at the appropriate pass through the network, and update the precision on-the-fly during training on each iteration" Iteration is interpreted as synonymous with epoch.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the mixed precision system in Burger with the dynamically scaling precision in Taras. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that Taras and Burger both share identical motivations for improving the speed of a neural network through mixed precision format arithmetic, but Taras also notes ([p. 3] “naively reducing the bit-width of weights and activations to a fixed 13-bits with no dynamic precision scaling results in the training process failing to converge.”).  

Regarding claim 8, the combination of Burger and Taras teaches the method of claim 7, wherein modifying the different precisions comprises decreasing the at least one of the different precisions during the second epoch, relative to the different precisions used during the first epoch. (Taras [p. 3] "Our results reveal that we can achieve accuracy on-par with the baseline, whilst drastically reducing the bit-width used for both weights and activations." See Fig. 2 "Moving average bitwidths during training using DPS." Figure 2 shows significant precision reduction of activation between first and second iterations.).
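
For illustration only, updating the precision on each iteration using quantization error as the metric, as described in the cited passages of Taras, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Taras's algorithm: the fixed-point format, the error threshold, the one-bit step size, and all function names are hypothetical.

```python
def quantize_fixed_point(values, frac_bits):
    """Round each value to the nearest multiple of 2**-frac_bits."""
    scale = 2 ** frac_bits
    return [round(v * scale) / scale for v in values]


def mean_quantization_error(values, frac_bits):
    """Mean absolute difference between values and their quantized form."""
    quantized = quantize_fixed_point(values, frac_bits)
    return sum(abs(v - q) for v, q in zip(values, quantized)) / len(values)


def update_bit_width(values, frac_bits, max_error=1e-2, min_bits=2, max_bits=16):
    """Return the fractional bit-width to use on the next iteration.

    Hypothetical rule: add a bit when the error exceeds max_error,
    drop a bit when the error is comfortably below it, else hold.
    """
    error = mean_quantization_error(values, frac_bits)
    if error > max_error and frac_bits < max_bits:
        return frac_bits + 1
    if error < max_error / 2 and frac_bits > min_bits:
        return frac_bits - 1
    return frac_bits
```

Calling update_bit_width once per iteration lets the bit-width drift downward while the quantization error stays small, which corresponds to the precision reduction between iterations that the rejection of claim 8 relies on.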

Regarding claim 17, claim 17 effectively mirrors claim 7 and is therefore rejected under a similar interpretation.

Regarding claim 18, claim 18 effectively mirrors claim 8 and is therefore rejected under a similar interpretation.

19.	Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Burger in view of Na (“Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier-Accumulator”, 2016).

Regarding claim 9, Burger teaches the method of claim 1, further comprising: determining a validation error based on a validation set in response to completing the first epoch ([¶0076] "inference validation can be used to compare an expected output for the neural network with the output that is produced using the quantized model"). However, Burger does not explicitly teach setting the different precisions to a maximum precision for the subsets during a second epoch that is subsequent to the first epoch in response to the validation error increasing relative to a previously determined validation error.

Na, who teaches a related method of mixed precision training for machine learning systems, teaches setting the different precisions to a maximum precision for the subsets during a second epoch that is subsequent to the first epoch in response to the validation error increasing relative to a previously determined validation error. ([p. 3 Sec. 3.2] "If the moving average keeps decreasing, no action is taken. If the training becomes numerically unstable, it increases the precision to its maximum value (ml). Since the training might become extremely unstable due to the wrong guess (tl), immediate change to the maximum precision is beneficial." "Wrong guess" is interpreted as synonymous with validation error.  Stable precision is explicitly taught as a relative decrease in the moving average; therefore, unstable precision is interpreted as synonymous with a relative increase in the moving average.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the mixed precision system in Burger with the precision determination in Na. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Na that this dynamic precision scaling (Na [Abstract] “can achieve 5.7x speed-up while consuming 31% energy compared to baseline for modified Alexnet on Flickr image style recognition task.”).  While Burger does not explicitly mention Alexnet, Burger does teach image recognition as one of the primary applications for their quantized format (Burger [¶0061] “Examples of suitable applications for such neural network BFP implementations include, but are not limited to: performing image recognition”).  
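
For illustration only, the rule quoted from Na Sec. 3.2 (take no action while the moving average keeps decreasing, and change immediately to the maximum precision when training becomes unstable) can be sketched as follows. This is a minimal sketch under stated assumptions: instability is modeled as an increase in the moving average, consistent with the interpretation above, and the window size, the 32-bit maximum, and the function name are hypothetical.

```python
def next_precision(error_history, current_bits, max_bits=32, window=3):
    """Choose the precision for the next epoch from recent errors.

    error_history holds one error value per completed epoch, oldest first.
    """
    if len(error_history) < 2 * window:
        # Not enough history to compare two moving averages; hold.
        return current_bits
    recent = sum(error_history[-window:]) / window
    previous = sum(error_history[-2 * window:-window]) / window
    if recent > previous:
        # Moving average increased: jump straight to maximum precision.
        return max_bits
    # Moving average is not increasing: no action is taken.
    return current_bits
```

For example, with an error history that falls steadily the function returns the current bit-width unchanged, while a history whose most recent window averages higher than the preceding window returns max_bits.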

Regarding claim 19, claim 19 effectively mirrors claim 9 and is therefore rejected under a similar interpretation.

Conclusion
20.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Hou (“Loss-aware weight quantization of deep networks”, 2018).
21.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124