DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 12/31/2018 and the Remarks and Amendments filed on 6/4/2022.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed towards non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent-eligible subject matter because the claimed invention is directed towards signals per se.

Looking to the originally filed specification, paragraph [0145] discloses “[a]s should be readily understood, the term computer-readable storage media includes the media for data storage such as memory 1320 and storage 1340, and not transmission media such as modulated data signals” (emphasis added). Under a broadest reasonable interpretation of the claim language, the “computer readable storage devices or media” of claim 15 may also encompass transitory signals, as transitory forms of signal communication are not explicitly excluded (especially in view of the “such as” language of paragraph [0145]), and claim 15 is thus directed towards signals per se. Further, claim 15, as amended, recites “One or more computer-readable storage devices or media not including transmission media and modulated data signals, the computer-readable storage devices or media”; this phrase does not exclude non-modulated data signals, and the specification fails to define exactly what constitutes transmission media. Examiner suggests amending claim 15 to recite “One or more non-transitory computer-readable storage devices or media”.  Claims 16-20 depend on rejected claim 15, and are also rejected under 35 U.S.C. § 101 based on this dependency. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



	Claims 1-6, 8, 10, 11, 13, 15-18, and 20 are rejected under 35 U.S.C. § 103 as being obvious over Drumond et al. (Drumond et al., “Training DNNs with Hybrid Block Floating Point”, Dec. 2, 2018, arXiv:1804.01526v4, pp. 1-11, hereinafter “Drumond”) in view of Shirahata et al. (US 20200202201 A1, hereinafter “Shirahata”).

	Regarding claim 1, Drumond discloses [a] computing system comprising: one or more processors; memory in communication with the one or more processors; and the computing system being configured to: (Abstract; “We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic . . . we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point”, which discloses the computing system, and in view of the experiments section (§5) of the paper, the system and method of the paper is inherently performed using a computing system with a processor and memory)
with at least one of the processors, (inherently disclosed by Drumond) perform forward propagation for at least one layer of a neural network to propagate activation values for edges of the at least one layer, (§4.1; Hybrid block floating point DNN training: "We propose the use of BFP for all dot product computations, with other operations performed in floating-point representations ... "; and § 5.1. HBFP simulation on GPU: "We train DNNs with the proposed HBFP approach, using BFP in the compute-intensive operations (matrix-multiplications, convolutions, and their backward passes) [ ... ] both forward and backward passes to simulate BFP. In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then, we execute the target operation in native floating-point arithmetic ...")
with at least one of the processors, (inherently disclosed by Drumond) perform backward propagation for at least one layer of a neural network to propagate gradients for nodes of the at least one layer; (§4.1; Hybrid block floating point DNN training: "We propose the use of BFP for all dot product computations, with other operations performed in floating-point representations ... "; and § 5.1. HBFP simulation on GPU: "We train DNNs with the proposed HBFP approach, using BFP in the compute-intensive operations (matrix-multiplications, convolutions, and their backward passes) [ ... ] both forward and backward passes to simulate BFP [ ... ] In the backward pass, we perform the same pre-/post-processing of the inputs/outputs of the x derivative ...")
with at least one of the processors, (inherently disclosed by Drumond) determine a performance metric for the neural network, (§6. Evaluation; "We now evaluate DNN training with HBFP [ ... ] we move on to evaluate HBFP on various datasets and tasks ... ", which discloses determining a performance metric through DNN training)
based on the performance metric, adjust a parameter of the neural network, the parameter being selected to improve the performance metric in a neural network having the adjusted parameter, and (§6 Evaluation; " ... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes [ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ")
update activation values and/or weights for the neural network based on the adjusted parameter to produce the neural network having the adjusted parameter (§6 Evaluation; " ... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes[ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ").
Drumond fails to explicitly disclose but Shirahata discloses and subsequent to the update to the activation values and/or to the weights, perform additional propagation operations for the at least one layer of the neural network having the adjusted parameter ([0116-0117]; “Then, the NN processor performs process S10 while propagating the layers in the NN in the forward direction (S9), while propagating, the NN processor updates the parameters of the layers such as the weight and the bias in accordance with the difference in the parameters calculated while back propagation (S10). The update processing for the parameters is also performed until the processing is completed for all the layers (S11) . . . In the iteration steps S20 to S32 for the second learning and the learning thereafter, the NN processor performs process S23 while forward propagating (S21). While forward propagating, the NN processor executes the operations of the layers with the selected method (S23) until the operations are completed for all the layers (S24). Then, the NN processor performs process S27 while back propagating (S25). While back propagating, the NN processor executes the operations of the layers with selected method (S27) until the operations are completed for all the layers (S28). Then, the NN processor performs process S30 while propagating to the layers in the NN in the forward direction (S29). While forward propagating, the NN processor updates parameters such as the weight and the bias in the layers in accordance with the difference in the parameters calculated while the back propagation (S30) until the update is completed for all the layers (S31). 
The processing described above is repeated until the iteration of the learning ends (S32).”, (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, updating weights and performing additional propagation operations over multiple iterations during forward and backward pass operations for all layers of the neural network.  Note that because the weight updating and propagation operations are performed over multiple operations, Shirahata teaches that the additional propagation is performed subsequent to a first propagation).
Drumond and Shirahata are analogous art because both are concerned with training neural networks.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in training neural networks to combine the additional propagation operations of Shirahata with the computing system of Drumond to yield the predictable result of subsequent to the update to the activation values and/or to the weights, perform additional propagation operations for the at least one layer of the neural network having the adjusted parameter. The motivation for doing so would be to update the weights for all layers of a neural network (Shirahata; [0117]).
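For illustration only, the sequence of operations addressed in the rejection of claim 1 (propagation, determining a performance metric, adjusting a precision parameter, and performing additional propagation with the adjusted parameter) can be sketched as follows. This sketch is not taken from Drumond or Shirahata. The function names, the one-weight model, the learning rate, the error target, and the fixed-point `quantize` helper (a simplified stand-in for a block floating-point format with a shared exponent fixed at zero) are all hypothetical.

```python
# Toy data for a one-weight model y = w * x (assumed for illustration).
XS = [1.0, 2.0, 3.0, 4.0]
YS = [0.3 * x for x in XS]


def quantize(w, bits):
    """Round w to a signed fixed-point grid with (bits - 1) fractional bits."""
    scale = 2 ** (bits - 1)
    return round(w * scale) / scale


def mse(w):
    """Performance metric: mean squared error of the model on the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def grad(w):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)


def train(start_bits=4, max_bits=12, epochs_per_phase=20, lr=0.01, target=1e-3):
    w_master = 0.0          # full-precision copy of the weight
    bits = start_bits       # the adjustable precision parameter
    history = []
    while True:
        for _ in range(epochs_per_phase):
            w_q = quantize(w_master, bits)   # propagation uses the quantized weight
            w_master -= lr * grad(w_q)       # weight update from the gradient
        err = mse(quantize(w_master, bits))  # determine the performance metric
        history.append((bits, err))
        if err <= target or bits >= max_bits:
            return history
        bits += 4  # adjust the parameter, then propagate again on the next pass


history = train()
```

In this sketch, each pass through the outer loop after `bits` is increased corresponds to additional propagation performed with the adjusted parameter.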

Regarding claim 8, Drumond discloses [a] method of operating a computing system implementing a neural network, the method comprising: with the computing system: (Abstract; “We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic . . . we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point”, which discloses the computing system that implements a neural network)
training at least one layer of the neural network by forward propagating and backward propagating activation or gradient values, respectively, for a number of training epochs; and (§4.1; Hybrid block floating point DNN training: "We propose the use of BFP for all dot product computations, with other operations performed in floating-point representations ... "; and § 5.1. HBFP simulation on GPU: "We train DNNs with the proposed HBFP approach, using BFP in the compute-intensive operations (matrix-multiplications, convolutions, and their backward passes) [ ... ] both forward and backward passes to simulate BFP. In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then, we execute the target operation in native floating-point arithmetic ... in the backward pass, we perform the same pre-/post-processing of the inputs/outputs of the x derivative")
adjusting a precision parameter of the neural network and updating the neural network for the adjusted precision parameter to produce an updated, trained neural network (§6 Evaluation; " ... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes[ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ").
Drumond fails to explicitly disclose but Shirahata discloses with the updated, trained neural network, performing additional training for at least one layer of the neural network by forward propagating and backward propagating activation or gradient values, respectively, for at least one additional training epoch ([0116-0117]; “Then, the NN processor performs process S10 while propagating the layers in the NN in the forward direction (S9), while propagating, the NN processor updates the parameters of the layers such as the weight and the bias in accordance with the difference in the parameters calculated while back propagation (S10). The update processing for the parameters is also performed until the processing is completed for all the layers (S11) . . . In the iteration steps S20 to S32 for the second learning and the learning thereafter, the NN processor performs process S23 while forward propagating (S21). While forward propagating, the NN processor executes the operations of the layers with the selected method (S23) until the operations are completed for all the layers (S24). Then, the NN processor performs process S27 while back propagating (S25). While back propagating, the NN processor executes the operations of the layers with selected method (S27) until the operations are completed for all the layers (S28). Then, the NN processor performs process S30 while propagating to the layers in the NN in the forward direction (S29). While forward propagating, the NN processor updates parameters such as the weight and the bias in the layers in accordance with the difference in the parameters calculated while the back propagation (S30) until the update is completed for all the layers (S31). 
The processing described above is repeated until the iteration of the learning ends (S32).”, (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, updating weights by propagating gradient or activation values and performing additional propagation operations over multiple iterations or epochs during forward and backward pass operations for all layers of the neural network.  Note that performing backward propagation operations inherently uses gradients to compute and update weights for a NN).
Drumond and Shirahata are analogous art because both are concerned with training neural networks.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in training neural networks to combine the additional training operations of Shirahata with the computing system of Drumond to yield the predictable result of with the updated, trained neural network, performing additional training for at least one layer of the neural network by forward propagating and backward propagating activation or gradient values, respectively, for at least one additional training epoch. The motivation for doing so would be to update the weights for all layers of a neural network (Shirahata; [0117]).


Regarding claim 15, Drumond discloses [o]ne or more computer-readable storage devices or media storing computer-executable instructions, which when executed by a computer, cause the computer to perform a method of configuring a computer system to implement a neural network, the instructions comprising: (Abstract; “We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic . . . we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point”, which discloses the computing system, and in view of the experiments section (§5) of the paper, the system and method of the paper are inherently performed using a computing system with a processor and memory)
instructions that cause the computer system to implement a first layer of the neural network using first node weights and/or first activation values expressed in a first floating-point format; (§4.1; “We propose to use BFP in all dot-product-based operations present in DNNs (i.e., convolutions, matrix multiplications, and outer products), and floating-point representations for all other operations (i.e., activations, regularizations, etc)”, which discloses implementing a first layer of a neural network using weights expressed in a first floating point format; and §5.1; “In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then we execute the target operation in native floating-point arithmetic. In the backward pass, we perform the same pre-/post-processing of the inputs/outputs of the x derivative. We handle the weights in the optimizer”)
instructions that cause the computer system to forward propagate values from the first layer of the neural network to a second layer of the neural network; (§4.1; Hybrid block floating point DNN training: "We propose the use of BFP for all dot product computations, with other operations performed in floating-point representations ... "; and § 5.1. HBFP simulation on GPU: "We train DNNs with the proposed HBFP approach, using BFP in the compute-intensive operations (matrix-multiplications, convolutions, and their backward passes) [ ... ] both forward and backward passes to simulate BFP. In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then, we execute the target operation in native floating-point arithmetic ...")
instructions that cause the computer system to determine a training performance metric for the neural network; (§6 Evaluation; " All models with mantissas wider than 8 bits result in final validation error within 1% of the FP32 baseline, with only 4-bit mantissas showing a large accuracy gap, with 4.1% larger error. We also evaluate models with 8- and 12-bit mantissas paired with 16-bit weight storage... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes[ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ").
instructions that cause the computer system to select an adjusted precision parameter for the neural network; and (§6 Evaluation; " ... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes[ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ").
instructions that cause the computer system to modify the neural network based on the adjusted precision parameter (§6 Evaluation; " ... We explore the design space of BFP, finding the best-performing configurations of BFP. We vary both the mantissa width and the tile sizes[ ... ] we train models with 4-, 8-, 12- and 16-bit wide mantissas [ ... ] The sweet spot in the design space is HBFP with 8- to 12-bit mantissa, 16-bit weight storage and a tile size of 24 ... ").
Drumond fails to explicitly disclose but Shirahata discloses instructions that cause the computer system to forward propagate values from the first layer of the modified neural network to the second layer of the modified neural network; and instructions that cause the computer system to backward propagate weights or gradients from the second layer of the modified neural network to the first layer of the modified neural network ([0116-0117]; “Then, the NN processor performs process S10 while propagating the layers in the NN in the forward direction (S9), while propagating, the NN processor updates the parameters of the layers such as the weight and the bias in accordance with the difference in the parameters calculated while back propagation (S10). The update processing for the parameters is also performed until the processing is completed for all the layers (S11) . . . In the iteration steps S20 to S32 for the second learning and the learning thereafter, the NN processor performs process S23 while forward propagating (S21). While forward propagating, the NN processor executes the operations of the layers with the selected method (S23) until the operations are completed for all the layers (S24). Then, the NN processor performs process S27 while back propagating (S25). While back propagating, the NN processor executes the operations of the layers with selected method (S27) until the operations are completed for all the layers (S28). Then, the NN processor performs process S30 while propagating to the layers in the NN in the forward direction (S29). While forward propagating, the NN processor updates parameters such as the weight and the bias in the layers in accordance with the difference in the parameters calculated while the back propagation (S30) until the update is completed for all the layers (S31). 
The processing described above is repeated until the iteration of the learning ends (S32).”, (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, forward propagating weights between layers of the NN and backward propagating weights from a second layer to a first layer of a NN).
Drumond and Shirahata are analogous art because both are concerned with training neural networks.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in training neural networks to combine the forward and backward propagation operations of Shirahata with the computing system of Drumond to yield the predictable result of instructions that cause the computer system to forward propagate values from the first layer of the modified neural network to the second layer of the modified neural network; and instructions that cause the computer system to backward propagate weights or gradients from the second layer of the modified neural network to the first layer of the modified neural network. The motivation for doing so would be to update the weights for all layers of a neural network (Shirahata; [0117]).


Regarding claim 2, the rejection of claim 1 is incorporated and Drumond further discloses after adjusting the parameter of the neural network: perform forward propagation for the at least one layer of the neural network having the adjusted parameter to propagate activation values for edges of the at least one layer, producing updated activation values (§4.2; “therefore, we still reduce the memory bandwidth requirements for forward and backward passes, during which only the most significant bits of the weights are accessed. The least significant bits of the weight matrices are only accessed by weight update”; and §5.1; “ We handle the weights in the optimizer. We created a shell optimizer that takes the original optimizer, performs its update function in FP32 and converts the weights to two BFP formats: one with wide and another with narrow mantissas”; and §5.3; “Finally, weight updates are done entirely in the activation unit, in floating point. The proof-of-concept accelerator operates with both weights and activations stored on-chip”; and §6; the section discloses propagating activation values for edges of the layer through the experimental evaluation using training, therefore producing updated activation values)
perform backward propagation for at least one layer of the neural network having the adjusted parameter to propagate gradients for nodes of the at least one layer, producing updated weights; and (§4.2; “therefore, we still reduce the memory bandwidth requirements for forward and backward passes, during which only the most significant bits of the weights are accessed. The least significant bits of the weight matrices are only accessed by weight update”; and §5.1; “ We handle the weights in the optimizer. We created a shell optimizer that takes the original optimizer, performs its update function in FP32 and converts the weights to two BFP formats: one with wide and another with narrow mantissas”; and §5.3; “Finally, weight updates are done entirely in the activation unit, in floating point. The proof-of-concept accelerator operates with both weights and activations stored on-chip”)
store the updated activation values and/or the updated weights in the memory (§5.3; “Finally, weight updates are done entirely in the activation unit, in floating point. The proof-of-concept accelerator operates with both weights and activations stored on-chip”).

Regarding claim 3, the rejection of claim 1 is incorporated and Drumond further discloses the activation values and the weights are stored in a block floating-point format; and (§4.1; the section discloses storing the weights in block floating point; and §5.1; the section discloses storing or converting activation values in BFP format)
the adjusted parameter comprises at least increasing a number of mantissa bits of the block floating-point format, increasing a number of exponent bits of the block floating-point format, or changing a sharing parameter of a shared exponent (§6, “BFP Design Space”; the section discloses increasing a number of mantissa bits of the block floating-point format).
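As a non-authoritative illustration of what increasing the number of mantissa bits of a block floating-point format means, the following sketch quantizes a block of values to a single shared exponent and signed integer mantissas. It is not the implementation of Drumond. The function names `bfp_quantize` and `max_error`, the choice of the largest magnitude in the block to set the shared exponent, and the sample block are assumptions made for this example.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize floats to one shared exponent with signed integer mantissas,
    then return the values the quantized block represents."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_abs)
    # One sign bit plus (mantissa_bits - 1) magnitude bits per value.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    out = []
    for v in block:
        m = round(v * scale)             # integer mantissa
        m = max(-limit, min(limit, m))   # saturate to the representable range
        out.append(m / scale)
    return out


def max_error(block, mantissa_bits):
    """Largest absolute quantization error over the block."""
    q = bfp_quantize(block, mantissa_bits)
    return max(abs(a - b) for a, b in zip(block, q))


block = [0.75, -0.1, 0.003, 0.5]
errors = [max_error(block, bits) for bits in (4, 8, 12)]
```

For this sample block, the worst-case error shrinks as the mantissa widens from 4 to 8 to 12 bits, because small values sharing the exponent of the largest value keep more of their significant bits.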

Regarding claim 4, the rejection of claim 1 is incorporated and Drumond further discloses update the activation values, the gradients, and/or node weights by increasing a number of bits used to store mantissa values in the activation values, the gradients, and/or node weights, respectively, producing updated activation values, updated gradients, and/or updated node weights; and store the updated activation values, updated gradients, and/or updated node weights in the memory (§6, “BFP Design Space”; the section discloses increasing a number of bits used to store the mantissa values, producing updated weights, and storing this in memory).

Regarding claim 5, the rejection of claim 1 is incorporated and Drumond further discloses wherein the computing system determines the performance metric by: determining differences between an output of a layer of the neural network from an expected output; and based on the determining, adjusting the parameter by increasing a number of mantissa bits used to store the updated activation values, updated gradients, and/or updated node weights in the neural network having the adjusted parameter (§6; the section discloses determining differences between an output of a layer of the neural network from an expected output; and based on the determining, adjusting the parameter by increasing a number of mantissa bits used to store the updated activation values, updated gradients, and/or updated node weights in the neural network having the adjusted parameter. See “final validation error”).

Regarding claim 6, the rejection of claim 1 is incorporated and Drumond further discloses wherein the computing system is further configured to store values for the neural network having the adjusted parameter in a computer-readable storage device or media, the values including at least one of: updated activation values, updated gradients, or updated node weights generated after the adjusting the parameter (§4.2; “To minimize data loss in long-lasting training state, we store weights with wider mantissas. All operations are still executed using the original mantissa, and only weight updates use the wider mantissa. Therefore, we still reduce the memory bandwidth requirements for forward and backward passes, during which only the most significant bits of the weights are accessed. The least significant bits of the weight matrices are only accessed by weight updates”; and §5.3; “The proof-of-concept accelerator operates with both weights and activations stored on-chip”; and §6).

Regarding claim 10, the rejection of claim 8 is incorporated and Drumond further discloses measuring a training performance metric for the trained neural network, wherein the adjusting the precision parameter is based on the training performance metric (§6; the section discloses measuring a training performance metric for the trained NN, and adjusting the precision parameter based on this metric).

Regarding claim 11, the rejections of claims 8 and 10 are incorporated and Drumond further discloses wherein the training performance metric comprises at least one of the following: accuracy of the trained neural network; change in accuracy of the trained neural network over two or more of the training epochs; accuracy of at least one layer of the trained neural network; change in accuracy of at least one layer of the trained neural network over two or more training epochs; entropy of at least one layer of the trained neural network; or change in entropy of at least one layer of the trained neural network (§6; the section discloses measuring an accuracy using a validation error of the trained neural network).

Regarding claim 13, the rejections of claims 8 and 10 are incorporated and Drumond further discloses wherein the training performance metric is based on one or more of the following: mean square error of at least one layer of the trained neural network, perplexity of at least one layer of the trained neural network, gradient signal to noise ratio of at least one layer of the trained neural network, or entropy of the trained neural network (§6; the section discloses measuring a perplexity of at least one layer of the trained neural network, described as “Perplexity of language modeling models”).
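For reference, perplexity as commonly defined for language models is the exponential of the average negative log-likelihood that the model assigns to the correct tokens. The sketch below shows that standard definition only; the function name and the sample probabilities are hypothetical, and nothing here is asserted about how Drumond computes the figure it reports.

```python
import math


def perplexity(correct_token_probs):
    """Exponential of the mean negative log-probability of the correct tokens."""
    n = len(correct_token_probs)
    mean_nll = -sum(math.log(p) for p in correct_token_probs) / n
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every correct token is, on
# average, as uncertain as a uniform choice among four alternatives.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])
```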

Regarding claim 16, the rejection of claim 15 is incorporated and Drumond further discloses wherein the training performance metric comprises at least one of the following: accuracy of the neural network; change in accuracy of the neural network over two or more of training epochs of the neural network; accuracy of at least one layer of the neural network; change in accuracy of at least one layer of the neural network over two or more training epochs; entropy of at least one layer of the neural network; or change in entropy of at least one layer of the neural network (§6; the section discloses measuring an accuracy using a validation error of the trained neural network).

Regarding claim 17, the rejection of claim 15 is incorporated and Drumond further discloses wherein: the first floating point format is a block floating-point format; and (§4.1)
the adjusted precision parameter causes the computer system to increase a number of mantissa bits of the first floating point format so that at least some of the activation values and/or weights of the modified neural network are stored in a second floating point format having an increased number of mantissa bits (§6; the section discloses increasing a number of mantissa bits of the first floating point format so that at least some of the activation values and/or weights of the modified neural network are stored in a second floating point format having an increased number of mantissa bits).

Regarding claim 18, the rejections of claims 15 and 17 are incorporated and Drumond further discloses wherein: the first floating point format stores at least one value with a mantissa having one, two, three, four, five, or six bits (§6, “BFP Design Space”; the section discloses a 4-bit wide mantissa)
the second floating point format stores the at least one value with a mantissa having at least one more bits than the value in the first floating point format (§6; the section discloses the second floating point format stores the at least one value with a mantissa having at least one more bits than the value in the first floating point format, such as 8 or 12 bits).

Regarding claim 20, the rejection of claim 15 is incorporated and Drumond further discloses wherein the first floating point format is a block floating-point format (§4.1; “We propose to use BFP in all dot-product-based operations present in DNNs (i.e., convolutions, matrix multiplications, and outer products), and floating-point representations for all other operation”)
wherein the instructions further comprise: instructions that cause the computer system to, based on the adjusted precision parameter, modify the neural network to convert values from the first block floating-point format to a normal precision floating point format; and (§4.1; “We propose to use BFP in all dot-product-based operations present in DNNs (i.e., convolutions, matrix multiplications, and outer products), and floating-point representations for all other operation”; and §5.1; “In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then we execute the target operation in native floating-point arithmetic”)
instructions that cause the computer system to forward propagate the values from the first layer of the neural network to the second layer of the neural network (§5.1; “In the forward pass, we convert the activations to BFP, giving the x tensor one exponent per training input. Then we execute the target operation in native floating-point arithmetic”, the forward pass, under a BRI, being the propagating of values between layers of a neural network).
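For illustration only, the block floating-point concept relied upon above (one shared exponent per block of values, with a configurable mantissa width) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Drumond or of the instant application: the function names `to_block_floating_point` and `from_block_floating_point`, the signed-integer mantissa encoding, and the sample values are all hypothetical.

```python
import math


def to_block_floating_point(values, mantissa_bits):
    """Quantize a block of floats to one shared exponent plus integer mantissas.

    Assumption (illustrative only): mantissas are signed integers whose
    magnitude is at most 2**mantissa_bits - 1.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp returns (m, e) with 0.5 <= |m| < 1, so max_abs < 2**e.
    shared_exp = math.frexp(max_abs)[1]
    scale = 2.0 ** (shared_exp - mantissa_bits)
    limit = 2 ** mantissa_bits - 1
    mantissas = []
    for v in values:
        m = round(v / scale)
        # Clamp in case rounding pushes the largest value past the limit.
        mantissas.append(max(-limit, min(limit, m)))
    return shared_exp, mantissas


def from_block_floating_point(shared_exp, mantissas, mantissa_bits):
    """Convert shared-exponent mantissas back to ordinary floats."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    block = [0.5, -0.25, 0.1, 0.75]  # hypothetical activation values
    for bits in (4, 8):
        exp, mants = to_block_floating_point(block, bits)
        restored = from_block_floating_point(exp, mants, bits)
        print(bits, exp, mants, restored)
```

In this sketch, widening the mantissa from 4 to 8 bits reduces the reconstruction error for the same block, which is the general effect of storing values in a second format having an increased number of mantissa bits.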



Claims 7 and 12 are rejected under 35 U.S.C. § 103 as being obvious over Drumond in view of Shirahata and further in view of Xu et al. (US 20180157899 A1, hereinafter “Xu”).

Regarding claim 7, the rejection of claim 1 is incorporated and Drumond fails to explicitly disclose but Xu discloses wherein the performance metric is based on at least one of the following for a layer of the neural network: number of true positives, number of true negatives, number of false positives, or number of false negatives ([0087]; “In one example, an output of a convolutional neural network of each stage has a performance index, for example, a true positive rate (TPR) and a false positive rate (FPR). The TPR indicates a rate at which a true sample is accurately classified as the true sample, and the FPR indicates a rate at which a false sample is erroneously classified as the true sample” (emphasis added)).
Drumond, Shirahata, and Xu are analogous art because all are concerned with deep neural network computing.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in neural networks to combine the performance metric in the form of number of true positives, number of true negatives, number of false positives, or number of false negatives as taught by Xu with the computing system of Drumond and Shirahata to yield the predictable result of wherein the performance metric is based on at least one of the following for a layer of the neural network: number of true positives, number of true negatives, number of false positives, or number of false negatives. The motivation for doing so would be to indicate a rate at which a true sample is accurately classified as the true sample, and indicate a rate at which a false sample is erroneously classified as the true sample (Xu; [0087]).

Regarding claim 12, the rejections of claims 8 and 10 are incorporated and Drumond further discloses wherein the training performance metric is based on accuracy or change in accuracy of at least one layer of the trained neural network (§6; the section discloses measuring an accuracy using a validation error of the trained neural network).
Drumond fails to explicitly disclose but Xu discloses a true positive rate, a true negative rate, a positive predictive rate, a negative predictive value, a false negative rate, a false positive rate, a false discovery rate, a false omission rate, or an accuracy rate ([0087]; “In one example, an output of a convolutional neural network of each stage has a performance index, for example, a true positive rate (TPR) and a false positive rate (FPR). The TPR indicates a rate at which a true sample is accurately classified as the true sample, and the FPR indicates a rate at which a false sample is erroneously classified as the true sample” (emphasis added)).
The motivation to combine Drumond, Shirahata, and Xu is the same as discussed above with respect to claim 7.
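For illustration only, the true positive rate and false positive rate described in Xu at [0087] follow from the four counts recited in claim 7 under their conventional definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below assumes those conventional definitions; the function name `classification_rates` and the sample counts are hypothetical and do not appear in any cited reference.

```python
def classification_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): share of true samples classified as true.
    FPR = FP / (FP + TN): share of false samples classified as true.
    A rate is reported as 0.0 when its denominator is zero (an assumption
    made here only to keep the sketch total).
    """
    positives = true_pos + false_neg
    negatives = false_pos + true_neg
    tpr = true_pos / positives if positives else 0.0
    fpr = false_pos / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical counts for the output of one layer or stage.
    tpr, fpr = classification_rates(true_pos=8, false_neg=2, false_pos=1, true_neg=9)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```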


Claim 9 is rejected under 35 U.S.C. § 103 as being obvious over Drumond in view of Shirahata and further in view of Shattil (US 20190386717 A1, hereinafter “Shattil”).

Regarding claim 9, the rejection of claim 1 is incorporated and Drumond fails to explicitly disclose but Shattil discloses wherein the precision parameter is adjusted according to a predetermined schedule ([0017]; “some aspects, a predetermined schedule of parameters to be updated in each iteration is provided”).
Drumond, Shirahata, and Shattil are analogous art because all are concerned with intelligent data analytics.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in intelligent data analytics to combine the predetermined schedule of Shattil with the computing method of Drumond and Shirahata to yield the predictable result of wherein the precision parameter is adjusted according to a predetermined schedule. The motivation for doing so would be to provide for data-independent updating schedules (Shattil; [0017]).

Claim 14 is rejected under 35 U.S.C. § 103 as being obvious over Drumond in view of Shirahata and further in view of Chou et al. (US 20190075301 A1, hereinafter “Chou”).

Regarding claim 14, the rejection of claim 1 is incorporated and Drumond fails to explicitly disclose but Chou discloses wherein the precision parameter is for a network topology parameter, and wherein the adjusting the precision parameter comprises at least one of the following: adjusting a number of layers of the trained neural network, adjusting a number of nodes of a layer of the trained neural network, adjusting sparsity of edges of a layer of the trained neural network or adjusting a number of non-zero edges of a layer of the trained neural network ([0085]; “In particular, the machine learning parameters 64 of the convolutional neural network block 34A may be adjusted by adjusting the number of convolution layers 66, associated convolution weights 68, and/or configuration (e.g., number and/or interconnected nodes) of the layer interconnections 71e” (emphasis added)).
Drumond, Shirahata, and Chou are analogous art because all are concerned with deep neural network computing.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in neural networks to combine the adjusting of a number of layers as taught by Chou with the computing method of Drumond and Shirahata to yield the predictable result of wherein the precision parameter is for a network topology parameter, and wherein the adjusting the precision parameter comprises at least one of the following: adjusting a number of layers of the trained neural network, adjusting a number of nodes of a layer of the trained neural network, adjusting sparsity of edges of a layer of the trained neural network or adjusting a number of non-zero edges of a layer of the trained neural network. The motivation for doing so would be to facilitate improving encoding efficiency using a neural network for image processing (Chou; Abstract).

Claim 19 is rejected under 35 U.S.C. § 103 as being obvious over Drumond in view of Shirahata and further in view of Srinivasan (US 20070258641 A1, hereinafter “Srinivasan”).

Regarding claim 19, the rejections of claims 15 and 17 are incorporated and Drumond fails to explicitly disclose but Srinivasan discloses wherein the increased number of bits of mantissas in the second floating point format is selected using a rate distortion function ([0010]; “The mantissa can be cast to a second bit length (which can be specified by a user) that differs from the first bit length. The second bit length of the mantissa can be adjusted for rate control or to improve rate-distortion performance” (emphasis added)).
Drumond, Shirahata, and Srinivasan are analogous art because all are concerned with intelligent data analytics.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in intelligent data analytics to combine the rate distortion of Srinivasan with the computer-readable storage devices of Drumond and Shirahata to yield the predictable result of wherein the increased number of bits of mantissas in the second floating point format is selected using a rate distortion function. The motivation for doing so would be to improve rate-distortion performance (Srinivasan; [0010]).

Response to Arguments

Applicant’s arguments and amendments, filed on 6/4/2022, with respect to the 35 USC § 112(b) rejection of claims 1-7 have been fully considered and are persuasive.  The 35 USC § 112(b) rejection of claims 1-20 has been withdrawn.

Applicant’s arguments and amendments, filed on 6/4/2022, with respect to the 35 USC § 101 rejection of claims 15-20 have been fully considered and are not persuasive.  

On page 8 of the Remarks, filed on 6/4/2022, Applicant argues that “the storage devices or media explicitly recite excluding transmission media and modulated data signals”.  Examiner respectfully believes that claims 15-20, as amended and under a broadest reasonable interpretation of the claim language, may still be directed towards signals per se.  

Looking to the originally filed specification, paragraph [0145] discloses “[a]s should be readily understood, the term computer-readable storage media includes the media for data storage such as memory 1320 and storage 1340, and not transmission media such as modulated data signals” (emphasis added). Under a broadest reasonable interpretation of the claim language, the “computer readable storage devices or media” of claim 15 may also encompass transitory signals (such as non-modulated data signals), because transitory forms of signal communication are not explicitly excluded (especially in view of the “such as” language of paragraph [0145]), and the claim is thus directed towards signals per se. Further, claim 15, as amended, recites “One or more computer-readable storage devices or media not including transmission media and modulated data signals, the computer-readable storage devices or media”; this phrase does not exclude non-modulated data signals and the specification fails to define exactly what constitutes transmission media.  For these reasons, Applicant’s arguments are not persuasive, and the 35 USC § 101 rejection of claims 15-20 STANDS.

Applicant’s arguments and amendments, filed on 6/4/2022, with respect to the 35 USC § 102(a)(1) rejection of claims 1-6, 8, 10, 11, 13, 15-18, and 20 and 35 USC § 103 rejection of claims 7, 9, 12, 14, and 19 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 8, and 15.  Drumond and Shirahata are now being used to render claims 1, 8, and 15 obvious under 35 USC § 103.


Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Lee et al. (US 20180341857 A1).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127