Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/02/2022 has been entered.
 
Amendments
Claims 1, 4, 10, and 18 are currently amended. Claims 1-7 and 9-21 are pending and have been considered.

Claim Objections
Claims 1, 10, and 18 are objected to because of the following informalities: In claim 1, line 9, the phrase “the effect of pruning the edge approximated” is grammatically incomplete. The phrase should recite “wherein the effect of pruning the edge is approximated” or the like. Claim 10, line 11 recites the same phrase as claim 1, line 9.
Claim 18 does not consistently use an article before “loss.” Claim 18 recites “loss” on page 6 in each of lines 4, 5, 7, and 10. The claim should consistently use (or not use) an article before “loss.” Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 18-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
	In Claim 18 on p. 6 of the claims, line 2 recites “the respective weight,” line 6 recites “a respective edge,” and line 8 recites “the respective edge.” It is unclear whether the edge mentioned in lines 6 and 8 corresponds to the weight mentioned in line 2. It is also unclear what “respective” is relative to in lines 6 and 8. Claim 19, line 2 and Claim 20, line 2 each recite “the estimated change in utility of each edge.” However, claim 18 only recites “an estimated change in utility of a respective edge.” It is unclear whether “each” means “respective.” The term “each” means one of a plurality of items. For purposes of examination, Examiner interprets “respective edge” in claim 18 on p. 6, lines 6 and 8, and “each edge” in claims 19 and 20, to mean an edge.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7 and 9-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

CLAIM 1
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
for at least one edge of the plurality of edges: determining, by the one or more computing devices, an estimated utility of the edge, wherein the estimated utility is determined by approximating an effect of pruning the edge on network performance, the effect of pruning the edge approximated based on an approximate change in a loss for the machine-learned neural network due to a proposed change in a weight associated with the edge; (Determining an estimated utility of the edge and approximating an effect of pruning the edge are evaluation and judgement mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. Approximating a change in a loss is a mathematical calculation.)
determining, by the one or more computing devices, to prune the edge based at least in part on the estimated utility of such edge; (Determining to prune an edge is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper.)
generating, by the one or more computing devices and based on the determining to prune at least one edge of the plurality of edges, a sparse representation of the machine-learned neural network, the sparse representation providing a reduced size for the machine-learned neural network. (A neural network is a type of mathematical model. Generating a sparse representation of a mathematical model is a mathematical concept.)
The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements: obtaining, by one or more computing devices, data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges.
	Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). A machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Obtaining data is well-understood, routine, conventional activity of receiving data over a network, as discussed in MPEP 2106.05(d)(II), example (i). Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). A machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 2 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites: determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, the estimated utility of the edge based at least in part on an approximation of a loss function at the weight associated with the edge. Determining the estimated utility of the edge is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Approximating a loss function is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 3 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites:
wherein determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, a first derivative of a loss function with respect to a logit of… , but not determining, by the one or more computing devices, any higher-order derivatives of the loss function. This limitation is a mathematical calculation of determining a first derivative of a loss function with respect to a logit. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements: 
a receiving neuron at the weight associated with the edge
A receiving neuron at the weight associated with the edge is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A receiving neuron at the weight associated with the edge is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 4 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites: determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, a sum over one or more training examples included in a training dataset of the proposed change in the weight multiplied by an output… multiplied by a first derivative of a loss function with respect to a logit … . This limitation is a mathematical calculation. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements: 
of a transmitting neuron
of a receiving neuron at the weight and training example.
A transmitting neuron and a receiving neuron are generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A transmitting neuron and a receiving neuron are generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.
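For illustration only, the calculation recited in claim 4 can be sketched in Python as a sum, over training examples, of the proposed change in the weight multiplied by the output of the transmitting neuron multiplied by the first derivative of the loss with respect to the logit of the receiving neuron. The function name, variable names, and numeric values are hypothetical and are not taken from the claims or the specification. The sketch assumes that pruning an edge corresponds to a proposed change equal to the negative of its weight, consistent with the “negative weight” recited in claim 13. It is not a construction of the claim.

```python
def estimated_utility(delta_w, transmit_outputs, dloss_dlogit):
    """First-order estimate of the change in loss for one edge.

    delta_w          -- proposed change in the weight of the edge
    transmit_outputs -- output of the transmitting neuron, one per training example
    dloss_dlogit     -- first derivative of the loss with respect to the logit of
                        the receiving neuron, one per training example
    """
    if len(transmit_outputs) != len(dloss_dlogit):
        raise ValueError("expected one output and one derivative per training example")
    # Sum over training examples of delta_w * output * dL/dlogit.
    return sum(delta_w * a * g for a, g in zip(transmit_outputs, dloss_dlogit))


# Hypothetical values. Pruning is assumed to set the weight to zero,
# so the proposed change is the negative of the current weight.
weight = 0.5
outputs = [1.0, 2.0, 4.0]
derivatives = [0.25, -0.5, 0.125]
estimate = estimated_utility(-weight, outputs, derivatives)
```

Each term is a product of three numbers and the estimate is their sum, which is the arithmetic Examiner characterizes above as a mathematical calculation.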

CLAIM 5 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites: wherein generating the sparse representation of the machine-learned neural network comprises generating, by the one or more computing devices, a sparse weight matrix. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of computing devices. Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 6 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites: supplementing the machine-learned neural network with at least one additional edge at a different location within the machine-learned neural network. Since a neural network is a type of mathematical model, this limitation amounts to a mathematical concept of activating a different weight value in the mathematical model. It is also an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional elements of computing devices and a machine-learned neural network. Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 7 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites: prevents any other edges that connect to a same neuron as the edge from being modified in one or more pruning iterations. This limitation is an evaluation or judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of storing, by the one or more computing devices, a data item. Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Storing a data item is mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Computing devices are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Storing a data item is mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Storing a data item is well-understood, routine, conventional activity of storing information in memory, as discussed in MPEP 2106.05(d)(II), example (iv). The claim is not patent eligible.

CLAIM 9 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 1. The claim recites the following limitations: 
adding, by the one or more computing devices, a patch subnetwork (Mathematical concept)
predict an error associated with its input. (Evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper.)
	The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional elements of:
(of) the machine-learned neural network, 
wherein the patch subnetwork is trained (to)
A machine-learned neural network is generally linking the abstract ideas to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Training a network is well known and amounts to insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A machine-learned neural network is generally linking the abstract ideas to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Training a network is well known and amounts to insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Training a network is well-understood, routine, conventional activity according to MPEP 2106.05(d)(I)(2). Baum et al. (US 20180285736 A1) provides Berkheimer evidence in ¶ [0010]: “A neural network can be trained using backpropagation which is a method to calculate the gradient of the loss function with respect to the weights in an ANN. The weight updates of backpropagation can be done via well-known stochastic gradient descent techniques.” The claim is not patent eligible.

CLAIM 10
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations: 
determining a respective estimated utility of each of the plurality of edges, wherein the respective estimated utility is determined by approximating an effect of pruning the edge on network performance, the effect of pruning the edge approximated based on an approximate change in a loss for the machine-learned neural network due to a proposed change in a weight associated with the edge; (Determining an estimated utility of an edge and approximating an effect of pruning the edge are evaluation and judgement mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. Approximating a change in a loss is a mathematical calculation.)
selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges; and (Selecting an edge for deletion is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper.)
compressing the machine-learned neural network by deleting the selected one or more edges. (A neural network is a type of mathematical model. Compressing a mathematical model is a mathematical concept.)
The claim recites an abstract idea.
Step 2A Prong 2: The claim recites the following additional elements: 
one or more processors
one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computer system to perform operations, the operations comprising: 
obtaining data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges;
Processors and non-transitory computer-readable media storing instructions executed by the processors are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). A machine-learned neural network comprising a plurality of neurons respectively connected by a plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Processors and non-transitory computer-readable media storing instructions executed by the processors are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Obtaining data is well-understood, routine, conventional activity of receiving data over a network, as discussed in MPEP 2106.05(d)(II), example (i). A machine-learned neural network comprising a plurality of neurons respectively connected by a plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 11 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: determining the respective estimated utility of each of the plurality of edges comprises determining the respective estimated utility of each of the plurality of edges based at least in part on an approximation of a loss function at a weight associated with the edge. Determining the respective estimated utility of each edge is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Approximating a loss function is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 12 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a first derivative of a loss function with respect to a logit… at a weight associated with the edge without determining any higher-order derivatives of the loss function. This limitation is a mathematical calculation of determining a first derivative of a loss function with respect to a logit. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of a receiving neuron. This additional element is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A receiving neuron is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 13 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a sum over one or more training examples included in a training dataset of a negative weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit. This limitation is a mathematical calculation. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of a receiving neuron at the weight and training example. This additional element is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A receiving neuron is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 14 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined number of the plurality of edges that have the lowest estimated utilities. Selecting edges for deletion is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 15 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined percentage of the plurality of edges that have the lowest estimated utilities. Selecting edges for deletion is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.
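For illustration only, the selections recited in claims 14 and 15 (a predetermined number, or a predetermined percentage, of the edges having the lowest estimated utilities) can be sketched in Python as a ranking followed by a cut-off. The function name, edge labels, and utility values are hypothetical and are not taken from the claims or the specification. Rounding the percentage down to a whole number of edges is an assumption of the sketch.

```python
def lowest_utility_edges(utilities, count=None, fraction=None):
    """Return the edges with the lowest estimated utilities.

    utilities -- mapping from an edge label to its estimated utility
    count     -- predetermined number of edges to select (claim 14 style)
    fraction  -- predetermined fraction of edges to select (claim 15 style)
    """
    if (count is None) == (fraction is None):
        raise ValueError("give exactly one of count or fraction")
    if fraction is not None:
        # Assumption: round down to a whole number of edges.
        count = int(len(utilities) * fraction)
    # Rank edge labels from lowest to highest estimated utility.
    ranked = sorted(utilities, key=utilities.get)
    return ranked[:count]


# Hypothetical estimated utilities for four edges.
utilities = {"e1": 0.9, "e2": 0.1, "e3": 0.5, "e4": 0.3}
by_count = lowest_utility_edges(utilities, count=2)          # two lowest edges
by_fraction = lowest_utility_edges(utilities, fraction=0.25)  # lowest 25% of edges
```

The selection reduces to comparing a short list of numbers and picking the smallest, consistent with Examiner's characterization of the limitation as an evaluation that can be performed with pencil and paper.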

CLAIM 16 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: adding one or more new edges. This limitation is a mathematical concept of adding edges to a mathematical model. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of the machine-learned neural network, which is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 17 incorporates the rejection of claim 16.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 16. The claim recites: adding one or more new edges to the machine-learned neural network comprises adding a same number of new edges to the machine-learned neural network as was deleted from the machine-learned neural network. Since a neural network is a mathematical model, this limitation is a mathematical concept of adding edges to a mathematical model. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of the machine-learned neural network, which is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 18 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
determining a plurality of different proposed quantization schemes, each proposed quantization scheme including changes to the respective weight of one or more edges to be quantized under such scheme; (Determining is an evaluation and judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. A quantization scheme is a mathematical concept.)
estimating a change in loss for each of the plurality of different proposed quantization schemes, wherein estimating the change in loss for each proposed quantization scheme comprises determining an estimated change in utility of a respective edge to be quantized, the estimated change in utility being determined based on a derivative of the loss at the respective edge and a proposed change in a weight of the respective edge; (Estimating a change in loss and determining an estimated change in utility based on a derivative of a loss and a proposed change in a weight are mathematical calculations.)
selecting one of the proposed quantization schemes based at least in part on the estimated changes in loss; and (Evaluation and judgement mental processes which can be reasonably performed in one’s mind with the aid of pencil and paper.)
compressing the machine-learned neural network by applying the selected one of the proposed quantization schemes to the machine-learned neural network, wherein applying the selected quantization scheme comprises changing the respective weight of the one or more edges to be quantized under such scheme. (A neural network is a mathematical model. Compressing a neural network by applying a quantization scheme is applying a mathematical concept to a mathematical model.)
The claim recites an abstract idea. 
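For illustration only, the calculation recited in the limitations above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the uniform quantizer, and the rule of selecting the scheme with the lowest summed estimate are all hypothetical. The estimated change in utility of each edge is taken as the derivative of the loss at that edge multiplied by the proposed change in its weight.

```python
def quantize_uniform(weights, num_levels):
    """Hypothetical scheme: snap each weight to the nearest of num_levels
    evenly spaced values. Assumes num_levels >= 2 and at least two distinct weights."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (num_levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def estimated_loss_change(grads, weights, proposed):
    """Sum over edges of (dLoss/dw) * (proposed weight - current weight)."""
    per_edge = [g * (p - w) for g, w, p in zip(grads, weights, proposed)]
    return sum(per_edge)


def select_scheme(grads, weights, schemes):
    """Pick the scheme with the lowest estimated change in loss (assumed rule)."""
    estimates = {
        name: estimated_loss_change(grads, weights, proposed)
        for name, proposed in schemes.items()
    }
    best = min(estimates, key=estimates.get)
    return best, estimates


# Toy data (assumed): three edge weights and the loss derivative at each edge.
weights = [0.0, 0.375, 1.0]
grads = [1.0, -2.0, 0.5]
schemes = {
    "2-level": quantize_uniform(weights, 2),
    "3-level": quantize_uniform(weights, 3),
}
best, estimates = select_scheme(grads, weights, schemes)
compressed = schemes[best]  # applying the selected scheme changes the weights
```

Every step in the sketch is arithmetic on a list of numbers followed by a comparison of the resulting estimates.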
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional elements of:
One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising:
obtaining data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges;
Processors and non-transitory computer-readable media storing instructions executed by the processors are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). A machine-learned neural network comprising a plurality of neurons respectively connected by a plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Processors and non-transitory computer-readable media storing instructions executed by the processors are generic computer components performing generic functions, as discussed in MPEP 2106.05(f). Obtaining data amounts to mere data-gathering, an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Obtaining data is well-understood, routine, conventional activity of receiving data over a network, as discussed in MPEP 2106.05(d)(II), example (i). A machine-learned neural network comprising a plurality of neurons respectively connected by a plurality of edges is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 19 incorporates the rejection of claim 18.
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 18. The claim recites: determining the estimated change in utility of each edge to be quantized comprises determining the estimated change in utility of each edge to be quantized based at least in part on a first-order approximation of a loss function at the weight associated with the edge. Determining the estimated change in utility based on a first-order approximation of a loss function is a mathematical calculation. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 20 incorporates the rejection of claim 18.
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 18. The claim recites: wherein determining the estimated change in utility of each edge to be quantized comprises determining, for each edge to be quantized, a first derivative of a loss function with respect to a logit… without determining any higher-order derivatives of the loss function. Determining a first derivative of a loss function is a mathematical calculation. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of: of a receiving neuron at the weight associated with the edge. A receiving neuron is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A receiving neuron is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 21 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim incorporates the judicial exceptions of claim 10. The claim recites: deleting the selected one or more edges comprises generating a sparse representation of… . Generating a sparse representation of a mathematical model is a mathematical concept. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the additional element of: the machine-learned neural network. A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 5, 7, 10-11, and 21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”).

	Regarding CLAIM 1, Molchanov teaches: A computer-implemented method, comprising: 
obtaining, by one or more computing devices, data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges; (P. 2, § 2, line 13 teaches the network’s parameters include weights and biases. P. 3, lines 1-11 further teaches connections between convolutional layers of a convolutional neural network. In Fig. 1 on p. 2, the first step “Network” implies obtaining information about the network for the pruning procedure. Section 3 on pp. 6-10 teaches experimental results for at least VGG-16 network, AlexNet, and CaffeNet neural networks.) 
for at least one edge of the plurality of edges: determining, by the one or more computing devices, an estimated utility of the edge, wherein the estimated utility is determined by approximating an effect of pruning the edge on network performance, the effect of pruning the edge approximated based on an approximate change in a loss for the machine-learned neural network due to a proposed change in a weight associated with the edge; and determining, by the one or more computing devices, to prune the edge based at least in part on the estimated utility of such edge; and (The following claim mappings teach both limitations of determining an estimated utility of the edge and determining to prune the edge based on the estimated utility. Abstract, lines 1-6 broadly teaches the limitations. On p. 2, the last paragraph teaches: “Starting with a full set of parameters W, we iteratively identify and remove the least important parameters, as illustrated in Figure 1.” The second step in Figure 1 is “Evaluating importance of neurons”, the third step is “Remove the least important neuron”, and the last step optionally returns to the second step. The footnote on p. 2 teaches: “A ‘parameter’ $(w, b) \in W$ might represent an individual weight”. On p. 3, § 2.1, the first paragraph teaches: “Minimizing the difference in accuracy between the full and pruned models depends on the criterion for identifying the ‘least important’ parameters, called saliency, at each step. The best criterion would be an exact empirical evaluation of each parameter, which we denote the oracle criterion, accomplished by ablating each non-zero parameter $w \in W'$ in turn and recording the cost’s difference.” On p. 3, § 2.2 teaches: “There are many heuristic criteria which are much more computationally efficient than the oracle… We describe these criteria in the following paragraphs and propose a new criterion which is based on the Taylor expansion.” On p. 4, the entire subsection titled “Taylor expansion” teaches a method of “directly approximate change in the loss function from removing a particular parameter” as discussed in line 3. Equation 3 shows the change in a loss, and the line above equation 4 states “To approximate $\Delta C(h_i)$, we use the first-degree Taylor polynomial.” The line below equation 7 teaches: “Intuitively, this criterion prunes parameters that have an almost flat gradient of the cost function w.r.t. feature map $h_i$.”)
generating, by the one or more computing devices and based on the determining to prune at least one edge of the plurality of edges, a sparse representation of the machine-learned neural network, the sparse representation providing a reduced size for the machine-learned neural network. (P. 2, § 2, lines 1-6 teaches: “Stop pruning after reaching the target trade-off between accuracy and pruning objective, e.g. floating point operations (FLOPs) or memory utilization.” P. 5, § 2.4 teaches: “One of the main reasons to apply pruning is to reduce number of operations in the network. Feature maps from different layers require different amounts of computation due the number and sizes of input feature maps and convolution kernel… Other regularization conditions may be applied, e.g. storage size, kernel sizes, or memory footprint.” Experimental results for providing a reduced size are disclosed on p. 8, from the paragraph starting “Fig. 4 shows” until the start of § 3.4, and by the “Taylor” pruning methods illustrated in Figures 4 and 5. A neural network becomes sparse when an edge weight is pruned/set to zero value. On p. 4, the sentence containing equation 3 teaches setting a pruned weight to zero.)
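For illustration only, the relationship between the first-degree Taylor criterion and a sparse representation can be sketched as follows. This is a minimal sketch under stated assumptions rather than Molchanov's code: it applies the criterion to individual edge weights (Molchanov states it in terms of feature maps), stores weights as a dictionary keyed by hypothetical (sender, receiver) pairs, and uses the illustrative names `taylor_saliency` and `prune_to_sparse`. Setting a weight w to zero is a proposed change of -w, so the first-order estimate of the resulting change in loss has magnitude |dC/dw * w|.

```python
def taylor_saliency(weight, grad):
    """First-order estimate of |change in loss| if this weight is set to zero."""
    return abs(grad * weight)


def prune_to_sparse(weights, grads, num_to_prune):
    """Drop the num_to_prune edges with the lowest saliency.

    weights, grads: dicts mapping an edge (sender, receiver) to a float.
    Returns a sparse dict that stores only the surviving edges; an absent
    edge is treated as having weight zero.
    """
    ranked = sorted(weights, key=lambda e: taylor_saliency(weights[e], grads[e]))
    pruned = set(ranked[:num_to_prune])
    return {edge: w for edge, w in weights.items() if edge not in pruned}


# Toy 2x2 layer (assumed values).
weights = {(0, 0): 1.0, (0, 1): -2.0, (1, 0): 0.5, (1, 1): 4.0}
grads = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 0.01}
sparse = prune_to_sparse(weights, grads, num_to_prune=2)
```

In this toy case the largest weight, 4.0, is pruned because its gradient is nearly flat, while the smaller weight 0.5 is kept. The sparse dictionary stores two entries instead of four, which is the reduced size.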

	Regarding CLAIM 2, Molchanov teaches: The computer-implemented method of claim 1, wherein determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, the estimated utility of the edge based at least in part on an approximation of a loss function at the weight associated with the edge. (P. 4, subsection “Taylor expansion”, from line 1 to line 12, which recites “To approximate $\Delta C(h_i)$, we use the first-degree Taylor polynomial.”)

Regarding CLAIM 5, Molchanov teaches: The computer-implemented method of claim 1, wherein generating the sparse representation of the machine-learned neural network comprises generating, by the one or more computing devices, a sparse weight matrix. (A neural network becomes sparse when an edge weight is pruned/set to zero value. On p. 4, the sentence containing equation 3 teaches setting a pruned weight to zero. A weight matrix is indicated by the bold formatting and the indices of the weights at p. 2, § 2, line 13. The notation is explained at p. 3, from line 1 to § 2.1.)

Regarding CLAIM 7, Molchanov teaches: The computer-implemented method of claim 1, further comprising: storing, by the one or more computing devices, a data item that prevents any other edges that connect to a same neuron as the edge from being modified in one or more pruning iterations. (The broadest reasonable interpretation of this claim is that stopping pruning prevents any other edges from being modified. See p. 2, § 2, lines 1-6; p. 2, Fig. 1, last step illustratively shows stopping pruning.)

	Claims 10-11 are each directed to a system that contains the same features as the method of claims 1-2 and are therefore rejected for at least the same reasons therein. Additionally, claim 10 recites a computer system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computer system to perform operations… ; and compressing the machine-learned neural network by deleting the selected one or more edges. Molchanov explicitly teaches processors by the CPUs and GPUs listed in Table 2, column 1. The experimental results throughout § 3 on pp. 6-10 are evidence of non-transitory computer-readable media. Molchanov explicitly teaches compressing the neural network by deleting edges. P. 2, § 2, lines 1-6 teaches: “Stop pruning after reaching the target trade-off between accuracy and pruning objective, e.g. floating point operations (FLOPs) or memory utilization.” P. 5, § 2.4 teaches: “One of the main reasons to apply pruning is to reduce number of operations in the network. Feature maps from different layers require different amounts of computation due the number and sizes of input feature maps and convolution kernel… Other regularization conditions may be applied, e.g. storage size, kernel sizes, or memory footprint.” Experimental results for compressing the network are disclosed on p. 8, from the paragraph starting “Fig. 4 shows” until the start of § 3.4, and by the “Taylor” pruning methods illustrated in Figures 4 and 5.

	Regarding CLAIM 21, Molchanov teaches: The computing system of claim 10, wherein deleting the selected one or more edges comprises generating a sparse representation of the machine-learned neural network. (A neural network becomes sparse when an edge weight is pruned/set to zero value. On p. 4, the sentence containing equation 3 teaches setting a pruned weight to zero.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Sadowski (“Notes on Backpropagation”, cited in the PTO-892 filed 02/18/2022). 

Regarding CLAIM 3, Molchanov teaches: The computer-implemented method of claim 1, wherein determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, a first derivative of a loss function  (Crossed-out text is not explicitly taught by the reference. On p. 4, equation 7 teaches the claim limitation. Equation 7 contains a first derivative of the cost function with respect to $h_i$, where $h_i$ is the output produced from parameter $i$ according to line 4 of the subsection “Taylor expansion.” The subsection explicitly teaches using a first-order Taylor expansion and disregarding higher-order derivatives below equation 6.)
Molchanov teaches, below equation 7: “This approach requires accumulation of the product of the activation and the gradient of the cost function w.r.t. to the activation”. Molchanov teaches a first derivative of a loss function with respect to an activation of a receiving neuron, but Molchanov does not explicitly teach determining a first derivative of a loss function with respect to a logit of a receiving neuron.
	But Sadowski teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron (P. 2, equation (11) teaches a first derivative of an error with respect to a logit $s_j$. The paragraph above equation (11) and p. 1 from the start through equation (3) explains the variables and indices.)
	Sadowski is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Molchanov’s neural network, with a motivation to reduce the error of the network. (Sadowski, Abstract). 
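For illustration only, the standard chain-rule relationship between a derivative with respect to an activation and a derivative with respect to a logit is shown below. The symbols are assumptions made for this sketch and are not quoted from either reference: $C$ is the loss, $s_j$ is the logit (weighted input) of receiving neuron $j$, $a_j = f(s_j)$ is its activation under a differentiable activation function $f$, $a_i$ is the output of transmitting neuron $i$, $w_{ij}$ is the weight of the edge from $i$ to $j$, and $b_j$ is a bias.

```latex
\begin{align*}
s_j &= \sum_i w_{ij}\, a_i + b_j, \qquad a_j = f(s_j) \\
\frac{\partial C}{\partial s_j} &= \frac{\partial C}{\partial a_j}\, f'(s_j) \\
\frac{\partial C}{\partial w_{ij}} &= \frac{\partial C}{\partial s_j}\, a_i
\end{align*}
```

Under these assumptions, the derivative with respect to the logit is the derivative with respect to the activation scaled by $f'(s_j)$, a quantity that backpropagation already computes at each layer.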

	Claim 12 is directed to a system that contains the same features as the method of claim 3 and is therefore rejected for at least the same reasons therein.  

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Ng (“Sparse autoencoder”, cited in the PTO-892 filed 02/18/2022).

	Regarding CLAIM 4, Molchanov teaches: The computer-implemented method of claim 1, wherein determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, a sum over one or more training examples included in a training dataset of the proposed change in the weight $z_{l,m}^{(k)}$ and a first derivative of a loss function with respect to an activation is the partial derivative.)
However, Molchanov does not explicitly teach, taken as a whole: determining a sum over one or more training examples included in a training dataset of the proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example.
But Ng teaches: determining a sum over one or more training examples included in a training dataset of the proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example. (A sum over training examples is taught by equation $\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)$ on p. 7 at the bottom, where each pair (x,y) form a training example. A proposed change in the weight is the learning rate $\alpha$ in the update formula for $W_{ij}^{(l)}$ on p. 7. An output of a transmitting neuron is $a_j^{(l)}$ in step 4 on p. 8. A first derivative of a loss function with respect to a logit is $\delta_i^{(n_l)}$ in step 2 on p. 8, where z is a logit, as discussed on p. 4 below equation (5).)
	Ng is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Molchanov’s neural network, with a motivation to train the network. (Ng, top of p. 6)
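For illustration only, the recited sum (proposed change in the weight, multiplied by the output of the transmitting neuron, multiplied by the first derivative of the loss with respect to the logit of the receiving neuron, summed over training examples) can be sketched as follows. This is a minimal sketch under stated assumptions: a single sigmoid receiving neuron, a squared-error loss, and toy data. The function names are hypothetical and are not taken from Molchanov, Ng, or the application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, examples):
    """Summed squared error of one sigmoid receiving neuron (assumed toy model)."""
    total = 0.0
    for inputs, target in examples:
        logit = sum(w * a for w, a in zip(weights, inputs))
        total += 0.5 * (sigmoid(logit) - target) ** 2
    return total


def estimated_change(weights, examples, edge, delta_w):
    """Sum over examples of delta_w * sender output * dLoss/dlogit."""
    total = 0.0
    for inputs, target in examples:
        logit = sum(w * a for w, a in zip(weights, inputs))
        y = sigmoid(logit)
        # Derivative of 0.5 * (y - target)^2 with respect to the logit.
        dloss_dlogit = (y - target) * y * (1.0 - y)
        total += delta_w * inputs[edge] * dloss_dlogit
    return total


# Toy data (assumed): two incoming edges, three training examples.
weights = [0.5, -0.3]
examples = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0), ([-1.5, 0.3], 1.0)]
delta_w = 1e-4

estimate = estimated_change(weights, examples, edge=0, delta_w=delta_w)
actual = loss([weights[0] + delta_w, weights[1]], examples) - loss(weights, examples)
```

For a small proposed change the first-order estimate closely tracks the recomputed change in loss, without a second forward pass over the training examples.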

	Claim 13 is directed to a system that contains the same features as the method of claim 4 and is therefore rejected for at least the same reasons therein.  

Claims 6 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Bellec et al. (“Deep Rewiring: Training very sparse deep networks”, cited in the PTO-892 filed 02/18/2022). 

Regarding CLAIM 6, Molchanov teaches: The computer-implemented method of claim 1, 
However, Molchanov does not explicitly teach: further comprising: supplementing, by the one or more computing devices, the machine-learned neural network with at least one additional edge at a different location within the machine-learned neural network.
	But Bellec teaches: further comprising: supplementing, by the one or more computing devices, the machine-learned neural network with at least one additional edge at a different location within the machine-learned neural network. (P. 3, Algorithm 1, line 7; and p. 4, in the paragraph beginning with “The rewiring aspect”, lines 3-8.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Bellec’s step of activating a dormant/pruned connection in Molchanov’s network. A motivation for the combination is to ensure that the same number of connections with which the network was initialized remains active at any time during training. (Bellec, p. 4, in the paragraph beginning with “The rewiring aspect”: “For each connection that was set to the dormant state, a new connection $k'$ is chosen randomly from the uniform distribution over dormant connections, $k'$ is activated and its parameter is initialized to 0. This rewiring strategy (a) ensures that exactly $K$ connections are active at any time during training (one initializes the network with $K$ active connections)”)
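
For illustration only, the rewiring behavior quoted from Bellec above can be sketched as follows. This is a minimal sketch under stated assumptions, not Bellec's Algorithm 1: the function name `rewire`, the dictionary and set representation of connections, and the choice to let a just-pruned connection be drawn again are all hypothetical.

```python
import random

def rewire(active, dormant, pruned, rng):
    """Hypothetical sketch: set `pruned` connections dormant, then activate
    the same number of dormant connections, each with parameter 0."""
    active = dict(active)    # connection id -> parameter value
    dormant = set(dormant)   # ids of connections that are currently inactive
    for k in pruned:
        del active[k]
        dormant.add(k)
    for _ in pruned:
        # Uniform draw over dormant connections (assumption: a connection
        # that was just pruned is eligible to be drawn again).
        k_new = rng.choice(sorted(dormant))
        dormant.remove(k_new)
        active[k_new] = 0.0  # new connection starts with parameter 0
    return active, dormant

# Example: K = 3 active connections before and after rewiring.
example_active, example_dormant = rewire(
    {0: 0.5, 1: -0.2, 2: 0.1}, {3, 4, 5}, [1], random.Random(0)
)
```

The number of active connections is unchanged by the call, which is the property relied on in the stated motivation.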

Regarding CLAIM 16, Molchanov teaches: The computing system of claim 10, 
However, Molchanov does not explicitly teach: wherein the operations further comprise adding one or more new edges to the machine-learned neural network.
But Bellec teaches: wherein the operations further comprise adding one or more new edges to the machine-learned neural network. (P. 3, Algorithm 1, lines 6-9 and P. 4, in the paragraph beginning with “The rewiring”, lines 3-8.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Bellec’s step of activating a dormant/pruned connection in Molchanov’s network. A motivation for the combination is to ensure that the same number of connections with which the network was initialized remains active at any time during training. (Bellec, p. 4, in the paragraph beginning with “The rewiring aspect”: “For each connection that was set to the dormant state, a new connection $k'$ is chosen randomly from the uniform distribution over dormant connections, $k'$ is activated and its parameter is initialized to 0. This rewiring strategy (a) ensures that exactly $K$ connections are active at any time during training (one initializes the network with $K$ active connections)”)

Regarding CLAIM 17, the combination of Molchanov and Bellec teaches: The computing system of claim 16, 
However, Molchanov does not explicitly teach: wherein adding one or more new edges to the machine-learned neural network comprises adding a same number of new edges to the machine-learned neural network as was deleted from the machine-learned neural network.
	But Bellec teaches: wherein adding one or more new edges to the machine-learned neural network comprises adding a same number of new edges to the machine-learned neural network as was deleted from the machine-learned neural network. (P. 3, Algorithm 1, lines 6-9 and P. 4, in the paragraph beginning with “The rewiring”, lines 3-8.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Bellec’s step of activating a dormant/pruned connection in Molchanov’s network. A motivation for the combination is to ensure that the same number of connections with which the network was initialized remains active at any time during training. (Bellec, p. 4, in the paragraph beginning with “The rewiring aspect”: “For each connection that was set to the dormant state, a new connection $k'$ is chosen randomly from the uniform distribution over dormant connections, $k'$ is activated and its parameter is initialized to 0. This rewiring strategy (a) ensures that exactly $K$ connections are active at any time during training (one initializes the network with $K$ active connections)”)

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Kauschke et al. (“Batchwise Patching of Classifiers”, cited in the PTO-892 filed 02/18/2022). 
	
Regarding CLAIM 9, Molchanov teaches: The computer-implemented method of claim 1, 
However, Molchanov does not explicitly teach: further comprising: adding, by the one or more computing devices, a patch subnetwork to the machine-learned neural network, wherein the patch subnetwork is trained to predict an error associated with its input.
	But Kauschke teaches: further comprising: adding, by the one or more computing devices, a patch subnetwork to the machine-learned neural network, wherein the patch subnetwork is trained to predict an error associated with its input. (P. 3375, § 2.2, steps (i) and (ii).)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have added Kauschke’s patch classifier to Molchanov’s neural network. A motivation for the combination is to find local patches to the global classification model that act in a flexible and efficient way without having to re-train the model from scratch. (Kauschke, P. 3374, col. 2, end of second paragraph)

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Wang et al. (US Patent 11,200,495 B2, cited in the PTO-892 filed 02/18/2022.)

Regarding CLAIM 14, Molchanov teaches: The computing system of claim 10, wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting 
	Although the reference teaches stopping pruning at a predetermined number of GFLOPs (P. 10, lines 1-2) and it teaches formulas for computing a number of FLOPs in layers (P. 13, § A.1, Equations 11 and 12), Molchanov does not explicitly teach: selecting a predetermined number of the plurality of edges.
But Wang teaches: selecting a predetermined number of the plurality of edges. (The broadest reasonable interpretation (BRI) of a number includes a percentage. Wang teaches this limitation in C. 3, L. 63 to C. 4, L. 5)
	It would have been obvious to one of ordinary skill in the art before the effective filing date to have pruned a predetermined percentage of Molchanov’s edges according to Wang’s method 100. A motivation for the combination is to clamp the weights of connections that are close to zero to zero. (Wang, C. 3, L. 64-65)

Regarding CLAIM 15, Molchanov teaches: The computing system of claim 10, wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting 
	Although the reference teaches stopping pruning at a predetermined number of GFLOPs (P. 10, lines 1-2) and it teaches formulas for computing a number of FLOPs in layers (P. 13, § A.1, Equations 11 and 12), Molchanov does not explicitly teach: selecting a predetermined percentage of the plurality of edges.
But Wang teaches: selecting a predetermined percentage of the plurality of edges. (C. 3, L. 63 to C. 4, L. 5)
	It would have been obvious to one of ordinary skill in the art before the effective filing date to have pruned a predetermined percentage of Molchanov’s edges according to Wang’s method 100. A motivation for the combination is to clamp the weights of connections that are close to zero to zero. (Wang, C. 3, L. 64-65)
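
For illustration only, pruning a predetermined percentage of edges by clamping the smallest-magnitude weights to zero can be sketched as follows. This is a minimal sketch and not Wang's method 100: the function name `prune_percent`, the flat list of weights, and the use of absolute value as the ranking criterion are assumptions.

```python
def prune_percent(weights, percent):
    """Hypothetical sketch: zero out the `percent` percent of weights
    that have the smallest absolute value."""
    n_prune = len(weights) * percent // 100  # integer count of edges to prune
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

example_weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1, 0.9, 0.07]
example_pruned = prune_percent(example_weights, 30)
```

With ten weights and a percentage of 30, three weights are clamped to zero, so the percentage also fixes a number of edges.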

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Baum et al. (US 20180285736 A1, cited in the PTO-892 filed 02/18/2022).

	Regarding CLAIM 18, Molchanov teaches: One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising: (Molchanov explicitly teaches processors by the CPUs and GPUs listed in Table 2, column 1. The experimental results throughout § 3 on pp. 6-10 are evidence of non-transitory computer-readable media.)
obtaining data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges; (P. 2, § 2, line 13 teaches that the network’s parameters include weights and biases. P. 3, lines 1-11 further teaches connections between convolutional layers of a convolutional neural network. In Fig. 1 on p. 2, the first step “Network” implies obtaining information about the network for the pruning procedure. Section 3 on pp. 6-10 teaches experimental results for at least the VGG-16, AlexNet, and CaffeNet neural networks.)
determining a plurality of different proposed quantization schemes, each proposed quantization scheme including changes to the respective weight of one or more edges to be quantized under such scheme; (See the abstract, lines 1-6; and p. 2, last paragraph which teaches: “Starting with a full set of parameters W, we iteratively identify and remove the least important parameters, as illustrated in Figure 1.” The second step in Figure 1 is “Evaluating importance of neurons”, the third step is “Remove the least important neuron”, and the last step optionally returns to the second step. Each iteration is a quantization scheme because the least important weights are pruned.)
	estimating a change in loss for each of the plurality of different proposed quantization schemes, wherein estimating the change in loss for each proposed quantization scheme comprises determining an estimated change in utility of a respective edge to be quantized, the estimated change in utility being determined based on a derivative of the loss at the respective edge and a proposed change in a weight of the respective edge; (Molchanov teaches selecting proposed weights to prune based on the estimated changes in loss, as discussed further herein. Abstract, lines 1-6; On p. 2, the last paragraph teaches: “Starting with a full set of parameters W, we iteratively identify and remove the least important parameters, as illustrated in Figure 1.” The second step in Figure 1 is “Evaluating importance of neurons”, the third step is “Remove the least important neuron”, and the last step optionally returns to the second step. The footnote on p. 2 teaches: “A ‘parameter’ $(w, b) \in W$ might represent an individual weight”. On p. 3, § 2.1, the first paragraph teaches: “Minimizing the difference in accuracy between the full and pruned models depends on the criterion for identifying the ‘least important’ parameters, called saliency, at each step. The best criterion would be an exact empirical evaluation of each parameter, which we denote the oracle criterion, accomplished by ablating each non-zero parameter $w \in W'$ in turn and recording the cost’s difference.” On p. 3, § 2.2 teaches: “There are many heuristic criteria which are much more computationally efficient than the oracle… We describe these criteria in the following paragraphs and propose a new criterion which is based on the Taylor expansion.” On p. 4, the entire subsection titled “Taylor expansion” teaches a method of “directly approximate change in the loss function from removing a particular parameter” as discussed in line 3. Equation 3 shows the change in a loss, and the line above equation 4 states “To approximate $\Delta C(h_i)$, we use the first-degree Taylor polynomial.” The line below equation 7 teaches: “Intuitively, this criterion prunes parameters that have an almost flat gradient of the cost function w.r.t. feature map $h_i$.”)

compressing the machine-learned neural network by applying 
	However, Molchanov does not explicitly teach: selecting one of the proposed quantization schemes
applying the selected one of the proposed quantization schemes to the machine-learned neural network
	But Baum teaches: selecting one of the proposed quantization schemes (On p. 10, Claim 4, lines 4-6 teach “selecting said quantization level that minimizes a quantization error of output from said at least one layer.”)
applying the selected one of the proposed quantization schemes to the machine-learned neural network (The last limitation of Claim 1 on p. 10 teaches “applying said quantization level to at least one layer.” Claim 1, line 4 teaches during an inference mode.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Baum’s method of selecting a quantization level and applying the quantization level to Molchanov’s neural network. A motivation for the combination is that the mechanism enables the reduction of the representation space and further reduces the memory (and energy thereof) needed to represent the network properties. (Baum, Abstract)

	Regarding CLAIM 19, the combination of Molchanov and Baum teaches: The one or more non-transitory computer-readable media of claim 18, 
Molchanov teaches: wherein determining the estimated change in utility of each edge to be quantized comprises determining the estimated change in utility of each edge to be quantized based at least in part on a first-order approximation of a loss function at the weight associated with the edge. (P. 4, subsection “Taylor expansion”, lines 1-12, which recites “To approximate $\Delta C(h_i)$, we use the first-degree Taylor polynomial.” Each iteration (p. 2, last paragraph) is a quantization scheme because the least important weights are pruned.)
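
For illustration only, the relationship between the oracle criterion and the first-degree Taylor approximation cited from Molchanov can be sketched on a toy cost. This is a minimal sketch under stated assumptions: the squared-error cost, the readout weights `v`, the target `y`, and all function names are hypothetical and do not come from Molchanov; only the forms "cost difference when $h_i$ is ablated" and "product of the activation and the gradient of the cost" are taken from the passages cited above.

```python
def cost(h, v, y):
    """Toy squared-error cost on a linear readout of feature maps h."""
    pred = sum(vi * hi for vi, hi in zip(v, h))
    return (pred - y) ** 2

def grad_cost(h, v, y):
    """Gradient of the toy cost with respect to each h_i."""
    pred = sum(vi * hi for vi, hi in zip(v, h))
    return [2.0 * (pred - y) * vi for vi in v]

def oracle_saliency(h, v, y):
    """Exact |C(h_i = 0) - C(h_i)|, obtained by ablating each h_i in turn."""
    base = cost(h, v, y)
    out = []
    for i in range(len(h)):
        ablated = list(h)
        ablated[i] = 0.0
        out.append(abs(cost(ablated, v, y) - base))
    return out

def taylor_saliency(h, v, y):
    """First-order estimate |dC/dh_i * h_i| of the same quantity."""
    return [abs(g * hi) for g, hi in zip(grad_cost(h, v, y), h)]

h, v, y = [1.0, 0.01, -2.0], [0.5, 3.0, 0.4], 1.0
least_oracle = min(range(len(h)), key=lambda i: oracle_saliency(h, v, y)[i])
least_taylor = min(range(len(h)), key=lambda i: taylor_saliency(h, v, y)[i])
```

In this toy case both criteria rank the second feature map as least important, while the Taylor estimate needs only one gradient evaluation instead of one cost evaluation per ablated feature map.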

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al. (“Pruning Convolutional Neural Networks For Resource Efficient Inference”) in view of Baum et al. (US 20180285736 A1, cited in the PTO-892 filed 02/18/2022) and Sadowski (“Notes on Backpropagation”, cited in the PTO-892 filed 02/18/2022).

	Regarding CLAIM 20, the combination of Molchanov and Baum teaches: The one or more non-transitory computer-readable media of claim 18, 
Molchanov teaches: wherein determining the estimated change in utility of each edge to be quantized comprises determining, for each edge to be quantized, a first derivative of a loss function  (On p. 4, equation 7 teaches the claim limitation. Equation 7 contains a first derivative of the cost function with respect to $h_i$, where $h_i$ is the output produced from parameter $i$ according to line 4 of the subsection “Taylor expansion.” The subsection explicitly teaches using a first-order Taylor expansion and disregarding higher-order derivatives below equation 6. Each iteration (p. 2, last paragraph) is a quantization scheme because the least important weights are pruned. Crossed-out text is not explicitly taught by the reference.)
Molchanov teaches, below equation 7: “This approach requires accumulation of the product of the activation and the gradient of the cost function w.r.t. to the activation”. Molchanov teaches a first derivative of a loss function with respect to an activation of a receiving neuron. However, neither Molchanov nor Baum explicitly teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron
But Sadowski teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron (P. 2, equation (11) teaches a first derivative of an error with respect to a logit $s_j$. The paragraph above equation (11) and p. 1 from the start through equation (3) explain the variables and indices.)
	Sadowski is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Molchanov’s neural network, with a motivation to reduce the error of the network. (Sadowski, Abstract). 
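
For reference only, the derivative with respect to an activation and the derivative with respect to a logit are related by the chain rule. In the sketch below, $s_j$ is the logit as in Sadowski's equation (11); the symbols $a_j$ for the activation, $\varphi$ for the activation function, and $E$ for the error are notation assumed here for illustration and are not quoted from either reference.

```latex
\frac{\partial E}{\partial s_j}
  = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial s_j}
  = \frac{\partial E}{\partial a_j}\,\varphi'(s_j),
\qquad a_j = \varphi(s_j)
```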

Response to Arguments
Applicant's arguments filed 09/02/2022 in response to the office action mailed 06/02/2022 have been fully considered but they are not persuasive.

Claim Rejections under 35 U.S.C. 101 (Remarks pp. 7-8)
Applicant’s arguments #1: “Applicant respectfully submits that claim 1, at least as amended, is not directed to an unpatentable abstract idea. As discussed during the above-noted interview, Applicant respectfully submits that claim 1 is directed to an improvement in generating smaller machine-learned models in view of maintaining performance capabilities by intelligently making pruning determinations based in part on an estimated utility. Thus, any alleged abstract idea is incorporated into a practical application that improves the functioning of computers by providing more effective, sparsely represented machine-learned models…. 
“In this manner, for example, the claimed technique provides a specific improvement by generating sparse representations of model(s) based on a more intelligent approach to pruning decisions.”
Examiner’s response #1: In the 35 U.S.C. 101 inquiry for claim 1, in Step 2A Prong 1, the limitations of determining an estimated utility of the edge and approximating an effect of pruning the edge are evaluation and judgement mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. Approximating a change in a loss is a mathematical calculation. 
The limitation generating… a sparse representation of the machine-learned neural network, the sparse representation providing a reduced size for the machine-learned neural network is a mathematical concept of generating a sparse representation of a mathematical model, the sparse representation providing a reduced size for the mathematical model. According to MPEP 2106.05(a), subsection II, an improvement in the abstract idea itself is not an improvement in technology. In Step 2A Prong 2 and Step 2B of the inquiry, the machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning. Accordingly, the claim is directed to an abstract idea, and the claim is not patent eligible.

Applicant’s arguments #2: “Therefore, Applicant respectfully submits that the claims at least integrate any alleged abstract idea into a practical application of improving the storage and execution efficiency of machine-learned models such that the claims are not "directed to" an abstract idea. For at least the above reasons, Applicant respectfully requests withdrawal of the rejection of claim 1 under § 101.”
Examiner’s response #2: In the 35 U.S.C. 101 inquiry for claim 1, in Step 2A Prong 2 and Step 2B, the machine-learned neural network is generally linking the judicial exceptions to the technological environment of machine learning. Accordingly, the claim is directed to an abstract idea, and the claim is not patent eligible.

Claim Rejections under 35 U.S.C. 102 and 103 (Remarks pp. 9-11): Applicant’s arguments with respect to claims 1-7 and 9-21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.H.J./ Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127