DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Claims 1-2, 5-7, and 10-11 are amended. Claim 8 is cancelled. Claim 21 is new. Claims 1-7 and 9-21 are pending and have been considered. 

Drawings
The drawings were received on 5/18/2022.  These drawings are acceptable.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7 and 9-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

CLAIM 1
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
for at least one edge of the plurality of edges: determining an estimated utility of the edge (e.g., evaluating an estimated utility of the edge)
the estimated utility is determined by approximating an effect of the edge on network performance (e.g., judging the importance of an edge)
for at least one edge of the plurality of edges: determining to prune the edge based at least in part on the estimated utility of such edge. (e.g., evaluating or considering a choice regarding the importance of an edge)
	These limitations are mathematical computations, and they are mental processes which can reasonably be performed in one’s mind with the aid of pen and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
obtaining data 
descriptive of a machine-learned neural network, the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges; and 
generating, based on the determining to prune at least one edge of the plurality of edges, a sparse representation
One or more computing devices and a machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). This section also states: “For instance, a data gathering step that is limited to… a particular type of data (such as power grid data or XML tags) could be considered to be both insignificant extra-solution activity and a field of use limitation.” Obtaining data is mere data-gathering which is an insignificant extra-solution activity. See MPEP 2106.05(g). Generating a sparse representation amounts to merely indicating a field of use or technological environment in which to apply the judicial exception of determining, which is a mental process-type abstract idea. Generating a sparse representation is generally linking the mental process of determining to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices and a machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Obtaining data is well-understood, routine, conventional activity of receiving data over a network as discussed in MPEP § 2106.05(d), subsection II, example (i). Generating a sparse representation amounts to merely indicating a field of use or technological environment in which to apply the judicial exception of determining, which is a mental process-type abstract idea. Generating a sparse representation is generally linking the mental process of determining to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 2 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
determining the estimated utility of the edge comprises determining the estimated utility of the edge based at least in part on an approximation of a loss function at the weight associated with the edge. (e.g., evaluating an estimated utility of the edge)
This limitation is mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The broadest reasonable interpretation of “determining the estimated utility” includes determining or evaluating a utility based on a value. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 3 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
determining the estimated utility of the edge comprises determining a first derivative of a loss function with respect to a logit… but not determining any higher-order derivatives of the loss function. 
This limitation is a mathematical computation of calculating a first derivative. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
a receiving neuron at the weight associated with the edge
One or more computing devices and a receiving neuron are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices and a receiving neuron are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 4 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
determining the estimated utility of the edge comprises determining a sum over one or more training examples included in a training dataset of a proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example. 
This limitation is a mathematical computation of calculating a first derivative. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). The claim is not patent eligible.
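For illustration only, and not as a construction of the claim, the computation recited in claim 4 can be written in symbols. The notation is assumed here and does not appear in the claim: w_{ij} is the weight of the edge from transmitting neuron i to receiving neuron j, \Delta w_{ij} is the proposed change in that weight, a_i^{(n)} is the output of the transmitting neuron for training example n, z_j^{(n)} is the logit of the receiving neuron, L^{(n)} is the loss on example n, and N is the number of training examples summed over.

```latex
\[
  U_{ij} \;=\; \sum_{n=1}^{N} \Delta w_{ij}\, a_i^{(n)}\,
  \left.\frac{\partial L^{(n)}}{\partial z_j^{(n)}}\right|_{w_{ij}}
\]
```

Under the further assumption that the logit is a weighted sum of the transmitting outputs, the product a_i^{(n)} \, \partial L^{(n)} / \partial z_j^{(n)} equals \partial L^{(n)} / \partial w_{ij} by the chain rule, so the expression is a first-order estimate of the change in loss and involves no higher-order derivatives.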

CLAIM 5 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
generating a sparse weight matrix. (e.g., judging/evaluating a network to create a sparse representation)
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 6 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 5 are incorporated. The claim recites the following limitations:
supplementing the machine-learned neural network with at least one additional edge at a different location within the machine-learned neural network. (e.g., activating a different weight value)
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 7 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 5 are incorporated. The claim recites the following limitations:
… prevents any other edges that connect to a same neuron as the edge from being modified in one or more pruning iterations. (evaluation or judgement)
Preventing other edges from being modified is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
storing a data item
One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Storing a data item is mere data-gathering which is an insignificant extra-solution activity. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). One or more computing devices are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Storing a data item is well-understood, routine, conventional activity of storing information in memory, as discussed in MPEP § 2106.05(d), subsection II, example (iv). The claim is not patent eligible.

CLAIM 9 incorporates the rejection of claim 1.
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitations:
 adding a patch subnetwork 
Adding a patch subnetwork is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more computing devices
the machine-learned neural network
the patch subnetwork is trained to predict an error associated with its input.
These additional elements are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). These additional elements are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 10
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
determining a respective estimated utility of each of the plurality of edges; (e.g., evaluating an estimated utility of the edge)
wherein the respective estimated utility is determined by approximating an effect of the edges on network performance
selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges; and 
deleting the selected one or more edges.
	These limitations are mathematical computations, and they are mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
A computer system
one or more processors; 
one or more non-transitory computer-readable media 
instructions
obtaining data 
descriptive of a machine-learned neural network, the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges;
A computer system, processors, non-transitory computer-readable media, instructions, and a machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). This section also states: “For instance, a data gathering step that is limited to… a particular type of data (such as power grid data or XML tags) could be considered to be both insignificant extra-solution activity and a field of use limitation.” Obtaining data is mere data-gathering which is an insignificant extra-solution activity. See MPEP 2106.05(g). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). A computer system, processors, non-transitory computer-readable media, instructions, and a machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Obtaining data is well-understood, routine, conventional activity of receiving data over a network, as discussed in MPEP § 2106.05(d), subsection II, example (i). The claim is not patent eligible.

CLAIM 11 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
wherein determining the respective estimated utility of each of the plurality of edges comprises determining the respective estimated utility of each of the plurality of edges based at least in part on an approximation of a loss function at a weight associated with the edge. (e.g., evaluating an estimated utility of the edge)
This limitation is mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The broadest reasonable interpretation of “determining the estimated utility” includes determining or evaluating a utility based on a value. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application because the claim recites no additional elements that impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 12 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
wherein determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a first derivative of a loss function with respect to a logit of a receiving neuron at a weight associated with the edge without determining any higher-order derivatives of the loss function. 
This limitation is a mathematical computation of calculating a first derivative. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application because the claim recites no additional elements that impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 13 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
wherein determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a sum over one or more training examples included in a training dataset of a negative weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example.
This limitation is a mathematical computation of calculating a first derivative. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application because the claim recites no additional elements that impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.
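As an illustrative sketch only, the sum recited in claim 13 can be expressed in a few lines of Python. The function name `edge_utility` and its parameters are hypothetical and are not taken from the application; the sketch assumes the per-example outputs of the transmitting neuron and the per-example first derivatives of the loss with respect to the receiving neuron's logit have already been computed.

```python
def edge_utility(weight, transmit_outputs, logit_grads):
    """Estimate the utility of one edge as recited in claim 13 (sketch).

    weight           -- weight associated with the edge
    transmit_outputs -- output of the transmitting neuron, one per training example
    logit_grads      -- dLoss/dLogit of the receiving neuron, one per training example

    Returns the sum over training examples of
    (negative weight) * (transmitting output) * (first derivative).
    No higher-order derivatives are used.
    """
    if len(transmit_outputs) != len(logit_grads):
        raise ValueError("one output and one derivative are needed per example")
    return sum(-weight * a * g for a, g in zip(transmit_outputs, logit_grads))


# Hypothetical values for two training examples.
utility = edge_utility(0.5, [1.0, 2.0], [0.2, 0.1])
```

The negative weight corresponds to a proposed change that sets the weight to zero, i.e., deleting the edge.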

CLAIM 14 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined number of the plurality of edges that have the lowest estimated utilities.
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application because the claim recites no additional elements that impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 15 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined percentage of the plurality of edges that have the lowest estimated utilities.
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application because the claim recites no additional elements that impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.
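For illustration only, the selection steps recited in claims 14 and 15 can be sketched as a sort over the estimated utilities. The function name `select_edges_for_deletion` and the `count` and `fraction` parameters are hypothetical; rounding the percentage down to a whole number of edges is an assumption, since the claims do not specify a rounding rule.

```python
def select_edges_for_deletion(utilities, count=None, fraction=None):
    """Return the indices of the edges with the lowest estimated utilities.

    Exactly one of `count` (a predetermined number, as in claim 14) or
    `fraction` (a predetermined percentage as a value in [0, 1], as in
    claim 15) must be given.
    """
    if (count is None) == (fraction is None):
        raise ValueError("specify exactly one of count or fraction")
    if fraction is not None:
        # Assumption: round down to a whole number of edges.
        count = int(fraction * len(utilities))
    count = max(0, min(count, len(utilities)))
    # sorted() is stable, so ties keep their original edge order.
    order = sorted(range(len(utilities)), key=lambda i: utilities[i])
    return sorted(order[:count])


# Hypothetical utilities for four edges.
utilities = [0.5, 0.1, 0.9, 0.2]
by_number = select_edges_for_deletion(utilities, count=2)
by_percentage = select_edges_for_deletion(utilities, fraction=0.25)
```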

CLAIM 16 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
adding one or more new edges 
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the machine-learned neural network.
The machine-learned neural network generally links the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The machine-learned neural network generally links the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 17 incorporates the rejection of claim 16.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 16 are incorporated. The claim recites the following limitations:
adding one or more new edges comprises adding a same number of new edges as was deleted from the machine-learned neural network.
This limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
the machine-learned neural network.
The machine-learned neural network generally links the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The machine-learned neural network generally links the abstract idea to the particular technological environment of machine learning as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 18
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
determining a plurality of different proposed quantization schemes, each proposed quantization scheme including changes to the respective weight of one or more edges to be quantized under such scheme; (e.g., determining a list of quantization schemes)
estimating a change in loss for each of the plurality of different proposed quantization schemes, (e.g., evaluating a change)
determining an estimated change in utility of each edge to be quantized; (e.g., evaluating an estimated change)
selecting one of the proposed quantization schemes based at least in part on the estimated changes in loss; and (e.g., selecting)
applying the selected quantization scheme
changing the respective weight of the one or more edges to be quantized under such scheme.
	These limitations are mathematical computations, and they are mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
One or more non-transitory computer-readable media
instructions 
one or more processors, 
obtaining data descriptive of a machine-learned neural network
the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges; 
Non-transitory computer-readable media, instructions, processors, and the machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Processors are mere instructions to implement the abstract ideas on a computer, as discussed in MPEP 2106.05(f). Obtaining data descriptive of a machine-learned neural network is mere data-gathering which is an insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). Non-transitory computer-readable media, instructions, processors, and the machine-learned neural network are generally linking the abstract idea to the particular technological environment of machine learning, as discussed in MPEP 2106.05(h). Processors are mere instructions to implement the abstract ideas on a computer, as discussed in MPEP 2106.05(f). Obtaining data descriptive of a machine-learned neural network is well-understood, routine, conventional activity of receiving data over a network, as discussed in MPEP 2106.05(d), subsection II, example (i). The claim is not patent eligible.

CLAIM 19 incorporates the rejection of claim 18.
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 18 are incorporated. The claim recites the following limitations:
determining the estimated change in utility of each edge to be quantized comprises determining the estimated change in utility of each edge to be quantized based at least in part on a first-order approximation of a loss function at the weight associated with the edge. (e.g., evaluating an estimated change in utility of each edge)
The limitation is a mathematical computation, and it is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The claim recites no additional elements to integrate the judicial exceptions into a practical application. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.
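As a sketch for illustration only, the steps recited in claims 18 and 19 (estimating a change in loss for each proposed quantization scheme with a first-order approximation, selecting a scheme, and applying it) might look as follows. All names are hypothetical. Representing a scheme as a mapping from edge to quantized weight, and selecting the scheme with the smallest estimated change in loss, are assumptions; the claims require only that the selection be based at least in part on the estimated changes.

```python
def estimate_loss_change(weights, grads, scheme):
    """First-order estimate: sum over quantized edges of
    (new weight - current weight) * dLoss/dWeight."""
    return sum((new_w - weights[e]) * grads[e] for e, new_w in scheme.items())


def select_and_apply(weights, grads, schemes):
    """Estimate the loss change of each scheme, pick one, and apply it.

    Assumption: the scheme with the smallest estimated change in loss
    is selected. Returns (index of selected scheme, estimates, new weights).
    """
    changes = [estimate_loss_change(weights, grads, s) for s in schemes]
    best = min(range(len(schemes)), key=lambda i: changes[i])
    new_weights = dict(weights)          # leave the input untouched
    new_weights.update(schemes[best])    # change only the quantized edges
    return best, changes, new_weights


# Hypothetical two-edge network and three proposed schemes.
weights = {"a": 0.3, "b": -0.7}
grads = {"a": 0.5, "b": -0.2}
schemes = [{"a": 0.0}, {"b": -1.0}, {"a": 0.5, "b": -0.5}]
best, changes, new_weights = select_and_apply(weights, grads, schemes)
```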

CLAIM 20 incorporates the rejection of claim 18.
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 18 are incorporated. The claim recites the following limitations:
determining the estimated change in utility of each edge to be quantized comprises determining, for each edge to be quantized, a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge without determining any higher-order derivatives of the loss function. 
This limitation is a mathematical computation of calculating a first derivative. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The claim recites no additional elements to integrate the judicial exceptions into a practical application. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

CLAIM 21 incorporates the rejection of claim 10.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 10 are incorporated. The claim recites the following limitations:
	generating a sparse representation of the machine-learned neural network
This limitation is a mathematical computation, and it is also a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2: The claim recites no additional elements to integrate the judicial exceptions into a practical application. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception(s). The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 18-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Baum et al. (US 20180285736 A1, see PTO-892 filed 02/18/2022).

Regarding CLAIM 18, Baum teaches: One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more processors, cause the one or more processors to perform operations, (¶ [0051], first sentence)
the operations comprising: obtaining data descriptive of a machine-learned neural network, wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and wherein a plurality of weights are respectively associated with the plurality of edges; (Obtaining data includes analyzing weights in at least one layer as taught by Baum claim 4 on p. 10. An ANN is taught by ¶ [0007] and ¶ [0074], figure 2.)
determining a plurality of different proposed quantization schemes, each proposed quantization scheme including changes to the respective weight of one or more edges to be quantized under such scheme; (Two quantization schemes are equations (2) and (4) in ¶ [0091]. Two other quantization schemes are taught in ¶ [0104], where scale-and-shift is taught in ¶ [0105] and dropping bits in ¶ [0112].)
estimating a change in loss for each of the plurality of different proposed quantization schemes, wherein estimating the change in loss for each proposed quantization scheme comprises determining an estimated change in utility of each edge to be quantized; (Abstract; Estimating a change in loss is taught by ¶ [0091], equations (2) and (4), and by Claim 4, lines 4-6 on p. 10, “selecting…”)
selecting one of the proposed quantization schemes based at least in part on the estimated changes in loss; and (Baum p. 10, Claim 4, lines 4-6: “selecting…”)
applying the selected quantization scheme to the machine-learned neural network, wherein applying the selected quantization scheme comprises changing the respective weight of the one or more edges to be quantized under such scheme. (Baum p. 10, Claim 1, last limitation.)

Regarding CLAIM 19, Baum teaches: The one or more non-transitory computer-readable media of claim 18, wherein determining the estimated change in utility of each edge to be quantized comprises determining the estimated change in utility of each edge to be quantized based at least in part on a first-order approximation of a loss function at the weight associated with the edge.  (The BRI of an estimated change in utility includes computing a loss function based on gradient descent, as taught in the middle of ¶ [0011] (“Minimizing this cost… networks”).)
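
For illustration only, the operations recited in claims 18 and 19 can be sketched as follows. This is a minimal sketch under stated assumptions; it is not Baum's disclosed method and not the applicant's implementation. The function names `estimate_loss_change` and `select_scheme`, the example weights and gradients, and the two rounding schemes are hypothetical. Each proposed scheme changes the weights, the change in loss is estimated per edge as the gradient multiplied by the weight change (a first-order approximation), and the scheme with the smallest estimated change in loss is selected and applied.

```python
import numpy as np

def estimate_loss_change(weights, grads, quantized):
    """First-order estimate: dL ~= sum_k grad_k * (q_k - w_k)."""
    per_edge = grads * (quantized - weights)  # estimated change per edge
    return float(per_edge.sum()), per_edge

def select_scheme(weights, grads, schemes):
    """Return the name, weights, and estimate of the lowest-estimate scheme."""
    best_name, best_q, best_est = None, None, np.inf
    for name, quantize in schemes.items():
        q = quantize(weights)
        est, _ = estimate_loss_change(weights, grads, q)
        if est < best_est:
            best_name, best_q, best_est = name, q, est
    return best_name, best_q, best_est

# Hypothetical weights and loss gradients for three edges.
weights = np.array([0.30, -0.70, 1.20])
grads = np.array([0.50, -0.20, 0.10])

# Two hypothetical proposed quantization schemes.
schemes = {
    "round_to_half": lambda w: np.round(w * 2) / 2,
    "round_to_int": lambda w: np.round(w),
}

# Selecting a scheme and applying it replaces the original weights.
name, new_weights, est = select_scheme(weights, grads, schemes)
```

In this toy example the integer-rounding scheme has the lower first-order estimate, so it is the one selected. A first-order estimate ignores curvature, so it is only reliable for small weight changes.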

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 5-7, 10-11, 16-17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al. (“Deep Rewiring: Training very sparse deep networks”) in view of Baum et al. (US 20180285736 A1). Both references were cited in the PTO-892 filed 02/18/2022.

	Regarding CLAIM 1, Bellec teaches: A computer-implemented method, comprising: 
obtaining, by one or more computing devices, data descriptive of a machine-learned neural network, (P. 2, ¶ 2, lines 5-7; P. 3, “The Deep R Algorithm”, end of first paragraph teaches data of network parameters θ and network weights w. P. 4, middle paragraph, line 6 teaches that a number of connections K are active during training.)
wherein a plurality of weights are respectively associated with the plurality of edges; and (P. 2, ¶ 2, lines 5-7; P. 3, “The Deep R Algorithm”, second paragraph, lines 1-3)
for at least one edge of the plurality of edges: 
determining, by the one or more computing devices, an estimated utility of the edge, wherein the estimated utility is determined by approximating an effect of the edge on network performance; (The BRI of determining an estimated utility of the edge includes determining the sign of a connection parameter, because the sign controls whether the connection gets pruned (i.e., becomes dormant, in Bellec’s words). Taught by p. 2, second paragraph, lines 5-9; and by p. 3, “The Deep R Algorithm”, second paragraph, lines 1-3, where a dormant connection is equivalent to a pruned connection. The BRI of approximating an effect of the edge on network performance includes performing a gradient update and determining that the weight is unimportant. See p. 3, Algorithm 1, lines 3-4; p. 3, last 2 paragraphs; and p. 4, first full paragraph, lines 1-3.)
determining, by the one or more computing devices, to prune the edge based at least in part on the estimated utility of such edge; and (p. 3, “The Deep R Algorithm”, second paragraph, second sentence.)
	generating, by the one or more computing devices and based on the determining to prune at least one edge of the plurality of edges, a sparse representation of the machine-learned neural network. (According to p. 4, first full paragraph, the network is initialized with K active connections and exactly K connections are active at any time during training. When a connection becomes dormant (i.e., pruned), a different dormant connection is activated. The BRI of a “sparse representation” includes a sparse weight matrix. On p. 3, “The DEEP R algorithm”, the first paragraph, last line through the second paragraph, second line teaches a network weight matrix w containing zero-value weights for dormant connections.  Each time a dormant connection is activated, it generates a weight matrix. Sparsity is taught by the abstract, line 6; and by p. 5, second full paragraph, lines 1-2. Fig. 1 is on p. 4.)
	However, Bellec does not explicitly teach: obtaining, determining, determining, and generating by one or more computing devices 
wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and
	But Baum teaches: obtaining, determining, determining, and generating by one or more computing devices (¶ [0059] teaches computing device 11)
wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges, and (¶ [0074] teaches an ANN.)
	Baum is in the same field of endeavor as the claimed invention, namely machine learning. Therefore, it would have been obvious to one of ordinary skill in the art to have used Baum’s computing device to perform Bellec’s experiments, and to have incorporated Baum’s neural network structure into Bellec’s system. A motivation for the combination is to improve the performance of neural networks. (Baum ¶ [0003])

Regarding CLAIM 2, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1,
Bellec teaches: wherein determining, by the one or more computing devices, the estimated utility of the edge comprises determining, by the one or more computing devices, the estimated utility of the edge based at least in part on an approximation of a loss function at the weight associated with the edge. (The BRI of this limitation includes determining an updated connection parameter $\theta_k$ in Algorithm 1, line 3. The error function in line 3 is further taught on p. 2, § 2, end of ¶ 1.)
	However, Bellec does not explicitly teach: determining, by the one or more computing devices
	But Baum teaches: determining, by the one or more computing devices (¶ [0059] teaches computing device 11)

Regarding CLAIM 5, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1,
	Bellec teaches: wherein generating the sparse representation of the machine-learned neural network comprises generating, by the one or more computing devices, a sparse weight matrix. (On p. 3, “The DEEP R algorithm”, the first paragraph, last line through the second paragraph, second line teaches a network weight matrix w containing zero-value weights for dormant connections. Sparsity is taught by the abstract, line 6; and by p. 5, second full paragraph, lines 1-2.)
	However, Bellec does not explicitly teach: generating, by the one or more computing devices
	But Baum teaches: generating, by the one or more computing devices (¶ [0059] teaches computing device 11)

Regarding CLAIM 6, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1,
 Bellec teaches: further comprising: supplementing, by the one or more computing devices, the machine-learned neural network with at least one additional edge at a different location within the machine-learned neural network. (P. 3, Algorithm 1, line 7; and p. 4, first full paragraph, lines 3-5.)
	However, Bellec does not explicitly teach: pruning and supplementing, by the one or more computing devices
	But Baum teaches: supplementing, by the one or more computing devices (¶ [0059] teaches computing device 11)

Regarding CLAIM 7, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1,
Bellec teaches: further comprising: storing, by the one or more computing devices, a data item that prevents any other edges that connect to a same neuron as the edge from being modified in one or more pruning iterations. (The broadest reasonable interpretation of this claim is that the last iteration of Algorithm 1 (P. 3) prevents any other edges from being modified.)
However, Bellec does not explicitly teach: pruning, by the one or more computing devices and storing, by the one or more computing devices
	But Baum teaches: pruning, by the one or more computing devices (¶ [0059] teaches computing device 11)
	storing, by the one or more computing devices (¶ [0059] teaches computing device 11 comprises main memory 24)

	Regarding CLAIM 10, Bellec teaches: obtaining data descriptive of a machine-learned neural network, (P. 2, ¶ 2, lines 5-7; P. 3, “The Deep R Algorithm”, end of first paragraph teaches data of network parameters θ and network weights w. P. 4, middle paragraph, line 6 teaches that a number of connections K are active during training.)
determining a respective estimated utility of each of the plurality of edges, wherein the respective estimated utility is determined by approximating an effect of the edge on network performance; (The BRI of determining an estimated utility of the edge includes determining the sign of a connection parameter, because the sign controls whether the connection gets pruned (i.e., becomes dormant, as Bellec teaches). Taught by p. 2, second paragraph, lines 5-9; and by p. 3, “The Deep R Algorithm”, second paragraph, lines 1-3, where a dormant connection is equivalent to a pruned connection. The BRI of approximating an effect of the edge on network performance includes performing a gradient update and determining that the weight is unimportant. See p. 3, Algorithm 1, lines 3-4; p. 3, last 2 paragraphs; and p. 4, first full paragraph, lines 1-3.)
selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges; and (p. 3, “The Deep R Algorithm”, second paragraph, second sentence and Algorithm 1, line 4, where deleting an edge is interpreted as setting a weight to zero.)
deleting the selected one or more edges. (p. 3, “The Deep R Algorithm”, second paragraph, second sentence and Algorithm 1, line 4, where deleting an edge is interpreted as setting a weight to zero.)
However, Bellec does not explicitly teach: A computer system, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computer system to perform operations, the operations comprising:
wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges;
But Baum teaches: A computer system, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computer system to perform operations, the operations comprising: (Baum teaches a computer system by the computing device 11 in ¶ [0059], processors in ¶ [0060], line 1, and main memory in ¶ [0059], last line. Baum teaches instructions in ¶ [0051], first sentence.)
wherein the machine-learned neural network comprises a plurality of neurons respectively connected by a plurality of edges; (¶ [0074] teaches an ANN.)
Baum is in the same field of endeavor as the claimed invention, namely machine learning. Therefore, it would have been obvious to one of ordinary skill in the art to have used Baum’s computing device to perform Bellec’s experiments, and to have incorporated Baum’s neural network structure into Bellec’s system. A motivation for the combination is to improve the performance of neural networks. (Baum ¶ [0003])

Regarding CLAIM 11, the combination of Bellec and Baum teaches: The computing system of claim 10, 
Bellec teaches: wherein determining the respective estimated utility of each of the plurality of edges comprises determining the respective estimated utility of each of the plurality of edges based at least in part on an approximation of a loss function at a weight associated with the edge. (The BRI of this limitation includes determining an updated connection parameter $\theta_k$ in Algorithm 1, line 3. The error function in line 3 is further taught on p. 2, § 2, end of ¶ 1.)

Regarding CLAIM 16, the combination of Bellec and Baum teaches: The computing system of claim 10, 
Bellec teaches: wherein the operations further comprise adding one or more new edges to the machine-learned neural network. (P. 3, Algorithm 1, lines 6-9 and P. 4, middle paragraph, lines 3-7)

Regarding CLAIM 17, the combination of Bellec and Baum teaches: The computing system of claim 16, 
Bellec teaches: wherein adding one or more new edges to the machine-learned neural network comprises adding a same number of new edges to the machine-learned neural network as was deleted from the machine-learned neural network. (P. 3, Algorithm 1, lines 6-9 and P. 4, middle paragraph, lines 3-7)
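
For illustration only, the rewiring behavior relied upon for claims 16 and 17 (a connection becomes dormant when its parameter changes sign, and the same number of dormant connections is then activated so that K connections remain active) can be sketched as follows. This is a simplified sketch, not a reproduction of Bellec's Algorithm 1: the noise and regularization terms are omitted, and the function name `rewire_step`, the learning rate, the small positive re-initialization value, and the choice to exclude just-pruned connections from regrowth are all assumptions.

```python
import numpy as np

def rewire_step(theta, active, grad, lr, rng, init=1e-3):
    """One simplified prune-and-regrow step that keeps the active count fixed."""
    theta = theta.copy()
    active = active.copy()

    # Gradient update on the active connections only.
    theta[active] -= lr * grad[active]

    # Active connections whose parameter became negative go dormant (pruned).
    pruned = active & (theta < 0)
    active[pruned] = False
    theta[pruned] = 0.0
    n_pruned = int(pruned.sum())

    # Activate the same number of other dormant connections, chosen at random.
    # Excluding the just-pruned connections is an assumption of this sketch.
    candidates = np.flatnonzero(~active & ~pruned)
    regrown = rng.choice(candidates, size=n_pruned, replace=False)
    active[regrown] = True
    theta[regrown] = init
    return theta, active, n_pruned

# Hypothetical example: 6 possible connections, K = 4 active.
theta = np.array([0.5, 0.02, 0.0, 0.0, 0.3, 0.01])
active = np.array([True, True, False, False, True, True])
grad = np.array([0.1, 1.0, 0.0, 0.0, -0.2, 0.5])

rng = np.random.default_rng(0)
new_theta, new_active, n_pruned = rewire_step(theta, active, grad, lr=0.1, rng=rng)
```

In this example two connections are pruned and two dormant connections are activated, so the number of active connections stays at four and the inactive entries of the weight vector remain zero.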

Regarding CLAIM 21, the combination of Bellec and Baum teaches: The computing system of claim 10, 
Bellec teaches: wherein deleting the selected one or more edges comprises generating a sparse representation of the machine-learned neural network. (According to p. 4, first full paragraph, the network is initialized with K active connections and exactly K connections are active at any time during training. When a connection becomes dormant (i.e., pruned), a different dormant connection is activated. The BRI of a “sparse representation” includes a sparse weight matrix. On p. 3, “The DEEP R algorithm”, the first paragraph, last line through the second paragraph, second line teaches a network weight matrix w containing zero-value weights for dormant connections.  Each time a dormant connection is activated, it generates a weight matrix. Sparsity is taught by the abstract, line 6; and by p. 5, second full paragraph, lines 1-2. Fig. 1 is on p. 4.)

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al. (“Deep Rewiring: Training very sparse deep networks”) in view of Baum et al. (US 20180285736 A1), and further in view of Sadowski (“Notes on Backpropagation”). All references were cited in the PTO-892 filed 02/18/2022.

	Regarding CLAIM 3, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1, 
Bellec teaches: wherein determining the estimated utility of the edge comprises determining a backpropagation algorithm (The BRI of this limitation includes determining the sign of a connection parameter (p. 2, ¶ 2, lines 5-9) and performing gradient descent using backpropagation (p. 2, Algo. 1, line 3 and caption).)
However, Bellec does not explicitly teach: determining, by the one or more computing devices, a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge, but not determining, by the one or more computing devices, any higher-order derivatives of the loss function.
But Baum teaches: determining, by the one or more computing devices (¶ [0059] teaches computing device 11)
However, neither Bellec nor Baum explicitly teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge, but not determining any higher-order derivatives of the loss function.
	But Sadowski teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge, but not determining any higher-order derivatives of the loss function. (P. 2, equation (11) teaches a first derivative of an error with respect to a logit $s_j$.)
	Sadowski is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Bellec’s neural network, with a motivation to reduce the error of the network. (Sadowski, Abstract). 

	Regarding CLAIM 12, the combination of Bellec and Baum teaches: The computing system of claim 10, 
Bellec teaches: wherein determining the respective estimated utility of each of the plurality of edges comprises determining a backpropagation algorithm (The BRI of this limitation includes determining the sign of a connection parameter (p. 2, ¶ 2, lines 5-9) and performing gradient descent using backpropagation (p. 2, Algo. 1, line 3 and caption).)
However, neither Bellec nor Baum explicitly teaches: determining, for each edge, a first derivative of a loss function with respect to a logit of a receiving neuron at a weight associated with the edge without determining any higher-order derivatives of the loss function.
	But Sadowski teaches: determining, for each edge, a first derivative of a loss function with respect to a logit of a receiving neuron at a weight associated with the edge without determining any higher-order derivatives of the loss function. (P. 2, equation (11) teaches a first derivative of an error with respect to a logit $s_j$.)
	Sadowski is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Bellec’s neural network, with a motivation to reduce the error of the network. (Sadowski, Abstract). 

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al. (“Deep Rewiring: Training very sparse deep networks”) in view of Baum et al. (US 20180285736 A1), and further in view of Ng (“Sparse autoencoder”). All references were cited in the PTO-892 filed 02/18/2022.

Regarding CLAIM 4, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1, 
Bellec teaches: determining the estimated utility of the edge comprises determining a backpropagation algorithm (The BRI of this limitation includes determining the sign of a connection parameter (p. 2, ¶ 2, lines 5-9) and performing gradient descent using backpropagation (p. 2, Algo. 1, line 3 and caption).)
However, Bellec does not explicitly teach: determining, by the one or more computing devices, a sum over one or more training examples included in a training dataset of a proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example.
	But Baum teaches: determining, by the one or more computing devices (¶ [0059] teaches computing device 11)
	However, neither Bellec nor Baum explicitly teaches: determining a sum over one or more training examples included in a training dataset of a proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example.
But Ng teaches: determining a sum over one or more training examples included in a training dataset of a proposed change in the weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example. (A sum over training examples is taught by equation $\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)$ on p. 7 at the bottom, where each pair (x,y) form a training example. A proposed change in the weight is the learning rate $\alpha$ in the update formula for $W_{ij}^{(l)}$ on p. 7. An output of a transmitting neuron is $a_j^{(l)}$ in step 4 on p. 8. A first derivative of a loss function with respect to a logit is $\delta_i^{(n_l)}$ in step 2 on p. 8, where z is a logit, as discussed on p. 4 below equation (5).)
	Ng is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Bellec’s neural network, with a motivation to train the network. (Ng, top of p. 6)
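
For illustration only, the quantity recited in claim 4 (a sum over training examples of a proposed weight change, multiplied by the output of the transmitting neuron, multiplied by the first derivative of the loss with respect to the logit of the receiving neuron) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Ng or from the application: it assumes a single linear receiving neuron with squared-error loss, and the function name `estimated_loss_change` and the toy data are hypothetical. Pruning an edge corresponds to a proposed change equal to the negative of its weight.

```python
import numpy as np

def estimated_loss_change(delta_w, a_j, dL_dz):
    """Sum over examples of delta_w * transmitting output * dL/d(logit)."""
    return float(np.sum(delta_w * a_j * dL_dz))

# Toy data: 3 training examples, one receiving neuron with 2 incoming edges.
A = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [2.0, 0.0]])      # outputs of the transmitting neurons
y = np.array([1.0, 0.0, 2.0])   # targets
w = np.array([0.8, 0.1])        # current edge weights

def loss(weights):
    z = A @ weights             # logits of the receiving neuron
    return 0.5 * float(np.sum((z - y) ** 2))

# First derivative of the loss with respect to the logit, per example.
dL_dz = A @ w - y

# Proposed change: prune edge 1, i.e. change its weight by -w[1].
j = 1
estimate = estimated_loss_change(-w[j], A[:, j], dL_dz)

# Exact change, for comparison with the first-order estimate.
w_pruned = w.copy()
w_pruned[j] = 0.0
exact = loss(w_pruned) - loss(w)
```

Here the estimate is about 0.03 while the exact change is about 0.055; the gap is the second-order term that the first-order approximation omits, and no second derivative is computed in forming the estimate.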

	Regarding CLAIM 13, the combination of Bellec and Baum teaches: The computing system of claim 10, 
However, neither Bellec nor Baum explicitly teaches: wherein determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a sum over one or more training examples included in a training dataset of a negative weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example.
But Ng teaches: wherein determining the respective estimated utility of each of the plurality of edges comprises determining, for each edge, a sum over one or more training examples included in a training dataset of a negative weight multiplied by an output of a transmitting neuron multiplied by a first derivative of a loss function with respect to a logit of a receiving neuron at the weight and training example. (A sum over training examples is taught by equation $\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)$ on p. 7 at the bottom, where each pair (x,y) form a training example. A proposed change in the weight is the learning rate $\alpha$ in the update formula for $W_{ij}^{(l)}$ on p. 7. An output of a transmitting neuron is $a_j^{(l)}$ in step 4 on p. 8. A first derivative of a loss function with respect to a logit is                         
                            
                                
                                    δ
                                
                                
                                    i
                                
                                
                                    
                                        
                                            
                                                
                                                    n
                                                
                                                
                                                    l
                                                
                                            
                                        
                                    
                                
                            
                        
                     in step 2 on p. 8, where z is a logit, as discussed on p. 4 below equation (5).)
	Ng is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Bellec’s neural network, with a motivation to train the network. (Ng, top of p. 6)
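For illustration only, the quantity recited in this limitation can be sketched as below. This is a minimal sketch under stated assumptions, not code taken from Ng, Bellec, Baum, or the application: the function name and its arguments are hypothetical, and the per-example values of the transmitting neuron's output and of the first derivative of the loss with respect to the receiving neuron's logit are assumed to have been computed already (for example, by backpropagation).

```python
def estimated_edge_utility(weight, transmit_outputs, logit_grads):
    """Sum over training examples of (-weight) * a_j * dLoss/dz_i.

    weight           -- weight associated with the edge (hypothetical name)
    transmit_outputs -- output a_j of the transmitting neuron, one per example
    logit_grads      -- first derivative of the loss with respect to the
                        receiving neuron's logit z_i, one per example
    """
    if len(transmit_outputs) != len(logit_grads):
        raise ValueError("expected one output and one derivative per example")
    # Only first derivatives appear; no higher-order terms are computed.
    return sum(-weight * a * d for a, d in zip(transmit_outputs, logit_grads))
```

Under these assumptions, the sum is a first-order estimate of how the loss would change if the edge's weight were set to zero.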

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al. (“Deep Rewiring: Training very sparse deep networks”) in view of Baum et al. (US 20180285736 A1), and further in view of Kauschke et al. (“Batchwise Patching of Classifiers”).

	Regarding CLAIM 9, the combination of Bellec and Baum teaches: The computer-implemented method of claim 1, 
However, neither Bellec nor Baum explicitly teaches: further comprising: adding, by the one or more computing devices, a patch subnetwork to the machine-learned neural network, wherein the patch subnetwork is trained to predict an error associated with its input.
	But Kauschke teaches: further comprising: adding, by the one or more computing devices, a patch subnetwork to the machine-learned neural network, wherein the patch subnetwork is trained to predict an error associated with its input. (P. 3375, § 2.2, steps (i) and (ii).)
	Kauschke is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have added Kauschke’s patch classifier to Bellec’s neural network. A motivation for the combination is to find local patches to the global classification model that act in a flexible and efficient way without having to re-train the model from scratch. (Kauschke, P. 3374, col. 2, end of second paragraph)

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Bellec et al. (“Deep Rewiring: Training very sparse deep networks”) in view of Baum et al. (US 20180285736 A1), and further in view of Wang et al. (US Patent 11,200,495 B2). All references were cited in the PTO-892 filed 02/18/2022.

Regarding CLAIM 14, the combination of Bellec and Baum teaches: The computing system of claim 10, 
	However, neither Bellec nor Baum explicitly teaches: wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined number of the plurality of edges that have the lowest estimated utilities.
But Wang teaches: wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined number of the plurality of edges that have the lowest estimated utilities. (The BRI of a number includes a percentage. Wang teaches this limitation in C. 3, L. 63 to C. 4, L. 5)
	Wang is in the same field of endeavor as the claimed invention, namely network pruning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have pruned a predetermined percentage of Bellec’s edges according to Wang’s method 100. A motivation for the combination is to clamp the weights of connections that are close to zero to zero. (Wang, C. 3, L. 64-65)

Regarding CLAIM 15, the combination of Bellec and Baum teaches: The computing system of claim 10, 
	However, neither Bellec nor Baum explicitly teaches: wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined percentage of the plurality of edges that have the lowest estimated utilities.
But Wang teaches: wherein selecting one or more edges for deletion based at least in part on the respective estimated utility of each of the plurality of edges comprises selecting a predetermined percentage of the plurality of edges that have the lowest estimated utilities. (C. 3, L. 63 to C. 4, L. 5)
	Wang is in the same field of endeavor as the claimed invention, namely network pruning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have pruned a predetermined percentage of Bellec’s edges according to Wang’s method 100. A motivation for the combination is to clamp the weights of connections that are close to zero to zero. (Wang, C. 3, L. 64-65)
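For illustration only, the selection recited in claims 14 and 15 can be sketched as below. This is a minimal sketch under stated assumptions and is not Wang's method 100 or Bellec's algorithm: the function names are hypothetical, estimated utilities are assumed to be supplied as a mapping from an edge identifier to a number, and rounding the percentage down to a whole number of edges is an assumption.

```python
import math


def lowest_utility_edges(utilities, count):
    """Return the `count` edge identifiers with the lowest estimated utility."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # sorted() is stable, so ties keep the mapping's insertion order.
    ranked = sorted(utilities, key=lambda edge: utilities[edge])
    return ranked[:count]


def lowest_utility_percentage(utilities, percentage):
    """Return the lowest-utility `percentage` (0 to 100) of the edges."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: a fractional edge count is rounded down.
    count = math.floor(len(utilities) * percentage / 100)
    return lowest_utility_edges(utilities, count)
```

In this sketch a predetermined percentage reduces to a predetermined number once the total number of edges is known.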

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Baum et al. (US 20180285736 A1) in view of Sadowski (“Notes on Backpropagation”). All references were cited in the PTO-892 filed 02/18/2022.

	Regarding CLAIM 20, Baum teaches: The one or more non-transitory computer-readable media of claim 18, wherein determining the estimated change in utility of each edge to be quantized comprises determining, for each edge to be quantized, 
	However, Baum does not explicitly teach: determining a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge without determining any higher-order derivatives of the loss function.
But Sadowski teaches: determining a first derivative of a loss function with respect to a logit of a receiving neuron at the weight associated with the edge without determining any higher-order derivatives of the loss function. (P. 2, equation (11) teaches a first derivative of an error with respect to a logit $s_j$.)
Sadowski is in the same field of endeavor as the claimed invention, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed backpropagation on Baum’s neural network, with a motivation to reduce the error of the network. (Sadowski, Abstract)

Response to Arguments
	Examiner herein responds to the interview held on 5/11/2022 and to Applicant’s remarks, claims, replacement drawing, and specification amendments filed 5/18/2022.

Claim Interpretation: Amended claim 1 no longer recites a contingent limitation that carries no patentable weight.

Drawing Objections (Remarks p. 9): The objection to the drawings has been withdrawn due to replacement Fig. 4A and the amendment to paragraph [0102].

Specification Objection (Remarks p. 9): The objection to paragraph [0091] has been withdrawn due to the amendment to this paragraph.

Claim Rejections Under 35 U.S.C. 101 (Remarks pp. 10-12): Applicant's arguments have been fully considered but they are not persuasive. 
Applicant’s Argument #1: Applicant argues that claim 1 does not recite any mathematical computation and cites the example analysis for Example 39 of the 2019 Patent Eligibility Guidance. 
Examiner’s Response #1: In claim 1, the steps of determining an estimated utility of an edge, approximating an effect of the edge on network performance, and determining to prune the edge are all mathematical operations, and they are not merely based on mathematical concepts. Determining an estimated utility of an edge may be performed by calculating an error between predicted and actual values. Approximating an effect of the edge consists of comparing any result values. Pruning an edge consists of zeroing its weight. Generating, based on the determining to prune at least one edge of the plurality of edges, a sparse representation of the machine-learned neural network generally links the mental process of determining to the particular technological environment of machine learning.

Applicant’s Argument #2: Applicant argues “claim 1 is directed to an improvement in generating smaller machine-learned models in view of maintaining performance capabilities by making pruning determinations based in part on an estimated utility” and “the claimed technique provides a specific improvement by generating sparse representations of model(s) based on a more intelligent approach to pruning decisions.”
Examiner’s Response #2: Applicant’s alleged improvement of generating smaller machine-learned models is not explicitly recited by claim 1. The broadest reasonable interpretation of “sparse representation” does not have to be limited to a smaller representation.
The rejections of claims 1-7 and 9-20 are maintained. The rejection of claim 8 is moot because the claim is canceled.

Claim Rejections Under 35 U.S.C. 102 (Remarks pp. 12-13): Applicant's arguments have been fully considered but they are not persuasive.
Applicant’s Argument #1: Regarding claim 18, Applicant argues, “Baum only teaches a difference between an original output and a quantized output. See Baum at [0091]. This is not "a change in loss," for as discussed during the above-noted interview, the original output could be erroneous, and Baum makes no mention of any determination of whether the quantization causes the output to be more or less erroneous. Baum's so-called "quantization error" is merely a difference between two outputs per Equations (2) through (4), without any analysis of a loss based on either output. For at least this reason, Applicant respectfully submits that Baum fails to disclose each and every limitation of pending claim 18.”
Examiner’s Response #1: A loss is a difference between a ground truth and a calculated result. In Baum, the ground truth is the output using unquantized weights, as seen by the first sigma activation function in equations (2) and (4) in ¶ [0091], and the calculated result is the output using quantized weights, as seen by the second sigma activation function in equations (2) and (4). At p. 10, Baum’s Claim 3 recites “whereby said quantization error is utilized in periodically determining an updated quantization level”. This teaches iteratively applying quantization schemes to minimize the errors in equations (2) and (4). The purpose of applying a new quantization scheme is that the quantization error (i.e., the loss) is predicted to change. Therefore, Baum teaches “estimating a change in loss”. Additionally, Baum’s abstract states: “The system reduces quantization implications (i.e. error) in a limited resource system by employing the information available in the data actually observed by the system.”
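For illustration only, the comparison described in this response (an activation computed with unquantized weights versus the same activation computed with quantized weights) can be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce Baum's equations (2) through (4): the function names are hypothetical, a logistic sigmoid is assumed for the sigma activation function, and rounding each weight to the nearest multiple of a step size is an assumed quantization scheme.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, assumed here as the sigma activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def quantize(weight, step):
    """Round a weight to the nearest multiple of `step` (assumed scheme)."""
    return round(weight / step) * step


def quantization_error(weights, inputs, step):
    """Output with unquantized weights minus output with quantized weights."""
    reference = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    quantized = sigmoid(sum(quantize(w, step) * x for w, x in zip(weights, inputs)))
    return reference - quantized
```

In this sketch the output with unquantized weights plays the role of the ground truth, and a different step size generally yields a different error.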

Applicant’s Argument #2: Regarding claim 18, Applicant argues, “Additionally, the Office Action alleges that Baum teaches that "estimating the change in loss for each proposed quantization scheme comprises determining an estimated change in utility of each edge to be quantized," as claimed in claim 18. But Baum at least completely fails to disclose "determining an estimated change in utility of each edge to be quantized." Baum merely discloses a calculation of a model output based on application of quantization globally to a set of weights, biases, or both. Nowhere does Baum determine "an estimated change in utility," for any edge, let alone for "each edge to be quantized." Merely using a quantized set of weights or biases to obtain an output completely fails to unambiguously disclose "determining an estimated change in utility of each edge to be quantized," as claimed in claim 18. For at least this reason, Applicant respectfully submits that Baum fails to disclose each and every limitation of pending claim 18.”
Examiner’s Response #2: As noted in Examiner’s interview summary, under the broadest sense of utility, making any change to an edge or weight indicates that the connection is either significant or insignificant. Examiner is not required to use the sense of “utility” from specification paragraph 27 while examining claim 18. Additionally, Claim 18 does not require Examiner to cite prior art that teaches determining a separate and distinct estimated change in utility for each edge/weight. Baum’s Claim 3 on p. 10 recites: “whereby said quantization error is utilized in periodically determining an updated quantization level”. Therefore, any edge in Baum’s neural network is a potential edge to be quantized. Changes to quantization levels are made to entire layers, as indicated by ¶ [0021], [0086], the last 3 lines of [0100] (step 185), and claim 4 (“at least one layer”).
Applicant is directed to Baum, paragraphs [0094]-[0104], which discloses monitoring the input values, output values, and weights in each neuron to optimize quantization levels. Paragraph [0102] in particular discloses: “Statistics are gathered using a set of counters that count the level of activity observed at the neurons at each layer… The activity monitored may include data input to the neuron, data output from the neuron, the internal weights, and any combination thereof.” A neuron with data counters is shown in Fig. 6 and paragraph [0089]. The utility of each edge may correspond to the counter statistics for the data input to the neuron, data output from the neuron, and/or the internal weights.
The rejections of claims 18-19 under 35 U.S.C. 102 are maintained and supported by the above clarifications.

Claim Rejections Under 35 U.S.C. 103 (Remarks pp. 13-15): Applicant's arguments have been fully considered but they are not persuasive.
Applicant’s Argument: Regarding Claim 1, Applicant argues the following: However, neither Bellec nor longstanding principles of machine learning teach or suggest an “estimated utility… determined by approximating an effect of the edge on network performance,” as claimed in amended claim 1, let alone “determining… to prune the edge based at least in part on the estimated utility of such edge.” For instance, Bellec merely applies a naïve heuristic, pruning any weight that has a negative value. Bellec applies this heuristic indiscriminately, without regard or consideration of any effect that such weight might have on the model quality. Thus, Applicant respectfully submits determining a mere sign of a weight does not provide any hint or suggestion of an “estimated utility… determined by approximating an effect of the edge on network performance,” as claimed in amended claim 1, let alone “determining ... to prune the edge based at least in part on the estimated utility of such edge.”
Examiner’s Response: The limitation “an estimated utility of the edge” is broad and does not specifically disclose what constitutes estimating utility of the edge or how the utility is estimated. The claim does not preclude Examiner from interpreting “determining… an estimated utility of an edge” as predicting that the neural network error will decrease by removing an edge from the network. The claim does not preclude Examiner from interpreting “determining… to prune the edge” as acting upon the prediction by removing the edge from the network. The BRI of a utility of an edge is the importance of the edge for accurate inferencing, which is represented by Bellec’s weight sign. The rejection of claim 1 is maintained. Additionally, the rejection of claim 10 is maintained.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/A.H.J./Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127