DETAILED ACTION
This Office Action is in response to the amendment for Application No. 16/195,973 filed on January 07, 2022. Claims 1-20 are presented for examination and are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.



Response to Arguments
Applicant’s arguments, see page 12, with respect to the double patenting rejection have been fully considered and are persuasive. The double patenting rejection of October 7th, 2021 has been withdrawn. 
Applicant’s arguments, see page 12, with respect to the 112(b) rejection of claim 4 have been fully considered and are persuasive. The 112(b) rejection of October 7th, 2021 has been withdrawn. 
Applicant’s arguments, see pages 12-20, with respect to the 103 rejection of claims 1-6, 9-12, and 15-18 as unpatentable over Byun in view of Varadarajan and claims 7, 8, 13, 14, 19, and 20 as unpatentable over Byun in view of Varadarajan and Nachum are not persuasive. Applicant submits, on page 17 of remarks, that Byun fails to disclose a marginal contribution value determined for each neural node in a given layer with respect to other neural nodes in the given layer as recited in claim 1. Applicant also submits that Byun’s singular values listed in the diagonal matrix that denote the relative importance of eigenvectors cannot be equated with a marginal contribution value determined for each neural node in the given layer with respect to other neural nodes in the given layer. The examiner respectfully disagrees. Byun’s method comprises using singular value decomposition to break down the weight matrix (equivalent to applicant’s weight matrix for neural nodes) into the three related matrices u, Σ, and v*. The Σ matrix stores diagonal values indicating relative importance (equivalent to the marginal contribution value with respect to other neural nodes in the given layer). The argument is not persuasive.
In regards to applicant’s argument that the prior art does not teach an output loss that is the calculated difference of the matrix before and after it has been modified, the argument is moot in light of a new rejection. Applicant’s amendments to claim 1, including the added limitations of “by removing each neural node with no or negligible marginal contribution value” and “and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer,” change the scope of claim 1, as does the limitation that was inserted into claim 1 from dependent claim 3, which recites: “wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node”. The added limitations change the scope of the claimed “neural node” with new limitations that determine more specifically what is done to said neural node(s).
In regards to applicant’s argument that the prior art does not teach “determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer,” the examiner respectfully disagrees. Examiner had cited [ Varadarajan (0143) “Backpropagation entails distributing the error backward through the layers of the ANN in varying amounts to all of the connection edges” ]. Applicant’s specification recites that the surplus value of a given neural node can be used to update the weights of the remaining neural nodes in the given layer [ (0021) and (0022) ] and further describes how the updated weights will be used for the activation of the neural nodes in that given layer [ (0035) and (0036) ]. Varadarajan’s backward propagation of error distribution is equivalent to this function. [ Varadarajan (0131) “Each edge from a particular node to an activation neuron represents that the activation value of the particular neuron is an input to the activation neuron, that is, an input to the activation function of the activation neuron, as adjusted by the weight of the edge.” ] (Emphasis added). Varadarajan’s specification explains how this backward propagation distributes the value to the other nodes so as to update the weights for each node and thereby change the activation value. Applicant is reminded that cited prior art references must be considered in their entirety and not only the cited sections [ MPEP 2141.02(VI) ]. Although Varadarajan does not explicitly teach the marginal contribution value, examiner notes that this concept/limitation is taught by the other prior art. The argument is not persuasive; please refer to the §103 rejection section to see the full mapping for each claim.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 9-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Byun (US 20190087729 A1) in view of Varadarajan (US 20190095819 A1) and WO2018090706A1 (hereinafter known as WO2018).
In regards to claim 1, Byun teaches the following:
A method of improving performance of an artificial neural network (ANN), the method comprising: for each layer of the ANN, generating, by an ANN improvement device, a weight matrix comprising a weight of each neural node in a given layer;

Byun teaches a CNN tuner, which would be the equivalent to the ANN improvement device. The citation also includes the weight matrix of the selected layer which is equivalent to the neural node weight(s) of that layer. ]
For each neural node in the given layer, determining by the ANN improvement device, a marginal contribution value of a given node in the given layer with respect to the other neural nodes in the given layer based on an input vector to the given layer
[ (¶0025) “the diagonal of the Σ matrix lists singular values that indicate the relative importance of eigenvectors stored in the u and v* matrices to the SVD representation of the weight matrix”
This citation shows the diagonal of the matrix which indicates relative importance of eigenvectors. Examiner notes that this is the equivalent to the marginal contribution score, with the weight matrix at the end of the citation being equivalent to the nodes in the given layer. ]
and a modified weight matrix, wherein the modified weight matrix is derived from the weight matrix by replacing weight of the given neural node in the given layer with a predefined weight;
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” 
The new weight matrix from the citation would be the equivalent to the modified weight matrix from the claim. ]
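For illustration only, the decomposition and truncation described in the Byun citations above (¶0025 and ¶0028) can be sketched as follows. This is a minimal NumPy sketch and not Byun's implementation; the function name `truncated_svd_weights`, the `rank` parameter, and the example matrix are assumptions introduced here.

```python
import numpy as np

def truncated_svd_weights(weights, rank):
    """Return (u_k, new_weights, singular_values) for a rank-`rank` truncation."""
    # Compact SVD: weights == u @ diag(s) @ vh, with s sorted in descending order.
    u, s, vh = np.linalg.svd(weights, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    u_k = u[:, :rank]
    # New weight matrix: truncated Sigma multiplied by truncated v*.
    new_weights = np.diag(s[:rank]) @ vh[:rank, :]
    return u_k, new_weights, s

# Hypothetical 2x2 layer weight matrix (not taken from any cited reference).
weights = np.array([[3.0, 0.0], [0.0, 0.5]])
u_k, new_weights, singular_values = truncated_svd_weights(weights, rank=1)
```

In this sketch the singular values returned in `singular_values` are the diagonal entries of Σ, and the larger value dominates the reconstruction `u_k @ new_weights`.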
executing, by the ANN improvement device, an elimination decision for each neural node in each layer
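[ (¶0029) “In these examples, the CNN tuner prunes the weight matrix by zeroing values less than or equal to the pruning threshold.”
Pruning the weight matrix by zeroing values is the equivalent to making an elimination decision on a particular neural node. Additionally, the pruning threshold could be any value or target that the user would want to set which could be the marginal contribution value. ]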

However, what is not taught by Byun is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
Varadarajan however does teach some of these limitations as seen below:
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
[ (¶0143) “Propagation of error causes adjustments to edge weights, which depends on the gradient of the error at each edge.”
This citation by Varadarajan teaches distributing a correction value through propagation to the other nodes in the layer which will change the activation function/value for each neuron, through the now modified weight matrix. Examiner notes that the error adjustment is equivalent to the distributed surplus value as it also makes changes through the weight matrix to change the activation function. Examiner notes the marginal contribution value was taught in a previous citation. ]
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer.
[ (¶0143) “An edge weight is adjusted according to a percentage of the edge's gradient.” 
This citation teaches updating the weight matrix by adjusting the edge weight. The edge’s gradient is equivalent to the relation of the remaining neural nodes in each layer as the gradient is calculated by the relation of edge error to activation value of an upstream neuron, with the distributed surplus value being equivalent to the adjustment of the edge weight as taught in the above citation. ]
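For illustration only, the weight adjustment quoted from Varadarajan (¶0143) can be sketched as a single update step. This is a minimal sketch under stated assumptions: the name `adjust_edge_weight`, the `learning_rate` parameter standing in for the "percentage", and the subtraction sign convention of ordinary gradient descent are introduced here and are not taken from the reference.

```python
def adjust_edge_weight(weight, gradient, learning_rate=0.1):
    """Move an edge weight by a fraction of its error gradient."""
    # learning_rate plays the role of the "percentage" of the edge's gradient.
    # Subtracting follows the usual gradient-descent convention (an assumption).
    return weight - learning_rate * gradient

# Hypothetical values chosen so the arithmetic is exact in floating point.
updated = adjust_edge_weight(weight=0.5, gradient=0.5, learning_rate=0.5)
```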
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the methods of distributing corrected values from removed nodes to the remaining nodes as taught by Varadarajan. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the distribution of the error value to other nodes in the layer would decrease the error rate of the neural network [ Varadarajan (0143) ]. This has the obvious benefit of a more accurate system.
What Byun and Varadarajan both fail to explicitly teach is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
WO2018 however does teach these limitations as seen below:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
[ (Pg. 4, Paragraph 22) “The clipped neurons are neurons that contribute weakly to the neural network output and have poor expression ability. Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This paragraph from WO2018 teaches the neural network comparing the neurons pre-pruning to determine what the accuracy would be post-pruning and making a decision based off of that. It also states that it finds an “importance value” which includes its relation to other nodes in the same layer and nodes within the next layer, the importance value being equivalent to the marginal contribution value. ]
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
[ (Pg. 4, Paragraph 22) “the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned”
	This citation teaches the neural network going through each neural node and looking at its importance value (equivalent to marginal contribution value) before determining whether it should be pruned or not which would be equivalent to the elimination decision. ]
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
[ (Pg. 8, Paragraph 11) “The weight of the connection between the neurons in the pruning network layer and the neurons in the next network layer is adjusted.”
	This citation teaches the weightings of the neural nodes being adjusted after pruning. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the methods for improved neuron pruning as taught by WO2018. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the enhanced pruning methods would decrease the error rate of the neural network while also enhancing acceleration and compression [ WO2018 (Pg. 4, Paragraph 22) ]. This has the obvious benefit of a more accurate system which would also be faster and utilize fewer resources.
Regarding claim 2, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein generating the weight matrix comprises building and training the ANN for a specific application
[ (¶0001) “CNNs are currently used, for example, to accurately detect and classify objects depicted in images and words recited in recordings” 
This citation explicitly teaches using the ANN for a specific application. ]

Regarding claim 3, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
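For illustration only, the claimed loss-difference computation (output loss with the original weight matrix versus output loss with a modified matrix in which one node's weight is replaced by a predefined weight) can be sketched as follows. This is a minimal sketch and not the applicant's or any reference's implementation: a single linear layer stands in for the ANN, each row of `weights` is treated as one neural node, the loss is taken as the Euclidean norm of expected minus actual output, and the sign convention (modified loss minus original loss) is an assumption. All function and variable names are hypothetical.

```python
import numpy as np

def output_loss(weights, input_vector, expected_output):
    """Loss as the norm of (expected output - actual output)."""
    # A single linear layer stands in for the whole ANN (assumption).
    actual_output = weights @ input_vector
    return float(np.linalg.norm(expected_output - actual_output))

def marginal_contributions(weights, input_vector, expected_output,
                           predefined_weight=0.0):
    """Per-node change in loss when that node's weights are replaced."""
    base_loss = output_loss(weights, input_vector, expected_output)
    contributions = []
    for node in range(weights.shape[0]):
        modified = weights.copy()
        # Replace the weights of this node (one row) with the predefined weight.
        modified[node, :] = predefined_weight
        modified_loss = output_loss(modified, input_vector, expected_output)
        contributions.append(modified_loss - base_loss)
    return contributions

# Hypothetical two-node layer: node 0 carries the output, node 1 is all zeros.
weights = np.array([[1.0, 0.0], [0.0, 0.0]])
input_vector = np.array([2.0, 3.0])
expected_output = np.array([2.0, 0.0])
contributions = marginal_contributions(weights, input_vector, expected_output)
```

Under these assumptions, a node whose replacement leaves the loss unchanged receives a contribution of zero and would be a candidate for elimination.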
Regarding claim 4, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein the predefined weight in the modified weight matrix is zero
[ (¶0016) “These CNN tuning processes remove unnecessary ranks in CNN tensors (e.g., weight matrices) and also prune remaining near zero weights”  
This shows that the tuning device would be able to convert the matrix to a modified matrix with weight consisting of zero for the affected nodes/neurons. ]

Regarding claim 5, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This teaches that the tuner will remove (prune) a layer or node as needed if whatever value is selected (which could be the contribution value) is lower than the threshold set by the user. ]
Regarding claim 6, the method of claim 5 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 5 above, with the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 

In regards to claim 9, Byun teaches the following:
A system for improving performance of an artificial neural network (ANN), the system comprising: An ANN improvement device comprising at least one processor and computer readable medium storing instructions
[(¶0010) “The systems and methods disclosed herein tune a CNN to increase both its accuracy and computational efficiency.” ] 
[ (¶0017) “the computing device 100 includes a processor 102, memory” ]
That, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
[ (¶0017) “and register memory, that can execute instructions defined by an instruction set.” ]
For each layer of the ANN, generating a weight matrix comprising a weight of each neural node in a given layer
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix”
Byun teaches a CNN tuner, which would be the equivalent to the ANN improvement device, that generates a new weight matrix for each layer of the CNN which means it comprises the weight of each neural node in that layer.]
For each neural node in the given layer, determining a marginal contribution value of a given node in the given layer with respect to the other neural nodes in the given layer based on an input vector to the given layer
[ (¶0025) “the diagonal of the Σ matrix lists singular values that indicate the relative importance of eigenvectors stored in the u and v* matrices to the SVD representation of the weight matrix”
This citation shows the diagonal of the matrix which indicates relative importance of eigenvectors. Examiner notes that this is the equivalent to the marginal contribution score, with the weight matrix at the end of the citation being equivalent to the nodes in the given layer. ]
and a modified weight matrix, wherein the modified weight matrix is derived from the weight matrix by replacing weight of the given neural node in the given layer with a predefined weight;
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” 
The new weight matrix from the citation would be the equivalent to the modified weight matrix from the claim. ]
executing, by the ANN improvement device, an elimination decision for each neural node in each layer
[ (¶0029) “In these examples, the CNN tuner prunes the weight matrix by zeroing values less than or equal to the pruning threshold.” 
Pruning the weight matrix by zeroing values is the equivalent to making an elimination decision on a particular neural node. Additionally, the pruning threshold could be any value or target that the user would want to set which could be the marginal contribution value. ]
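For illustration only, the threshold pruning quoted from Byun (¶0029) can be sketched as follows. This is a minimal NumPy sketch, not Byun's implementation; the function name `prune_by_threshold` and the example values are hypothetical, and comparing the absolute value of each weight against the threshold is an assumption, since the quoted passage does not say whether magnitude or signed value is compared.

```python
import numpy as np

def prune_by_threshold(weights, pruning_threshold):
    """Zero every weight whose magnitude is at or below the threshold."""
    pruned = weights.copy()
    # Magnitude comparison is an assumption of this sketch.
    pruned[np.abs(pruned) <= pruning_threshold] = 0.0
    return pruned

# Hypothetical weight matrix and threshold.
weights = np.array([[0.05, -0.8], [0.3, -0.01]])
pruned = prune_by_threshold(weights, pruning_threshold=0.1)
```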
However, what is not taught by Byun is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
Varadarajan however does teach some of these limitations as seen below:
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
[ (¶0143) “Propagation of error causes adjustments to edge weights, which depends on the gradient of the error at each edge.”
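This citation by Varadarajan teaches distributing a correction value through propagation to the other nodes in the layer which will change the activation function/value for each neuron, through the now modified weight matrix. Examiner notes that the error adjustment is equivalent to the distributed surplus value as it also makes changes through the weight matrix to change the activation function. Examiner notes the marginal contribution value was taught in a previous citation. ]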
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer.
[ (¶0143) “An edge weight is adjusted according to a percentage of the edge's gradient.” 
This citation teaches updating the weight matrix by adjusting the edge weight. The edge’s gradient is equivalent to the relation of the remaining neural nodes in each layer as the gradient is calculated by the relation of edge error to activation value of an upstream neuron, with the distributed surplus value being equivalent to the adjustment of the edge weight as taught in the above citation. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the system for enhancing a CNN which includes methods of distributing corrected values from removed nodes to the remaining nodes as taught by Varadarajan. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the distribution of the error value to other nodes in the layer would decrease the error rate of the neural network [ Varadarajan (0143) ]. This has the obvious benefit of a more accurate system.
What Byun and Varadarajan both fail to explicitly teach is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
WO2018 however does teach these limitations as seen below:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
[ (Pg. 4, Paragraph 22) “The clipped neurons are neurons that contribute weakly to the neural network output and have poor expression ability. Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
	This paragraph from WO2018 teaches the neural network comparing the neurons pre-pruning to determine what the accuracy would be post-pruning and making a decision based off of that. It also states that it finds an “importance value” which includes its relation to other nodes in the same layer and nodes within the next layer. The importance value being equivalent to the marginal contribution value ]
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
[ (Pg. 4, Paragraph 22) “the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned”
	This citation teaches the neural network going through each neural node and looking at its importance value (equivalent to marginal contribution value) before determining whether it should be pruned or not which would be equivalent to the elimination decision. ]
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
[ (Pg. 8, Paragraph 11) “The weight of the connection between the neurons in the pruning network layer and the neurons in the next network layer is adjusted.”
	This citation teaches the weightings of the neural nodes being adjusted after pruning. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the system for improved neuron pruning as taught by WO2018. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the enhanced pruning methods would decrease the error rate of the neural network while also enhancing acceleration and compression [ WO2018 (Pg. 4, Paragraph 22) ]. This has the obvious benefit of a more accurate system which would also be faster and utilize fewer resources.

Regarding claim 10, the system of claim 9 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above, with the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
	and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.” ]

Regarding claim 11, The system of claim 9, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above. With the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This teaches that the tuner will remove (prune) a layer or node as needed if whatever value is selected (which could be the contribution value) is lower than the threshold set by the user.]
Regarding claim 12, The system of claim 11, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 11 above. With the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer, and wherein removing the given neural node comprises defining the weight of the given neural node as zero.
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner will decide what the layer or node threshold is as needed if whatever value is selected (which could be the contribution value) is the one set by the user.]
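As a minimal sketch of the limitation recited above, assuming one scalar weight per neural node and assuming the adaptive threshold is the mean of the layer's marginal contribution values (the claim only requires that the threshold be based on those values), the elimination step could look like the following. The names adaptive_threshold and eliminate_nodes are hypothetical and appear in neither the claims nor the cited art.

```python
from typing import List


def adaptive_threshold(contributions: List[float]) -> float:
    # Assumption: the threshold is the mean marginal contribution of the layer.
    return sum(contributions) / len(contributions)


def eliminate_nodes(weights: List[float], contributions: List[float]) -> List[float]:
    # A node whose contribution is below the threshold is removed by
    # defining its weight as zero; other weights are left unchanged.
    threshold = adaptive_threshold(contributions)
    return [0.0 if c < threshold else w for w, c in zip(weights, contributions)]


# Hypothetical layer with three nodes.
print(eliminate_nodes([0.5, 0.25, 2.0], [0.5, 0.0, 2.5]))
```

Because the threshold is recomputed from each layer's own values, it differs from a fixed, user-configured pruning ratio.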
In regards to claim 15, Byun teaches the following:
A non-transitory computer-readable medium storing computer-executable instructions for improving performance of an artificial neural network (ANN), the computer-executable instructions configured for: 
 [ (¶0075) “non-transient computer readable medium encoded with instructions that when executed by at least one processor cause a process for tuning a convolutional neural network” ]
For each layer of the ANN, generating a weight matrix comprising a weight of each neural node in a given layer
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix”
Byun teaches a CNN tuner, which would be the equivalent to the ANN improvement device, that generates a new weight matrix for each layer of the CNN which means it comprises the weight of each neural node in that layer.]
For each neural node in the given layer, determining a marginal contribution value of a given node in the given layer with respect to the other neural nodes in the given layer based on an input vector to the given layer
[ (¶0025) “the diagonal of the Σ matrix lists singular values that indicate the relative importance of eigenvectors stored in the u and v* matrices to the SVD representation of the weight matrix”
	This citation shows the diagonal of the matrix which indicates relative importance of eigenvectors. Examiner notes that this is the equivalent to the marginal contribution score. With the weight matrix at the end of the citation being equivalent to the nodes in the given layer. ]
and a modified weight matrix, wherein the modified weight matrix is derived from the weight matrix by replacing weight of the given neural node in the given layer with a predefined weight;
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” 
The new weight matrix from the citation would be the equivalent to the modified weight matrix from the claim. ]
executing, by the ANN improvement device, an elimination decision for each neural node in each layer
[ (¶0029) “In these examples, the CNN tuner prunes the weight matrix by zeroing values less than or equal to the pruning threshold.” 
Pruning the weight matrix by zeroing values is the equivalent to making an elimination decision on a particular neural node. Additionally, the pruning threshold could be any value or target that the user would want to set which could be the marginal contribution value. ]
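The pruning step quoted from Byun can be sketched as follows, assuming the weight matrix is a list of rows and assuming the comparison is made on the magnitude of each value (the quoted passage says only "values less than or equal to the pruning threshold"). The name prune_weight_matrix is hypothetical.

```python
from typing import List


def prune_weight_matrix(matrix: List[List[float]], pruning_threshold: float) -> List[List[float]]:
    # Zero every entry whose magnitude is less than or equal to the threshold.
    # Using the magnitude (abs) is an assumption, not taken from the citation.
    return [
        [0.0 if abs(value) <= pruning_threshold else value for value in row]
        for row in matrix
    ]


print(prune_weight_matrix([[0.05, -0.8], [0.3, 0.1]], 0.1))
```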
	However what is not taught by Byun is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
Varadarajan however does teach some of these limitations as seen below:
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
[ (¶0143) “Propagation of error causes adjustments to edge weights, which depends on the gradient of the error at each edge.”
This citation by Varadarajan teaches distributing a correction value through propagation to the other nodes in the layer which will change the activation function/value for each neuron, through the now modified weight matrix. Examiner notes that the error adjustment is equivalent 
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer.
[ (¶0143) “An edge weight is adjusted according to a percentage of the edge's gradient.” 
This citation teaches updating the weight matrix by adjusting the edge weight. The edge’s gradient is equivalent to the relation of the remaining neural nodes in each layer as the gradient is calculated by the relation of edge error to activation value of an upstream neuron. With the distributed surplus value being equivalent to the adjustment of the edge weight as taught in the above citation. ]
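The edge-weight adjustment described in the Varadarajan citation can be sketched as follows, assuming the edge's gradient is the product of the edge error and the activation value of the upstream neuron, and assuming the "percentage" is a learning rate between 0 and 1. The function names and the default rate are hypothetical.

```python
def edge_gradient(edge_error: float, upstream_activation: float) -> float:
    # Assumption: gradient = edge error times upstream neuron activation.
    return edge_error * upstream_activation


def adjust_edge_weight(weight: float, edge_error: float,
                       upstream_activation: float,
                       learning_rate: float = 0.5) -> float:
    # The weight moves against the gradient by a fraction (the learning rate).
    return weight - learning_rate * edge_gradient(edge_error, upstream_activation)


print(adjust_edge_weight(1.0, 0.5, 2.0))
```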
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the system for enhancing a CNN which includes methods of distributing corrected values from removed nodes to the remaining nodes as taught by Varadarajan. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the distribution of the error value to other nodes in the layer would decrease the error rate of the neural network [ Varadarajan (¶0143) ]. This has the obvious benefit of a more accurate system.
What Byun and Varadarajan both fail to explicitly teach is the following:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
WO2018 however does teach these limitations as seen below:
wherein determining the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer comprises a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
[ (Pg. 4, Paragraph 22) “The clipped neurons are neurons that contribute weakly to the neural network output and have poor expression ability. Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
	This paragraph from WO2018 teaches the neural network comparing the neurons pre-pruning to determine what the accuracy would be post-pruning and making a decision based off of that. It also states that it finds an “importance value” which includes its relation to other nodes in the same layer and nodes within the next layer. The importance value being equivalent to the marginal contribution value ]
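The claimed marginal contribution value can be sketched as follows, assuming a single-output layer with one scalar weight per neural node, an absolute-difference output loss, a predefined replacement weight of zero, and the sign convention "loss with the modified matrix minus loss with the original matrix". All of these are simplifying assumptions, and the function names are hypothetical.

```python
from typing import List


def forward(weights: List[float], inputs: List[float]) -> float:
    # Toy layer: the output is the weighted sum of the inputs.
    return sum(w * x for w, x in zip(weights, inputs))


def output_loss(weights: List[float], inputs: List[float], expected: float) -> float:
    # Difference between the expected output and the actual output.
    return abs(expected - forward(weights, inputs))


def marginal_contributions(weights: List[float], inputs: List[float],
                           expected: float,
                           predefined_weight: float = 0.0) -> List[float]:
    base_loss = output_loss(weights, inputs, expected)
    contributions = []
    for i in range(len(weights)):
        # Modified weight matrix: node i's weight replaced by the predefined weight.
        modified = list(weights)
        modified[i] = predefined_weight
        contributions.append(output_loss(modified, inputs, expected) - base_loss)
    return contributions


print(marginal_contributions([0.5, 0.0, 2.0], [1.0, 1.0, 1.0], 2.5))
```

Under this convention a node whose replacement leaves the loss unchanged receives a contribution of zero, which corresponds to a node with "no or negligible" marginal contribution.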
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing each neural node with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
[ (Pg. 4, Paragraph 22) “the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned”
	This citation teaches the neural network going through each neural node and looking at its importance value (equivalent to marginal contribution value) before determining whether it should be pruned or not which would be equivalent to the elimination decision. ]
and improving, by the ANN improvement device, performance of the ANN by activating proper neural nodes using the updated weights of remaining neural nodes in the given layer.
[ (Pg. 8, Paragraph 11) “The weight of the connection between the neurons in the pruning network layer and the neurons in the next network layer is adjusted.”
	This citation teaches the weightings of the neural nodes being adjusted after pruning. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the system for improved neuron pruning as taught by WO2018. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the enhanced pruning methods would decrease the error rate of the neural network while also enhancing acceleration and compression [ WO2018 (Pg. 4, Paragraph 22) ]. This has the obvious benefit of a more accurate system which would also be faster and utilize fewer resources.



Regarding claim 16, The non-transitory computer-readable medium of claim 15, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above. With the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
	and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.” ]


Regarding claim 17, The non-transitory computer-readable medium of claim 15, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above. With the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This teaches that the tuner will remove (prune) a layer or node as needed if whatever value is selected (which could be contribution value) is lower than the threshold set by the user.]
Regarding claim 18, The non-transitory computer-readable medium of claim 17, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 17 above. With the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer, and wherein removing the given neural node comprises defining the weight of the given neural node as zero.
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner will decide what the layer or node threshold is as needed if whatever value is selected (which could be the contribution value) is the one set by the user. ]

Claims 7, 8, 13, 14, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Byun/Varadarajan/WO2018 in view of Nachum (US 20190147339 A1).
In regards to claim 7, The method of claim 1, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above. With the rest of the claim being taught by Nachum (US 20190147339 A1) as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 

	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun/Varadarajan/WO2018 and a marginal contribution value consisting of a coalition of remaining neural nodes in the given layer as taught by Nachum. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because it would provide superior prediction accuracy [ Nachum (¶0027) ]. This has the obvious benefit of a more accurate model overall.


In regards to claim 8, The method of claim 1, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above. With the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond from the other neural node(s). The citation talks about active neurons that are not pruned and/or removed and correlates them to other neurons 

In regards to claim 13, The system of claim 9, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above. With the rest of the claim being taught by Nachum as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 
This citation teaches a group lasso regularization term which is an average of the input weights for the group of neurons which can change the weightings of a neuron based on its discrepancy of its weightings in comparison to the group’s weighting.]
	With respect to Claim 13, it is substantially similar to Claim 7 and is rejected in the same manner, the same art and reasoning applying. Please refer to claim 7 to see the motivation to combine. 
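The claimed distributed surplus value can be sketched as follows, assuming it is the plain arithmetic mean of the marginal contribution values over the coalition of remaining neural nodes, and assuming the weight update replaces each remaining node's weight with that value while removed nodes keep a weight of zero. The function names are hypothetical, and this is one simplified reading of the claim language rather than the applicant's implementation.

```python
from typing import List


def distributed_surplus(contributions: List[float], remaining: List[int]) -> float:
    # Average marginal contribution over the coalition of remaining nodes.
    coalition = [contributions[i] for i in remaining]
    return sum(coalition) / len(coalition)


def update_weights(weights: List[float], contributions: List[float],
                   remaining: List[int]) -> List[float]:
    # Remaining nodes receive the surplus value; removed nodes stay at zero.
    surplus = distributed_surplus(contributions, remaining)
    kept = set(remaining)
    return [surplus if i in kept else 0.0 for i in range(len(weights))]


# Hypothetical layer where node 1 has already been removed.
print(update_weights([0.5, 0.25, 2.0], [0.5, 0.0, 2.5], [0, 2]))
```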

In regards to claim 14, The system of claim 9, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above. With the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond from the other neural node. The citation talks about active neurons that are not pruned and/or removed and correlates them to other neurons in the network and explicitly teaches the different weighted factors which is an equivalent to distributed surplus value. ]



In regards to claim 19, The non-transitory computer-readable medium of claim 15, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above. With the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond from the other neural node. The citation talks about active neurons that are not pruned and/or removed and correlates them to other neurons in the network and explicitly teaches the different weighted factors which is an equivalent to distributed surplus value. ]


In regards to claim 20, The non-transitory computer-readable medium of claim 15, is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above. With the rest of the claim being taught by Nachum as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 
This citation teaches a group lasso regularization term which is an average of the input weights for the group of neurons which can change the weightings of a neuron based on its discrepancy of its weightings in comparison to the group’s weighting.]
With respect to Claim 20, it is substantially similar to Claim 7 and is rejected in the same manner, the same art and reasoning applying. Please refer to claim 7 to see the motivation to combine.
 



Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 11227213 B2 – Device and method for improving processing speed of neural network that teaches a loss score for parameters, retraining after matrix changes and zeroing of matrix values. 





Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.A.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123