DETAILED ACTION
This Office Action is in response to the amendment for Application No. 16/195,973 filed on April 27, 2022. Claims 1-20 are presented for examination and are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Arguments
Applicant’s arguments on Pg. 2 of “Remarks” assert that Byun, Varadarajan, and WO2018 fail, individually or in combination, to teach the claim limitation that recites: “identifying, by the ANN improvement device, one or more neural nodes with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation; executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing the identified one or more neural nodes”. The examiner respectfully disagrees.
Examiner points to [ WO2018 (Pg. 4, Paragraph 22) “Therefore, the neuron pair selected by the largest neuron selection strategy is selected. The output of the neural network has a strong contribution and expression ability. The clipped neurons are neurons that contribute weakly to the neural network output and have poor expression ability. Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning”]. This citation and the rest of the teachings from WO2018 explain that a decision is made from the improvement device where the neurons have their loss measured and then are pruned based on which neurons would provide the weakest contribution to the neural network output. This is equivalent to the claim citation with the system picking the nodes that have the least or negligible contribution (importance value of the reference) to be pruned from the network. Further, the actual elimination is evidenced in WO2018 with: [ WO2018 (Pg. 9, Line 38) “The pruning unit 84 is configured to cut off other neurons in the network layer to be pruned to obtain a pruning network layer” ]. This citation shows the pruning operation being carried out by the system. The argument is non-persuasive. 
	Applicant makes arguments on Pg. 4 of “Remarks” that WO2018 fails to teach the above citation due to the reference not being directed towards the applicant’s claim material. Specifically, the applicant claims that WO2018 only teaches pruning a neural network. The examiner respectfully disagrees.
WO2018 teaches pruning a neural network, which is equivalent to the claim language. Examiner notes that one of ordinary skill in the art would recognize, prior to the effective filing date, that removing/deleting a node from a neural network is synonymous with pruning a node. For further support, the applicant’s specification recites [ (¶0028) “Additionally, it should be noted that removing the given neural node may include defining the weight of the given neural node as zero” ], which is very similar to [ Byun (¶0029) “In these examples, the CNN tuner prunes the weight matrix by zeroing values less than or equal to the pruning threshold” ]. Further, applicant states that WO2018 [ Remarks (Pg. 4) “describes determination of importance value of each neuron in the network layer to be pruned and pruning of neurons according to the importance value of each neuron in the network layer to be pruned” ], which applicant argues is different from the claim limitation regarding marginal contribution value. Examiner respectfully disagrees and points to [ WO2018 (Pg. 10, Paragraph 19) “the importance value of the neuron reflects the influence degree of the neuron on the output of the neural network” ] and [ WO2018 (Pg. 9, Lines 32-33) “The importance value determining unit 81 is configured to determine an importance value of the neuron according to an activation value of the neuron in the network layer to be pruned” ]. These citations show that the importance value of WO2018 measures the amount of influence each neuron has on the output via a measurement of the activation function and the connection weight of the neurons, which is equivalent to applicant’s marginal contribution value. The argument is not persuasive.
Additionally, applicant argues on Pg. 6 of “Remarks” that WO2018 fails to teach the claim limitation: “are minimized using the updated weights of remaining neural nodes in the given layer” when referencing false positives. The examiner respectfully disagrees. [ WO2018 (Pg. 8, Lines 24-25) “In order to further improve the accuracy of the neural network, the embodiment of the present invention also adjusts the neurons of the network layer and the next network layer for all network layers” ]. This citation from WO2018 teaches that the adjustments done to the neurons are also to increase the accuracy of the model which would include lessening the rate of false positives in the neural network as a whole. The reference as a whole also teaches the updating of the weights and neural nodes similar to the applicant’s claim language. Applicant is reminded that cited prior art references must be considered in their entirety and not only the cited sections [ MPEP 2141.02(VI) ]. The argument is not persuasive.
Applicant makes further arguments on Pg. 7 of “Remarks” that the combination of references fails to teach “for each layer of the ANN, calculating, by the ANN improvement device, a modified weight matrix from the weight matrix by replacing weight of the given neural node in the given layer with a predefined weight; for each neural node in the given layer, determining, by the ANN improvement device, a marginal contribution value of a given neural node in the given layer with respect to other neural nodes in the given layer on an input vector to the given layer and the modified weight matrix”. The examiner respectfully disagrees. [ Byun (¶0064) “processor is configured to prune the compressed layer at least in part by identifying at least one weight value stored in the at least one new matrix that is less than a threshold value, replacing the at least one weight value with 0” ]. This citation teaches the replacement of the weight with a predefined weight (in this case, zero) which is used in the weight matrix. Examiner notes that this is similar to the applicant’s cited specification { (¶0036) “may determine marginal contribution value of each neural node in the weight matrix for each layer of the ANN using the neural node contribution determination module 202. The weight of a given neural node in the weight matrix for a given layer may be changed to a predefined weight (e.g., about zero) so as to generate a modified weight matrix” }.
Applicant continues that Byun merely generates a new matrix by multiplying, but the citation that the applicant highlighted on the bottom of Pg. 7 of “Remarks” continues by reciting: “…multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” (emphasis added). Accordingly, Applicant’s argument that Byun does not teach calculating a modified weight matrix (which, as applicant correctly stated, Byun does with the multiplied matrix) by replacing the weights of the given neural node in the given layer is not persuasive.
Applicant states that independent claims 9 and 15 are similar in scope and contain similar arguments for why they should be allowed. Examiner points to the above arguments for claims 9 and 15 respectively. The arguments are not persuasive and the claims remain rejected.
Please see the §103 rejection section below for full claim mapping and analysis. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 9-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Byun (US 20190087729 A1) in view of Varadarajan (US 20190095819 A1) and (WO2018090706A1) hereinafter known as WO2018.
In regards to claim 1, Byun teaches the following:
A method of improving performance of an artificial neural network (ANN), the method comprising: for each layer of the ANN, generating, by an ANN improvement device, a weight matrix comprising a weight of each neural node in a given layer;
[ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” 
Byun teaches a CNN tuner, which would be the equivalent to the ANN improvement device. The citation also includes the weight matrix of the selected layer which is equivalent to the neural node weight(s) of that layer. ]
for each layer of the ANN, calculating, by the ANN improvement device, a modified weight matrix from the weight matrix by replacing weight of a given neural node in the given layer with a predefined weight;
[ (¶0064) “processor is configured to prune the compressed layer at least in part by identifying at least one weight value stored in the at least one new matrix that is less than a threshold value, replacing the at least one weight value with 0”
	This citation teaches the replacement of the weight with a predefined weight (in this case, zero) which is used in the weight matrix. Examiner notes that this is similar to the applicant’s cited specification { (¶0036) “may determine marginal contribution value of each neural node in the weight matrix for each layer of the ANN using the neural node contribution determination module 202. The weight of a given neural node in the weight matrix for a given layer may be changed to a predefined weight (e.g., about zero) so as to generate a modified weight matrix” } (emphasis added)  ]
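For illustration only, the replacement described in the citation above can be sketched in Python as follows. This sketch is not taken from Byun or from the applicant's specification. The function name `prune_below_threshold`, the list-of-lists matrix representation, and the comparison on the magnitude of each weight (rather than on its signed value) are assumptions made for this example.

```python
def prune_below_threshold(weights, threshold, predefined_weight=0.0):
    """Return a modified copy of `weights` with small entries replaced.

    Assumption: "less than a threshold value" is applied to the magnitude
    of each weight. The cited paragraph does not state this explicitly.
    """
    modified = []
    for row in weights:
        modified.append([
            predefined_weight if abs(w) < threshold else w
            for w in row
        ])
    return modified


# Hypothetical 2x2 weight matrix for one layer.
weights = [[0.80, 0.02], [-0.01, -0.50]]
modified = prune_below_threshold(weights, threshold=0.05)
print(modified)
```

The original matrix is left untouched and a new matrix is returned, which mirrors the distinction the claim draws between the weight matrix and the modified weight matrix.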
for each neural node in the given layer, determining, by the ANN improvement device, a marginal contribution value of the given node in the given layer with respect to the other neural nodes in the given layer based on an input vector to the given layer and the modified weight matrix,
[ (¶0025) “the diagonal of the Σ matrix lists singular values that indicate the relative importance of eigenvectors stored in the u and v* matrices to the SVD representation of the weight matrix”
	This citation shows the diagonal of the matrix, which indicates the relative importance of eigenvectors. Examiner notes that this is equivalent to the marginal contribution score, with the weight matrix at the end of the citation being equivalent to the nodes in the given layer. ]
 [ (¶0028) “the CNN tuner generates a new weight matrix by multiplying the truncated Σ matrix by the truncated v* matrix and replacing the weight matrix of the selected layer with the new weight matrix” 
The new weight matrix from the citation would be the equivalent to the modified weight matrix from the claim. ]
	However, the following is not distinctly disclosed by Byun and is instead taught by Varadarajan, as seen below:
for each remaining neural node in each layer, determining, by the ANN improvement device, a distributed surplus value of a given remaining neural node in a given layer based on the marginal contribution values of a coalition of remaining neural nodes in the given layer and a number of remaining neural nodes in the given layer;
[ (¶0143) “Propagation of error causes adjustments to edge weights, which depends on the gradient of the error at each edge.”
This citation by Varadarajan teaches distributing a correction value through propagation to the other nodes in the layer which will change the activation function/value for each neuron, through the now modified weight matrix. Examiner notes that the error adjustment is equivalent to the distributed surplus value as it also makes changes through the weight matrix to change the activation function. Examiner notes the marginal contribution value was taught in a previous citation. ]
updating, by the ANN improvement device, the weight matrix based on the distributed surplus value of each remaining neural node in each layer.
[ (¶0143) “An edge weight is adjusted according to a percentage of the edge's gradient.” 
This citation teaches updating the weight matrix by adjusting the edge weight. The edge’s gradient is equivalent to the relation of the remaining neural nodes in each layer, as the gradient is calculated by the relation of edge error to activation value of an upstream neuron. The distributed surplus value is equivalent to the adjustment of the edge weight as taught in the above citation. ]
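For illustration only, the adjustment described in the Varadarajan citation (“An edge weight is adjusted according to a percentage of the edge's gradient”) can be sketched in Python as follows. This is not code from Varadarajan. The function name `adjust_edge_weights`, the parameter `rate` (the assumed "percentage"), and the subtraction of the scaled gradient (the usual gradient descent convention) are assumptions made for this example.

```python
def adjust_edge_weights(weights, gradients, rate):
    """Return a new weight matrix with each edge weight moved by a
    fraction (`rate`) of that edge's gradient.

    Assumption: the scaled gradient is subtracted, as in ordinary
    gradient descent. The cited paragraph does not state the sign.
    """
    return [
        [w - rate * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, gradients)
    ]


# Hypothetical values for a layer with one node and two incoming edges.
weights = [[1.0, 2.0]]
gradients = [[0.5, -1.0]]
updated = adjust_edge_weights(weights, gradients, rate=0.1)
print(updated)
```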
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun with the methods of distributing corrected values from removed nodes to the remaining nodes as taught by Varadarajan. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the distribution of the error value to other nodes in the layer would decrease the error rate of the neural network [ Varadarajan (¶0143) ]. This has the obvious benefit of a more accurate system.
The following is not distinctly disclosed by either Byun or Varadarajan and is instead taught by WO2018:
wherein the marginal contribution value of the given neural node in the given layer with respect to other neural nodes in the given layer is determined by a difference between an output loss of the ANN for the input vector based on the weight matrix for the given layer and an output loss of the ANN for the input vector based on the modified weight matrix for the given layer with respect to the given neural node;
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer. The connection weights of the middle neurons determine their diversity values. According to the importance values and diversity values of the neurons in the network layer to be pruned”
	This paragraph from WO2018 teaches the neural network comparing the neurons pre-pruning to each other to determine the importance value (equivalent to contribution value of the claim language) for each node in the process of determining which node to prune. ]
[ (Pg. 4, Paragraph 22) “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning”
	Continuing from the previous citation, this citation teaches that the system does compare the neural network’s loss before and after the pruning to find a comparison value of the loss and uses that to make its decision in regards to which node to prune. ]
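For illustration only, the claimed computation (a marginal contribution value determined as a difference between two output losses, one with the weight matrix and one with the modified weight matrix) can be sketched in Python as follows. This sketch is not taken from WO2018 or from the applicant's specification. The two-layer network, the ReLU activation, the sum-of-absolute-differences loss, the order of the subtraction, and all function names are assumptions made for this example.

```python
def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def output_loss(expected, actual):
    # Assumed loss: sum of absolute differences between the expected
    # output vector and the actual output vector.
    return sum(abs(e - a) for e, a in zip(expected, actual))


def forward(hidden_weights, output_weights, x):
    # Assumed architecture: one hidden layer with ReLU, linear output.
    hidden = [max(0.0, h) for h in matvec(hidden_weights, x)]
    return matvec(output_weights, hidden)


def marginal_contributions(hidden_weights, output_weights, x, expected,
                           predefined_weight=0.0):
    """Loss with node i's weights replaced, minus loss with the original
    weight matrix, for each node i in the hidden layer."""
    base_loss = output_loss(expected, forward(hidden_weights, output_weights, x))
    contributions = []
    for i in range(len(hidden_weights)):
        modified = [list(row) for row in hidden_weights]
        modified[i] = [predefined_weight] * len(modified[i])
        loss_i = output_loss(expected, forward(modified, output_weights, x))
        contributions.append(loss_i - base_loss)
    return contributions


# Hypothetical layer with three nodes; the third node has all-zero weights.
hidden_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
output_weights = [[1.0, 1.0, 1.0]]
contributions = marginal_contributions(
    hidden_weights, output_weights, x=[2.0, 3.0], expected=[5.0]
)
print(contributions)
```

In this hypothetical input, zeroing the third node leaves the output loss unchanged, so that node would be the one with no marginal contribution.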
Identifying, by the ANN improvement device, one or more neural nodes with no or negligible marginal contribution value based on the corresponding marginal contribution value in order to reduce computation;
[ (Pg. 4, Paragraph 22) “Therefore, the neuron pair selected by the largest neuron selection strategy is selected. The output of the neural network has a strong contribution and expression ability. The clipped neurons are neurons that contribute weakly to the neural network output and have poor expression ability. Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning”
	This citation and the rest of the teachings from WO2018 explain that a decision is made from the improvement device where the neurons have their loss measured and then are pruned based on which neurons would provide the weakest contribution to the neural network output. This is equivalent to the claim citation with the system picking the nodes that have the least or negligible contribution (importance value of the reference) to be pruned from the network. ]
executing, by the ANN improvement device, an elimination decision for each neural node in each layer by removing the identified one or more neural nodes;
[ (Pg. 9, Line 38) “The pruning unit 84 is configured to cut off other neurons in the network layer to be pruned to obtain a pruning network layer”
	This citation shows the pruning operation being carried out by the system. ]
[ (Pg. 4, Paragraph 22) “the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned”
	This citation teaches the neural network going through each neural node and looking at its importance value (equivalent to marginal contribution value) before determining whether it should be pruned or not which would be equivalent to the elimination decision. ]
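For illustration only, the identification and elimination steps mapped above can be sketched in Python as follows. This sketch is not taken from WO2018. It assumes that a contribution value has already been computed for each node, that "negligible" means a magnitude at or below a hypothetical `tolerance`, and that removal is modeled by defining the node's weights as zero, consistent with ¶0028 of the applicant's specification. The function name is also an assumption.

```python
def eliminate_negligible_nodes(weights, contributions, tolerance=1e-6):
    """Zero the weight row of every node whose contribution magnitude
    is at or below `tolerance`; return the new matrix and the indices
    of the removed nodes."""
    removed = [i for i, c in enumerate(contributions) if abs(c) <= tolerance]
    pruned = [
        [0.0] * len(row) if i in removed else list(row)
        for i, row in enumerate(weights)
    ]
    return pruned, removed


# Hypothetical layer of three nodes with precomputed contribution values.
weights = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.4]]
contributions = [2.0, 3.0, 0.0]
pruned, removed = eliminate_negligible_nodes(weights, contributions)
print(removed, pruned)
```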
and minimizing, by the ANN improvement device, false positives in the ANN using the updated weights of remaining neural nodes in the given layer.
[ (Pg. 8, Paragraph 11) “The weight of the connection between the neurons in the pruning network layer and the neurons in the next network layer is adjusted.”
	This citation teaches the weightings of the neural nodes being adjusted after pruning. ]
[ (Pg. 8, Lines 24-25) “In order to further improve the accuracy of the neural network, the embodiment of the present invention also adjusts the neurons of the network layer and the next network layer for all network layers”
	This citation from WO2018 teaches that the adjustments done to the neurons are also to increase the accuracy of the model which would include lessening the rate of false positives in the neural network as a whole. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun/Varadarajan with the methods for improved neuron pruning as taught by WO2018. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious because the enhanced pruning methods would decrease the error rate of the neural network while also enhancing acceleration and compression [ WO2018 (Pg. 4, Paragraph 22) ]. This has the obvious benefit of a more accurate system which would also be faster and utilize fewer resources.


Regarding claim 2, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein generating the weight matrix comprises building and training the ANN for a specific application
[ (¶0001) “CNNs are currently used, for example, to accurately detect and classify objects depicted in images and words recited in recordings” 
This citation explicitly teaches using the ANN for a specific application. ]

Regarding claim 3, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
	and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
Regarding claim 4, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein the predefined weight in the modified weight matrix is zero
[ (¶0016) “These CNN tuning processes remove unnecessary ranks in CNN tensors (e.g., weight matrices) and also prune remaining near zero weights”  
This shows that the tuning device would be able to convert the matrix to a modified matrix with weight consisting of zero for the affected nodes/neurons. ]

Regarding claim 5, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This teaches that the tuner will remove (prune) a layer or node as needed if whatever value is selected (which could be contribution value) is lower than the threshold set by the user.]
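For illustration only, pruning by a configurable ratio “expressed as a percentage of the number of weight values” can be sketched in Python as follows. This is not code from Byun. The function name `prune_by_ratio`, the choice to prune the smallest-magnitude weights first, and rounding the count down are assumptions made for this example.

```python
def prune_by_ratio(weights, pruning_ratio):
    """Return a copy of `weights` in which a fraction `pruning_ratio`
    of the entries (those with the smallest magnitude) is set to zero.

    Assumptions: smallest-magnitude weights are pruned first, and the
    number of pruned entries is rounded down.
    """
    positions = [
        (r, c) for r, row in enumerate(weights) for c in range(len(row))
    ]
    positions.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))
    count = int(len(positions) * pruning_ratio)
    pruned = [list(row) for row in weights]
    for r, c in positions[:count]:
        pruned[r][c] = 0.0
    return pruned


# Hypothetical 2x2 weight matrix pruned at a 50% ratio.
weights = [[0.9, -0.1], [0.05, -0.7]]
pruned = prune_by_ratio(weights, pruning_ratio=0.5)
print(pruned)
```

Selecting entries by position rather than by a magnitude cutoff keeps the number of pruned weights equal to the requested count even when several weights share the same magnitude.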
Regarding claim 6, the method of claim 5 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 5 above, with the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner will decide what the layer or node threshold is as needed if whatever value is selected (which could be the contribution value) is the one set by the user. ]


In regards to claim 9, Byun teaches the following:
A system for improving performance of an artificial neural network (ANN), the system comprising: An ANN improvement device comprising at least one processor and computer readable medium storing instructions
[(¶0010) “The systems and methods disclosed herein tune a CNN to increase both its accuracy and computational efficiency.” ] 
[ (¶0017) “the computing device 100 includes a processor 102, memory” ]
that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
[ (¶0017) “and register memory, that can execute instructions defined by an instruction set.” ]
	Examiner notes that the rest of the claim is similar to independent claim 1 aside from claim 1 being directed to a method rather than a system. As the claim limitations are similar aside from the above limitations, the claim is rejected with the same mapping, art and motivations as claim 1. Please see the rejection for claim 1 above for the detailed mapping and motivation.

Regarding claim 10, the system of claim 9 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above, with the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
	and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
Regarding claim 11, the system of claim 9 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above, with the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This teaches that the tuner will remove (prune) a layer or node as needed if whatever value is selected (which could be the contribution value) is lower than the threshold set by the user.]
Regarding claim 12, the system of claim 11 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 11 above, with the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer, and wherein removing the given neural node comprises defining the weight of the given neural node as zero.
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner will decide what the layer or node threshold is as needed if whatever value is selected (which could be the contribution value) is the one set by the user.]
In regards to claim 15, Byun teaches the following:
A non-transitory computer-readable medium storing computer-executable instructions for improving performance of an artificial neural network (ANN), the computer-executable instructions configured for: 
 [ (¶0075) “non-transient computer readable medium encoded with instructions that when executed by at least one processor cause a process for tuning a convolutional neural network” ]
Examiner notes that the rest of the claim is similar to independent claim 1 aside from claim 1 being directed to a method rather than a computer-readable medium storing instructions. As the claim limitations are similar aside from the above limitations, the claim is rejected with the same mapping, art and motivations as claim 1. Please see the rejection for claim 1 above for the detailed mapping and motivation.


Regarding claim 16, the non-transitory computer-readable medium of claim 15 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above, with the rest of the claim being taught by WO2018 as seen below.
wherein the output loss of the ANN based on the weight matrix for the given layer is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the weight matrix for the given layer
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer.” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation by WO2018 teaches the pruning method checking to see the difference in output pre-pruning and post-pruning within the layer as a whole and the node in particular to see if the pruning decision should be made or not. ]
	and wherein the output loss of the ANN based on the modified weight matrix for the given layer with respect to the given neural node is a difference between the expected output vector of the ANN and an actual output vector of the ANN based on the modified weight matrix for the given layer with respect to the given neural node.
[ (Pg. 4, Paragraph 22) “The neural network pruning method provided by the embodiment of the present invention firstly determines, according to the activation value of the neuron, the importance value of each neuron in the network layer to be pruned, and according to the neuron and the next network layer” … “Therefore, the neural network after pruning and the pre-pruning Compared with the neural network, not only the compression and acceleration effects are obtained, but also the precision loss is small compared with that before the pruning.”
This citation from WO2018 teaches that the pruning method checks the difference in output before and after pruning, both for the layer as a whole and for the particular node, to determine whether the pruning decision should be made. ]
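For illustration only, the two output losses recited in claim 16 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of WO2018: the toy network (a ReLU layer whose activations are summed), the Euclidean distance used as the "difference", and the helper names `output_loss`, `without_node`, and `marginal_contributions` are all hypothetical.

```python
import math

def network_output(weights, inputs):
    # Toy network (assumption): weights[j][i] is the weight from input i to node j;
    # the layer applies ReLU and the single network output sums the activations.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]
    return [sum(hidden)]

def output_loss(weights, inputs, expected):
    # Output loss taken here as the Euclidean distance between the expected
    # output vector and the actual output vector (assumed form of "difference").
    actual = network_output(weights, inputs)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(expected, actual)))

def without_node(weights, node):
    # Modified weight matrix with respect to one node: that node's weights are zeroed.
    return [[0.0] * len(row) if j == node else list(row)
            for j, row in enumerate(weights)]

def marginal_contributions(weights, inputs, expected):
    # Assumed definition: loss with the node removed minus loss with the node present.
    base = output_loss(weights, inputs, expected)
    return [output_loss(without_node(weights, j), inputs, expected) - base
            for j in range(len(weights))]

weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
contributions = marginal_contributions(weights, [2.0, 3.0], [5.0])
# contributions == [2.0, 3.0, 0.0]; the third node changes nothing when removed
```

In this toy case the third node has zero marginal contribution, which is the kind of node the claims describe as having "no or negligible" contribution.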

Regarding claim 17, the non-transitory computer-readable medium of claim 15 is taught by Byun/Varadarajan as in the rejection for claim 15 above, with the rest of the claim being taught by Byun as seen below.
Wherein executing the elimination decision comprises removing a given neural node which the corresponding marginal contribution value is less than an adaptive threshold value
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner removes (prunes) a layer or node as needed when the selected value (which could be the contribution value) is lower than the threshold set by the user. ]

Regarding claim 18, the non-transitory computer-readable medium of claim 17 is taught by Byun/Varadarajan as in the rejection for claim 17 above, with the rest of the claim being taught by Byun as seen below.
Wherein, the adaptive threshold value is determined by the ANN improvement device based on the marginal contribution value of each neural node in the given layer, and wherein removing the given neural node comprises defining the weight of the given neural node as zero.
[ (¶0029) “For instance, in some examples, the CNN tuner prunes the weight matrix of the selected layer using a predefined and configurable pruning ratio. In some examples, the pruning ratio is expressed as a percentage of the number of weight values” 
This citation teaches that the tuner determines the threshold for the layer or node as needed, where the selected value (which could be the contribution value) is the one set by the user. ]
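For illustration only, the limitations of claims 17 and 18 (a threshold derived from the marginal contribution values of the nodes in the layer, and removal of a node by defining its weight as zero) can be sketched as follows. This is a sketch under stated assumptions and does not reproduce Byun's pruning ratio or the application's method: the threshold rule (a fraction `scale` of the layer's mean contribution) and the names `adaptive_threshold` and `prune_layer` are hypothetical.

```python
def adaptive_threshold(contributions, scale=0.5):
    # Hypothetical rule: the threshold is a fraction of the mean marginal
    # contribution of the nodes in the layer, so it adapts to each layer.
    return scale * sum(contributions) / len(contributions)

def prune_layer(weights, contributions, scale=0.5):
    # Remove every node whose contribution is below the threshold by
    # setting all of that node's weights to zero.
    threshold = adaptive_threshold(contributions, scale)
    pruned, removed = [], []
    for j, (row, value) in enumerate(zip(weights, contributions)):
        if value < threshold:
            pruned.append([0.0] * len(row))
            removed.append(j)
        else:
            pruned.append(list(row))
    return pruned, removed

weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pruned, removed = prune_layer(weights, [2.0, 3.0, 0.0])
# removed == [2]; pruned == [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```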

Claims 7, 8, 13, 14, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Byun/Varadarajan/WO2018 in view of Nachum (US 20190147339 A1).
Regarding claim 7, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Nachum (US 20190147339 A1) as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 
This citation teaches a group lasso regularization term, which is an average of the input weights for the group of neurons and which can change the weightings of a neuron based on the discrepancy between its weightings and the group’s weighting. ]
	Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a CNN tuning system and method(s) for using it as taught by Byun/Varadarajan/WO2018 with a marginal contribution value consisting of a coalition of remaining neural nodes in the given layer as taught by Nachum. One of ordinary skill in the art, before the effective filing date of the claimed invention, would have found the combination obvious because it would provide superior prediction accuracy [ Nachum (¶0027) ], which has the benefit of a more accurate model overall.
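For reference, a group lasso regularization term in its commonly used form sums the L2 norms of each group's weights. A minimal sketch follows, with each group taken to be the input weights of one neuron as in the quoted passage of Nachum; the function name `group_lasso` and the sample weights are hypothetical, and the sketch is not drawn from Nachum's implementation.

```python
import math

def group_lasso(weights):
    # weights[j] holds the input weights of neuron j (one group per neuron).
    # The term is the sum, over neurons, of the L2 norm of that neuron's weights.
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)

weights = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
penalty = group_lasso(weights)
# penalty == 15.0: the norms are 5.0, 0.0, and 10.0
```

A neuron whose input weights are all zero adds nothing to the term, which is why such a penalty tends to drive whole neurons, rather than individual weights, toward removal.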


Regarding claim 8, the method of claim 1 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 1 above, with the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond to the other neural node(s). The citation discusses active neurons that are not pruned and/or removed, correlates them to other neurons in the network, and explicitly teaches the different weighting factors, which are equivalent to the distributed surplus value. ]

Regarding claim 13, the system of claim 9 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above, with the rest of the claim being taught by Nachum as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 
This citation teaches a group lasso regularization term, which is an average of the input weights for the group of neurons and which can change the weightings of a neuron based on the discrepancy between its weightings and the group’s weighting. ]
	Claim 13 is substantially similar to claim 7 and is rejected in the same manner, with the same art and reasoning applying. Please refer to claim 7 for the motivation to combine.

Regarding claim 14, the system of claim 9 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 9 above, with the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond to the other neural node. The citation discusses active neurons that are not pruned and/or removed, correlates them to other neurons in the network, and explicitly teaches the different weighting factors, which are equivalent to the distributed surplus value. ]



Regarding claim 19, the non-transitory computer-readable medium of claim 15 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above, with the rest of the claim being taught by Nachum as seen below.
Wherein updating the weight matrix comprises replacing the original weight of each remaining neural node in each layer with a corresponding distributed surplus value
[ (¶0009) “In some implementations, each of the terms of the shrinking engine loss function that penalize active neurons of the neural network correspond to a different neuron of the neural network; and each of the terms of the shrinking engine loss function that penalize active neurons is weighted by a different factor that depends on a number of operations induced by a neuron corresponding to the term.”
This citation from Nachum teaches updating and/or replacing a weight matrix in the neural node with values that could correspond to the other neural node. The citation discusses active neurons that are not pruned and/or removed, correlates them to other neurons in the network, and explicitly teaches the different weighting factors, which are equivalent to the distributed surplus value. ]


Regarding claim 20, the non-transitory computer-readable medium of claim 15 is taught by Byun/Varadarajan/WO2018 as in the rejection for claim 15 above, with the rest of the claim being taught by Nachum as seen below.
Wherein the distributed surplus value of the given remaining neural node comprises an average marginal contribution value of the coalition of remaining neural nodes in the given layer
[ (¶0006) “In some implementations, the terms of the shrinking engine loss function that penalize active neurons of the neural network comprise: a group lasso regularization term, wherein each group comprises the input weights of a neuron of the neural network.” 
This citation teaches a group lasso regularization term, which is an average of the input weights for the group of neurons and which can change the weightings of a neuron based on the discrepancy between its weightings and the group’s weighting. ]
Claim 20 is substantially similar to claim 7 and is rejected in the same manner, with the same art and reasoning applying. Please refer to claim 7 for the motivation to combine.
 



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20210027166 A1 – dynamic pruning of neurons on-the-fly to accelerate neural network inferences which teaches an importance score, pruning, and neural network channels 
US 20190197406 A1 – Neural entropy enhanced machine learning which teaches neuron importance, pruning, neuron matrices, and re-training after pruning
US 11093832 B2 – Pruning redundant neurons and kernels of deep convolutional neural networks which teaches pruning, importance values and neuron matrices.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL MERABI whose telephone number is (571)272-9685. The examiner can normally be reached Mon-Fri 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.A.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123