DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendments made to claims 33, 44-57 have not overcome the rejections under 35 U.S.C. 101 or 103. See updated rejections below.
Response to Arguments
Regarding applicant’s arguments on page 13 regarding the 35 USC § 112 Rejection:
Applicant’s arguments on page 1 filed 6/17/2022 have been fully considered and are not persuasive. The arguments made are inconsistent with the claims. It is unclear if “the training data set” in line 3 of claim 38 is referring to the newly introduced “a first training data set” in line 2 of claim 38, or if it is referring to “a training data set” introduced in claim 37. If the applicant intends the claims to reflect what was described in the arguments, please amend to make the claims consistent with the arguments made. See updated rejection below.

Regarding applicant’s arguments on page 13 regarding the 35 USC § 101 Rejection:
Applicant’s arguments on page 1 filed 6/17/2022 have been fully considered and are persuasive.  The rejections of claims 51-57 have been withdrawn. 

Regarding applicant’s statement on pages 13-14 regarding the 35 USC § 103 Rejection on Splicing connections:
	Independent claim 33 is amended to recite "said splicing comprises reconnecting a second connection of the available connections, the second connection connecting a first node of the first layer and a second node of the second layer, and wherein the second connection between the first node and the second node was disconnected in a previous iteration". Independent claims 44 and 51 are amended in the same or similar manner. These amendments clarify how the claimed splicing reconnects a connection that was previously disconnected. Notably, the first node (N1) of a first layer was previously connected to a second node (N2) of a second layer. This connection (N1-N2) was disconnected in a previous iteration and is being reconnected in the current iteration. Han, Shamir, and StackOverflow, alone or in combination, fail to disclose or render obvious claim 33 as amended.
Han discusses network pruning including learning connectivity, pruning small-weight connections, and retraining. (Han, § 2 Network Pruning). However, Han does not discuss splicing to reconnect a connection (N1-N2) that was previously disconnected.
Shamir discusses a variety of pruning strategies including removing neurons based on them being negligible or constant, their functionality, their redundancy, or others. (Shamir, § 1. Introduction). Notably, Shamir focuses on locating pairs of neurons that have similar functionality and merging them. (Id., last ¶ of § 1). Once a neuron is removed, the output of the removed neuron is reconnected to the output of the surviving neuron. (Id.). However, this fails to disclose or render obvious a reconnection of a connection (N1-N2) that was previously disconnected. For example, in the context of Shamir, assuming a node or neuron N2 that was coupled to the output of a removed neuron N1 (a prior N1-N2 connection), the neuron N2 would then be coupled to an output of a surviving neuron N3 such that a new N3-N2 connection is formed. However, the prior N1-N2 connection is not reconnected, as provided in the claims. This type of reconnection is not contemplated by either Han or Shamir. Furthermore, Shamir does not discuss or render obvious "updating weights corresponding to both the currently disconnected and the currently connected connections" as claimed. Instead, Shamir discusses "proper normalization". (Id.).
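The distinction drawn in the argument above can be made concrete with a small sketch. It is illustrative only: connections are modeled as (source, target) pairs, the function names `merge_neuron` and `splice` are hypothetical, and neither function is taken from Han, Shamir, or the application. The first function rewires the outputs of a removed neuron to a surviving neuron, as the argument characterizes Shamir; the second restores the identical connection that was disconnected earlier, as recited in amended claim 33.

```python
# Illustrative sketch only; node names and helper names are hypothetical.

def merge_neuron(edges, removed, survivor):
    """Re-attach every outgoing connection of `removed` to `survivor`
    (the merge behavior attributed to Shamir in the argument)."""
    merged = set()
    for src, dst in edges:
        if src == removed:
            merged.add((survivor, dst))  # a NEW survivor->dst connection
        else:
            merged.add((src, dst))
    return merged


def splice(edges, pruned_history, connection):
    """Restore `connection` only if that same connection was disconnected
    in an earlier iteration (the splicing recited in amended claim 33)."""
    if connection not in pruned_history:
        raise ValueError("connection was never disconnected")
    return edges | {connection}


original = {("N1", "N2"), ("N3", "N4")}

# Merge: N1 is removed, so N2 is now fed by N3 and N1-N2 stays gone.
after_merge = merge_neuron(original, removed="N1", survivor="N3")

# Prune then splice: the very same N1-N2 connection comes back.
pruned = original - {("N1", "N2")}
after_splice = splice(pruned, pruned_history={("N1", "N2")}, connection=("N1", "N2"))
```

Under these assumptions, the merge yields an N3-N2 connection and no N1-N2 connection, whereas the splice returns the network to its original set of connections.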
Examiner’s response:
Applicant’s arguments have been fully considered but are moot in light of the new grounds of rejection necessitated by the amendments. See updated rejection below.

Regarding applicant’s arguments on page 14 regarding the 35 USC § 103 Rejection on Updating weights:
Furthermore, Han does not discuss "updating weights corresponding to both the currently disconnected and the currently connected connections" as claimed. Instead, Han discusses "retrain[ing] the network to learn final weights". (Id.). Notably, there is no reason for Han to update weights for currently disconnected connections, as claimed, as no reconnection as claimed is contemplated.
Examiner’s response:

Applicant's arguments have been fully considered but they are not persuasive. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Han teaches retraining the network to learn the final weights for the remaining connections but does not explicitly teach reconnecting previously disconnected connections between nodes. However, when Han is combined with Shamir, which teaches reconnecting disconnected connections, the retraining would cover the weights of all existing connections, including the reconnected ones. The combination of Han and Shamir would thus include learning weights for the reconnected connections.

Regarding applicant’s arguments on page 15 regarding the remaining 35 USC § 103 Rejections for claims 34-36, 45, 46, 52 and 53:
As discussed, amended independent claims 33, 44, and 51 are patentable over Han, Shamir, and StackOverflow. Koo, Shafiee, Zhao, Ardakani, StackExchange, and Krizhevsky fail to cure the deficiencies of Han, Shamir, and StackOverflow as none of Koo, Shafiee, Zhao, Ardakani, StackExchange, and Krizhevsky disclose or render obvious reconnecting previously disconnected connections nor updating weights for currently disconnected available connections, as claimed. Therefore, claims 33, 44, and 51 are patentable over Han, Shamir, StackOverflow, Koo, Shafiee, Zhao, Ardakani, StackExchange, and Krizhevsky. Claims 34-37, 39-41, 43, 45- 49, and 52-56 are patentable over Han, Shamir, StackOverflow, Koo, Shafiee, Zhao, Ardakani, StackExchange, and Krizhevsky based at least on their dependencies.
Examiner’s response:
Examiner notes that the applicant’s arguments are moot in light of the new grounds of rejection for claims 33, 44, and 51. See updated rejection below.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claim 38 is rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 38 recites the limitation "the training data set is a randomly selected subset of the first training data set" in lines 3-4.  There is insufficient antecedent basis for this limitation in the claim. It is unclear if “the training data set” is referring to “a first training data set” in line 2 or “a training data set” previously introduced in claim 37.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 33, 38, 41, 42, 44, 50, 51, 57 are rejected under 35 U.S.C. 103 as being unpatentable over Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." (2015) [hereinafter Han] in view of Shamir, N; Saad, D; Marom, E. “Neural Net Pruning Based on Functional Behavior of Neurons” (1993) [hereinafter Shamir], further in view of “Why Pretraining for Convolutional Neural Networks” (2014) [hereinafter StackOverflow].
Regarding claim 33, Han teaches a computer-implemented method for compressing a pre-trained deep neural network, comprising (Han; §1 Introduction, page 2, paragraph 4 starting with “Our goal is to…”; Deep compression to reduce storage required by neural networks in a manner that preserves the original accuracy that can be deployed on mobile devices):
receiving, for a deep neural network model having an input layer, one or more hidden layers, an output layer, and available connections between the layers, reference weights corresponding to the available connections (Han; §3 Trained Quantization and Weight Sharing, page 3, paragraphs 4-5 starting with “Network quantization”; 4 input neurons and 4 output neurons); 
Examiner notes that since the applicant’s amendment has broadened the scope of the limitations, the original mapping still applies.
generating a sparsely connected deep neural network model based on the deep neural network model by iteratively (Han; §2 Network Pruning, page 3, paragraphs 2-3, starting with “We store…”; Sparse structure that results from pruning stored using compressed sparse row (CSR) or compressed sparse column (CSC) format): 
Examiner notes that the sparse structure is generated from pruning using CSR or CSC format. 
pruning… one or more of the available connections between first and second adjacent layers of the deep neural network model, wherein said pruning comprises disconnecting a first connection of the available connections between the first and second layers (Han; §2 Network Pruning, page 2, paragraph 6 starting with “Network pruning…”; prune the small-weight connections)… and updating weights corresponding to both the currently disconnected and the currently connected connections of the available connections between the two adjacent layers (Han; §2 Network Pruning, page 2, paragraph 6 starting with “Network pruning…”; network is retrained to learn the new weights for the remaining sparse connections… results from pruning are stored); 
Examiner notes that the “new weights” described by Han map to the updated weights: once the connections of the original network have been altered, the “updated weights” are different, and thus “new.”
and storing the sparsely connected deep neural network model, wherein the sparsely connected deep neural network model comprises final iteration weights for only connected connections of the available connections between the two adjacent layers at a final iteration  (Han; §2 Network Pruning, page 2, paragraph 6 starting with “Network pruning”; Connections with weights below a threshold are removed from the network, the resulting sparse structure being stored in a CSR or CSC format).
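For orientation, the pruning and storage steps cited from Han can be sketched as follows. This is a minimal illustration under stated assumptions, not Han's implementation: the threshold value, the example weights, and the helper names `prune_by_magnitude` and `to_csr` are hypothetical, and the retraining step between pruning and storage is omitted.

```python
# Illustrative sketch only; values and helper names are hypothetical.

def prune_by_magnitude(weights, threshold):
    """Zero out every connection whose weight magnitude is below threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_csr(dense):
    """Store only the surviving weights in compressed sparse row form:
    nonzero values, their column indices, and per-row start offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


# A 2x3 weight matrix between two adjacent layers (made-up numbers).
layer = [[0.9, -0.05, 0.0],
         [0.02, -0.7, 0.3]]

pruned = prune_by_magnitude(layer, threshold=0.1)
values, col_idx, row_ptr = to_csr(pruned)
```

In this sketch only the three surviving weights are stored, which corresponds to the limitation that the stored model comprises weights for only the connected connections.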
	Han does not explicitly teach splicing… and said splicing comprises reconnecting at least a second connection of the available connections between the two adjacent layers, wherein the second connection was disconnected in a previous iteration.
Shamir teaches:
Splicing… and said splicing comprises reconnecting a second connection of the available connections, the second connection connecting a first node of the first layer and a second node of the second layer, and wherein the second connection between the first node and the second node was disconnected in a previous iteration (Shamir; §1 Introduction, pages 144-145 paragraph 3, right column, starting with “In the present work…”; Interconnections are reconnected to the output of surviving neuron).
	Examiner notes that under the broadest reasonable interpretation, “splicing” is simply reconnecting a previously “pruned” connection. Shamir teaches that interconnections, once connected to the output of the removed neuron, are reconnected to the output of the surviving neuron. It would have been obvious for a person of ordinary skill in the art before the effective filing date to apply the splicing or reconnecting that Shamir teaches to Han’s teaching of pruning because pruning, or removing neurons from a net, alone usually damages the performance of the remaining neurons (Shamir; §1 Introduction, page 143, right column paragraphs 2-3, starting with “Removing neurons from the net…” and pages 144-145, right column paragraph 3 starting with “In the present work…”). Splicing as Shamir teaches it causes minimal damage even when the removed neuron has high relevance, since the surviving neuron substitutes for it and the previous connections of the removed neuron are reconnected to the output of the surviving neuron (Shamir; §1 Introduction, pages 144-145, right column paragraph 3 starting with “In the present work…”).
Regarding claim 38, Han does not explicitly teach the use of pre-trained nets. 
Shamir teaches pre-training the deep neural network model based on a first training data set to determine the reference weights, wherein the training data set is a randomly selected subset of the first training data set (Shamir; Fig. 1, 2, Tables 1, 3, 5; Pretrained nets as input).
	Examiner notes that Shamir teaches the use of pretrained nets used as input. While Han does not explicitly teach this, it would have been obvious for a person of ordinary skill in the art before the effective filing date to have used pretrained nets as taught by Shamir and use them as input for a neural network. Further, the benefits of pretrained nets are well known, and in fact discussed in online forums as well. One StackOverflow explanation teaches that pretraining is a regularization technique and improves generalization accuracy of the model (StackOverflow; Reasons for Pretraining Approaches). 
Regarding claim 41, Han teaches wherein the deep neural network comprises a deep fully connected neural network, a deep convolutional neural network, or a deep recurrent neural network (Han; §5 Experiments, §5.1 LeNet-300-100 and LeNet-5 on MNIST; Tables 1, 2, 3; Deep Compression used on LeNet-5).
Examiner notes that Han teaches the use of the deep compression and pruning technique on LeNet-5, which is a convolutional neural network.
Regarding claim 42, Han teaches wherein the deep neural network comprises a deep convolutional neural network and the weights comprise coefficients of each kernel of the convolutional layer of the two adjacent layers (Han; §5 Experiments, §5.1 LeNet-300-100 and LeNet-5 on MNIST; Tables 1, 2, 3; Deep Compression used on LeNet-5).
Examiner notes that Han teaches that LeNet-5, which was pruned as described above, is a convolutional network that has two convolutional layers and two fully connected layers.  
Regarding claim 44, Han in view of Shamir, hereinafter [Han-Shamir] teaches all the limitations and motivations of claim 33 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 33 applies equally as well to those elements of claim 44. The claim additionally recites “a system…  a processor or processor circuitry coupled to the memory, the processor or processor circuitry to…” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 50, Han teaches all the limitations and motivations of claim 42 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 42 applies equally as well to those elements of claim 50. Claim 50 additionally recites “the system…” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 51, Han-Shamir teaches all the limitations and motivations of claim 33 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 33 applies equally as well to those elements of claim 51. The claim additionally recites “at least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing devices to compress a deep neural network by…” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 57, Han teaches all the limitations and motivations of claim 42 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 42 applies equally as well to those elements of claim 57. Claim 57 additionally recites “the non-transitory machine readable medium…” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).


Claims 34, 35, 36, 45, 46, 52, 53 are rejected under 35 U.S.C. 103 as being unpatentable over Han-Shamir in view of StackOverflow [hereinafter Han-Shamir-StackOverflow] further in view of Koo, Terry; Globerson, Amir; Carreras, Xavier; Collins, Michael. Structured Prediction Models via the Matrix-Tree Theorem [hereinafter Koo] further in view of Shafiee, Mohammad Javad; Siva, Parthipan; Wong, Alexander. StochasticNet: Forming Deep Neural Networks via Stochastic Connectivity (2015) [hereinafter Shafiee].
	Regarding claim 34, Koo teaches iteratively pruning and splicing the available connections and updating the weights comprises:
generating a current iteration connection matrix comprising a plurality of indicators each indicating whether a corresponding available connection between the two layers is connected or not connected, wherein generating the current iteration connection matrix comprises applying (Weighted adjacency matrix from weights of edges in the graph; Koo; §3 Spanning-tree inference using the Matrix-Tree Theorem), to each of a plurality of previous iteration connection weights each corresponding to one of the plurality of available connections (Algorithm teaches repeating N times, with N being the number of values in the datastructure; Koo; Figure 2, EG Algorithm)
	Examiner notes that Koo teaches the use of a weighted adjacency matrix built from the weights of the edges of a graph. It would have been obvious for a person of ordinary skill in the art before the effective filing date to apply this adjacency-matrix method to the connections of a neural network and use it to keep track of whether connections exist. If a connection exists, then it has a weight. If no connection exists, then the weight would be 0. Koo teaches that their adaptation for partitioning function and marginals has a runtime of O(n^3), an improvement over previous, less advanced implementations, which yield runtimes of O(n^4) and O(n^6) (Koo, §1 Introduction, paragraph 2). Additionally, according to Shafiee, deep neural networks can be fundamentally expressed and represented as graphs (Shafiee; §2 Review of Random Graph Theory).
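The bookkeeping described in the note above, where a zero weight stands for a missing connection, can be sketched as a matrix of connected/not-connected indicators derived from a weighted adjacency matrix. The sketch is illustrative only: the helper name `connection_matrix` and the example weights are hypothetical and are not taken from Koo or Shafiee.

```python
# Illustrative sketch only; helper name and values are hypothetical.

def connection_matrix(weights):
    """Return one indicator per available connection between two layers:
    1 if the connection exists (nonzero weight), 0 otherwise."""
    return [[1 if w != 0.0 else 0 for w in row] for row in weights]


# Weighted adjacency between a 3-node layer (rows) and a 2-node layer (columns).
adjacency = [[0.4, 0.0],
             [0.0, -1.2],
             [0.7, 0.3]]

indicators = connection_matrix(adjacency)
connected_count = sum(sum(row) for row in indicators)
```

Here four of the six available connections are marked as connected.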
Regarding claim 35, Han teaches applying the discriminative function comprises:
Comparing an individual previous iteration connection weight to a threshold and providing a disconnect indicator when the individual previous iteration connection weight compares unfavorably to the threshold (Connections with weights below a threshold are removed from the network; Han; §2 Network Pruning).
	Examiner notes that under the broadest reasonable interpretation, a connection weight comparing unfavorably to the threshold can simply mean that the weights are either below or above a certain threshold. Han teaches that connections with weights below a threshold are pruned. 
Regarding claim 36, Han and Shamir teach applying the discriminative function comprises comparing an individual previous iteration connection weight to a first threshold and a second threshold greater than first threshold (pruning connections with weights below a threshold; Han; §2 Network Pruning; low thresholds can cause overpruning with the removal of irreplaceable neurons and high thresholds would cause underpruning; Shamir; §4 Experimental Results, paragraphs 2-3) and providing a disconnect indicator when the individual previous iteration connection weight compares unfavorably to the first threshold, a connect indicator when the individual previous iteration connection weight compares favorably to the second threshold, and, otherwise, a no change indicator.
	Examiner notes that Han teaches pruning all connections with weights below a threshold (Han; §2 Network Pruning). While this does not explicitly teach using two thresholds, it would be obvious to try and use two thresholds given that Shamir teaches that a low threshold can cause overpruning, taking out irreplaceable neurons and causing unrecoverable damage to the net performance while a high threshold may result in insufficient pruning performance (Shamir; §4 Experimental Results, paragraphs 2-3). Shamir further teaches through its findings and results that for their network, a threshold of .39 (low threshold) or less results in the removal of all hidden neurons, causing fatal damage to the net performance, while a threshold of .75 (high threshold) or more results in ineffective pruning and many redundant, intact neurons (Shamir; §4 Experimental Results, paragraphs 10-12, page 155 of attached reference). Shamir even further gives a range between which threshold values result in an efficient pruning, which means any threshold below the highest and above the lowest would allow for efficient pruning. Based on Shamir’s teaching of the consequences of too low or too high thresholds and the explicit teaching of threshold ranges for their specific network, it would have been obvious for a person with ordinary skill in the art before the effective filing date to use two thresholds to prevent both over- and under-pruning and their respective consequences.
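The two-threshold discriminative function recited in claim 36 can be sketched as below. The sketch rests on assumptions that are not in the record: "compares unfavorably" is read as a weight magnitude below the first threshold, "compares favorably" as a magnitude at or above the second threshold, and the thresholds 0.2 and 0.8 and the helper names are hypothetical. It is not the applicant's, Han's, or Shamir's implementation.

```python
# Illustrative sketch only; thresholds and helper names are hypothetical.

def discriminate(weight, low, high):
    """Return 'disconnect', 'connect', or 'no_change' for one connection,
    based on the weight magnitude from the previous iteration."""
    if low > high:
        raise ValueError("first threshold must not exceed second threshold")
    magnitude = abs(weight)
    if magnitude < low:
        return "disconnect"
    if magnitude >= high:
        return "connect"
    return "no_change"


def update_indicators(previous, weights, low, high):
    """Apply the discriminative function to every available connection."""
    updated = []
    for indicator, weight in zip(previous, weights):
        decision = discriminate(weight, low, high)
        if decision == "disconnect":
            updated.append(0)          # prune
        elif decision == "connect":
            updated.append(1)          # splice, or keep connected
        else:
            updated.append(indicator)  # between thresholds: leave as-is
    return updated


previous = [1, 1, 0, 0]
weights = [0.05, 0.5, 0.9, 0.5]
current = update_indicators(previous, weights, low=0.2, high=0.8)
```

With these made-up values the first connection is pruned, the third is spliced back in, and the two connections whose weights fall between the thresholds keep their previous state.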
Regarding claim 45, Koo in view of Shafiee, hereinafter [Koo-Shafiee] teaches all the limitations and motivations of claim 34 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 34 applies equally as well to those elements of claim 45. Claim 45 additionally recites “the system… comprises the processor or processor circuitry.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 46, Han-Shamir teaches all the limitations and motivations of claim 36 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 36 applies equally as well to those elements of claim 46. Claim 46 additionally recites “the system.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 52, Koo-Shafiee teaches all the limitations and motivations of claim 34 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 34 applies equally as well to those elements of claim 52. Claim 52 additionally recites “the non-transitory machine readable medium.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 53, Han-Shamir teaches all the limitations and motivations of claim 36 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 36 applies equally as well to those elements of claim 53. Claim 53 additionally recites “the non-transitory machine readable medium.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).

Claims 37, 47, 54 are rejected under 35 U.S.C. 103 as being unpatentable over Han-Shamir-StackOverflow further in view of Zhao, Hang; Gallo, Orazio; Frosio, Iuri; Kautz, Jan. “Is L2 a Good Loss Function for Neural Networks for Image Processing” (2015) [hereinafter Zhao].
Regarding claim 37, Han teaches iteratively pruning… the available connections and updating the weights comprises Iteratively pruning… available connections and updating weights between all adjacent layers of the deep neural network model by (Shared weight updated and each shared weight is updated with all gradients that fall into that bucket; Han; §3.3 Feed-Forward and Back-Propagation, §5 Experiments):
and for each hidden layer and the output layer of the current iteration deep neural network model generating a current iteration connection matrix based on a previous matrix of connection weights; and updating the previous matrix of connection weights to a current matrix of connection weights based on the previous matrix of connection weights and the loss function gradient (Shared weight updated and each shared weight is updated with all gradients that fall into that bucket; Han; §3.3 Feed-Forward and Back-Propagation, §5 Experiments). 
Han does not explicitly teach applying a current iteration deep neural network model to a training set; determining a network loss based on the application of the current iteration deep neural network model to the training data set; generating a loss function gradient based on the current iteration deep neural network model.
	Zhao teaches: 
Applying a current iteration deep neural network model to a training set (network trained considering different cost functions on a training set of 700 RGB images; Zhao; §4 Results, paragraphs 1-3); determining a network loss based on the application of the current iteration deep neural network model to the training data set (different functions for calculating loss; Zhao; §3.2 SSIM, §3.3 MS-SSIM); generating a loss function gradient based on the current iteration deep neural network model (Comparisons of results of networks trained on different loss functions; Zhao; Figure 1).
Examiner notes that it would have been obvious to a person of ordinary skill in the art before the effective filing date to use loss functions and consider network loss. In fact, Zhao teaches and suggests that loss functions are so common and well known in the pertinent art that Zhao compares different types of loss functions and even proposes a new loss function with better performance compared to other known loss functions. Zhao teaches that the loss layer is an effective driver of the network’s learning to produce the desired output quality (Zhao; §1 Introduction, paragraph 3, §2.1 Neural Networks for Image Processing).
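The sequence recited in claim 37 (apply the model, determine a loss, generate a gradient, update the weights) can be sketched for a single output node. The sketch reflects one possible reading of "updating weights corresponding to both the currently disconnected and the currently connected connections" and is an assumption, not a teaching of Han or Zhao: the forward pass uses only connected weights, while the gradient step is applied to every weight. The squared-error loss, the learning rate, and all names are hypothetical.

```python
# Illustrative sketch only; loss choice, values, and names are hypothetical.

def forward(weights, indicators, inputs):
    """Output of one node; disconnected inputs (indicator 0) contribute nothing."""
    return sum(w * t * x for w, t, x in zip(weights, indicators, inputs))


def squared_error(prediction, target):
    """Network loss for one training example."""
    return 0.5 * (prediction - target) ** 2


def update_all_weights(weights, indicators, inputs, target, learning_rate):
    """One gradient step. The gradient is taken with respect to the masked
    weight and applied to every weight, connected or not (an assumption)."""
    error = forward(weights, indicators, inputs) - target
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]


weights = [0.5, 0.1]
indicators = [1, 0]          # second connection is currently disconnected
inputs = [1.0, 2.0]
target = 1.0

loss_before = squared_error(forward(weights, indicators, inputs), target)
new_weights = update_all_weights(weights, indicators, inputs, target, learning_rate=0.1)
```

In this sketch the weight of the disconnected connection moves from 0.1 to about 0.2 even though it does not contribute to the output, which is what would let a later iteration decide to reconnect it.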
Regarding claim 47, Han in view of Zhao, hereinafter [Han-Zhao] teaches all the limitations and motivations of claim 37 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 37 applies equally as well to those elements of claim 47. Claim 47 additionally recites “The system… comprises the processor or processor circuitry.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 54, Han in view of Zhao, hereinafter [Han-Zhao] teaches all the limitations and motivations of claim 37 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 37 applies equally as well to those elements of claim 54. Claim 54 additionally recites “the non-transitory machine readable medium.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).

	Claims 39, 40, 48, 49, 55, 56 are rejected under 35 U.S.C. 103 as being unpatentable over Han in view of Ardakani, Arash; Condo, Carlo; Gross, Warren. Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks (v1 2016) [hereinafter Ardakani] in view of StackExchange “Why are activation functions needed in neural networks” (2015).
	Regarding claim 39, Ardakani teaches iteratively pruning and splicing available connections and updating the weights comprises:
stochastically determining, for a current iteration, a pruning and splicing activation indicator indicating whether pruning and splicing are to be applied for the current iteration (Random Binary Stream for activation; Ardakani; §3 Sparsely-Connected Neural Networks); and only pruning and splicing available connections when the pruning and splicing activation indicator indicates pruning and splicing are to be applied for the current iteration (act() is an activation function that is used; Ardakani; §3 Sparsely-Connected Neural Networks and Equation (1)).
	Examiner notes that Ardakani teaches using an approach that uses a random binary stream with expected values that is used to determine which connections are dropped. It would have been obvious for a person of ordinary skill in the art before the effective filing date to utilize an activation function such as the one taught by Ardakani and use it to determine pruning and splicing. It is also well known that activation functions introduce non-linearity into the output of a neuron to help the network learn complex patterns (StackExchange).
	Regarding claim 40, Ardakani teaches stochastically determining the pruning and splicing indicator comprises
applying a probability function based on the iteration number of the current iteration, wherein the probability function is a monotonically non-increasing probability function (Randomly dropping connections does not compromise performance; Ardakani; §3 Sparsely-Connected Neural Networks, §4.1 Experimental Results on MNIST).
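For illustration only, the limitation recited above can be sketched in Python as follows. The exponential schedule, the `decay` parameter, and the function names `activation_probability` and `should_prune_and_splice` are assumptions made for this example; neither the claims nor the cited references are asserted to use this particular function.

```python
import math
import random


def activation_probability(iteration, decay=1e-3):
    """Hypothetical schedule exp(-decay * iteration).

    For decay >= 0 this is monotonically non-increasing in the iteration number.
    """
    return math.exp(-decay * iteration)


def should_prune_and_splice(iteration, rng, decay=1e-3):
    """Stochastically draw the activation indicator for the given iteration."""
    return rng.random() < activation_probability(iteration, decay)


if __name__ == "__main__":
    rng = random.Random(0)
    applied = 0
    for iteration in range(10000):
        if should_prune_and_splice(iteration, rng):
            applied += 1  # pruning and splicing would run only in these iterations
        # the weight update would run in every iteration regardless
    print("iterations with pruning and splicing:", applied)
```

Under this assumed schedule, pruning and splicing are applied in nearly every early iteration and increasingly rarely in later iterations.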
Examiner notes that Ardakani teaches that up to 70% and 80% of connections can be dropped by the proposed method without any compromise in performance.
Regarding claim 48, Ardakani teaches all the limitations and motivations of claim 39 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 39 applies equally as well to those elements of claim 48. Claim 48 additionally recites “the system… comprises the processor or processor circuitry.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 49, Ardakani teaches all the limitations and motivations of claim 40 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 40 applies equally as well to those elements of claim 49. Claim 49 additionally recites “the system… comprises the processor or processor circuitry.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 55, Ardakani teaches all the limitations and motivations of claim 39 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 39 applies equally as well to those elements of claim 55. Claim 55 additionally recites “the non-transitory machine readable medium.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).
Regarding claim 56, Ardakani teaches all the limitations and motivations of claim 40 in apparatus and/or product form rather than method form. Therefore, the supporting rationale of the rejection to claim 40 applies equally as well to those elements of claim 56. Claim 56 additionally recites “the non-transitory machine readable medium.” Han additionally teaches that the method is tested on the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor… and also benchmarked on GPUs using cuBLAS GEMV for the original dense layer, and used cuSPARSE CSRMV kernel for pruned layers (Han; §6.3 Speedup and Energy Efficiency).

Claim 43 is rejected under 35 U.S.C. 103 as being unpatentable over Han-Shamir-StackOverflow in view of Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey. “ImageNet Classification with Deep Convolutional Neural Networks” (2012) [hereinafter Krizhevsky].
Regarding claim 43, Krizhevsky teaches wherein the sparsely connected deep neural network model is stored for use by an artificial intelligence processing application (architecture of a neural network with delineation of responsibilities between GPUs; Krizhevsky; Figure 2), the method further comprising: implementing the sparsely connected deep neural network model by the artificial intelligence processing application, wherein the artificial intelligence processing application receives input data for classification and classifies the input data to generate classification data (Input images from the ImageNet database for the system; Krizhevsky; §2 The Dataset), the artificial intelligence processing application comprising at least one of a computer vision application, a face recognition application, a face detection application, an object detection application, a gesture recognition application, a voice detection application, a voice identification application, or a speech to recognized series of textual elements application (AlexNet a network with 8 layers and trained on GPUs used for object recognition; Krizhevsky; §3 The Architecture, paragraph 1, §3.2 Training on Multiple GPUs, §1 Introduction).
Examiner notes that ImageNet ILSVRC-2012, similar to the ILSVRC-2010 dataset used by Krizhevsky, is a collection of over 1 million high-resolution images. Han teaches that ILSVRC-2012 has over 1.2M images and 50K validation examples. Since Han was able to use Deep Compression on the ImageNet ILSVRC-2012 dataset, the teachings of Han and Krizhevsky are analogous art in the same field of endeavor. Examiner further notes that a neural network capable of object detection (Krizhevsky; Figure 4) would also be capable of object detection as claimed by the invention.

Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC WU whose telephone number is (571)272-3380. The examiner can normally be reached Monday-Friday between 9AM and 6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ERIC C WU/Examiner, Art Unit 2128                                                                                                                                                                                                        
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128