Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to because it appears that the drawings in figs 5B and 5C are color drawings or photographs and are not black and white line drawings. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


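The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
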
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “the output” in line 8. There is insufficient antecedent basis for this limitation in the claim. It appears that “an output” in line 3 means “an output of the neural network” and that “the output of the neural network” in line 10 refers to “an output” in line 3. However, “the output” in line 8 does not appear to refer to “an output” in line 3. For the purposes of examination, “the output” in line 8 is interpreted as “an output”. Claim 12 is rejected for the same reason.
Claim 6 recites the limitation “found nodes” in line 1. However, it is not clear whether this refers to the nodes found in the “finding” step of claim 1. If it does, it should read “the found nodes”. Otherwise, “another set of found nodes”, “a second set of found nodes”, “other found nodes” or similar language may be used. For the purposes of examination, “the found nodes” is used. Claim 17 is rejected for the same reason.
Claim 7 recites the limitation “the operation” in line 1. There is insufficient antecedent basis for this limitation in the claim. For the purposes of examination, “an operation” is used. In addition, claim 18 is rejected for the same reason.
Claim 9 recites the limitation “the requirement” in line 2. There is insufficient antecedent basis for this limitation in the claim. For the purposes of examination, “a requirement” is used.
Claim 12 recites “a computer”, “one or more sensors”, “a neural network application” and “a manipulation application”. However, it appears that the computer is not connected to the two applications, and thus it is difficult for one of ordinary skill in the art to clearly understand their relationship. For at least this reason, one of ordinary skill in the art would not be able to draw a discernible boundary around the metes and bounds of the claim. Therefore, it 
Claim 13 recites the limitation “the obtained information” in line 5. There is insufficient antecedent basis for this limitation in the claim. For the purposes of examination, “the obtained data” is used.
The term “relevance” in claim 2 is a relative term which renders the claim indefinite. The term “relevance” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purposes of examination, “relation” is used.
Claims 1-2, 6-7, 9, 12-13 and 17-18 each recite limitations that raise issues of indefiniteness as set forth above, and dependent claims 3-5, 8, 10-11, 14-16 and 19-20 are rejected at least based on their direct and/or indirect dependency from independent claims 1 and 12. Appropriate explanation and/or amendment is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10 and 12-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lotter et al. (DEEP PREDICTIVE CODING NETWORKS FOR VIDEO PREDICTION AND UNSUPERVISED LEARNING) in view of Liu et al. (Learning Efficient Convolutional Networks through Network Slimming).

Regarding claim 1
Lotter teaches
A method of controlling output of a neural network, the method comprising: 
receiving or training the neural network; 
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”;)

wherein the neural network is an application executed on a computer that receives input from sensors and provides an output comprising predictions and/or decisions based on the input; 
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/”; [sec 3.2] “We next sought to test the PredNet architecture on complex, real-world sequences. As a testbed, we chose car-mounted camera videos, since these videos span across a wide range of settings and are characterized by rich temporal dynamics, including both self-motion of the vehicle and the motion of other objects in the scene (Agrawal et al., 2015). Models were trained using the raw videos from the KITTI dataset (Geiger et al., 2013), which were captured by a roof-mounted camera on a car driving around an urban environment in Germany.”; e.g., “code” may read on “neural network is an application executed on a computer” since the code runs on a computer.)

(Note: Hereinafter, where claim language within a limitation is enclosed in brackets (i.e., [ ]), the bracketed language is not taught by the current prior art reference but is taught by another prior art reference discussed afterwards.)

[identifying] a region of the neural network that contains information of interest;
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; e.g., “objects in the camera’s view” along with “network” may read on “a region of the neural network that contains information of interest”.)


[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest; and
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet). Left: Illustration of information flow within two layers. Each layer consists of representation neurons (Rl), which output a layer-specific prediction at each time step (Aˆl), which is compared against a target (Al) (Bengio, 2014) to produce an error term (El), which is then propagated laterally and vertically in the network. Right: Module operations for case of video sequences.”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; e.g., “objects in the camera’s view” along with “network” may read on “within the [identified] region a specific node or group of nodes that contain specific information of interest”.)

applying a manipulation application external to the neural network to operate on and alter the output of the [specific] node or group of nodes within the neural network; 
wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network.
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet). Left: Illustration of information flow within two layers. Each layer consists of representation neurons (Rl), which output a layer-specific prediction at each time step (Aˆl), which is compared against a target (Al) (Bengio, 2014) to produce an error term (El), which is then propagated laterally and vertically in the network. Right: Module operations for case of video sequences.”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 3.2, pp. 7-8] “A random hyperparameter search, with model selection based on the validation set, resulted in a 4 layer model with 3x3 convolutions and layer channel sizes of (3, 48, 96, 192). Models were again trained with Adam (Kingma & Ba, 2014) using a loss either solely computed on the lowest layer (L0) or with a weight of 1 on the lowest layer and 0.1 on the upper layers (Lall). … In the top sequence of Fig. 4, a car is passing in the opposite direction, and the model, while not perfect, is able to predict its trajectory, as well as fill in the ground it leaves behind. Similarly in Sequence 3, the model is able to predict the motion of a vehicle completing a left turn. Sequences 2 and 5 illustrate that the PredNet can judge its own movement, as it predicts the appearance of shadows and a stationary vehicle as they approach. The model makes reasonable predictions even in difficult scenarios, such as when the camera-mounted vehicle is turning. In Sequence 4, the model predicts the position of a tree, as the vehicle turns onto a road. 
The turning sequences also further illustrate the model’s ability to “fill-in”, as it is able to extrapolate sky and tree textures as unseen regions come into view”; e.g., “random hyperparameter search, with model selection based on the validation set” and “Models were again trained with Adam” and prediction/image outputs may read on “manipulation application”.)

However, Lotter does not teach
[identifying] a region of the neural network that contains information of interest;
[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest; and 
applying a manipulation application external to the neural network to operate on and alter the output of the [specific] node or group of nodes within the neural network;
wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network.
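
For illustration only, the last two limitations above can be pictured with the following minimal Python sketch. It is not taken from the application, Lotter, or Liu: the two-layer network, its weights, and the names `forward` and `clamp_node` are hypothetical. The sketch only shows an external function overwriting the output of one hidden node, so that the network output changes while the input is left untouched.

```python
# Hypothetical two-layer network; all weights are made up for illustration.
W_HIDDEN = [[1.0, -1.0], [0.5, 2.0]]   # 2 inputs -> 2 hidden nodes
W_OUT = [1.0, 3.0]                      # 2 hidden nodes -> 1 output


def relu(x):
    return x if x > 0.0 else 0.0


def forward(inputs, manipulate=None):
    """Run the network. `manipulate`, if given, is an external callable that
    receives a copy of the hidden-node outputs and returns altered ones."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in W_HIDDEN]
    if manipulate is not None:
        hidden = manipulate(list(hidden))
    return sum(w * h for w, h in zip(W_OUT, hidden))


def clamp_node(index, value):
    """Build an external manipulation that overwrites one node's output."""
    def apply(hidden):
        hidden[index] = value
        return hidden
    return apply


x = [1.0, 2.0]
baseline = forward(x)                                  # unmodified network
altered = forward(x, manipulate=clamp_node(1, 0.0))    # node 1 forced to 0.0
```

In this sketch the same input `x` is passed to both calls; only the intermediate node output differs.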

(Note: Hereinafter, where claim language within a limitation is underlined, the underlined language is taught by the current prior art reference, while the non-underlined language has already been taught by one or more previously cited prior art references.)

Liu teaches 
In the alternative, Liu can also be interpreted to teach the following limitation:
receiving or training the neural network; 
(Liu, [figs 1-2] “initial network”)

identifying a region of the neural network that contains information of interest;
(Liu, [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; e.g., “compact models” along with “pruning” may read on “identifying a region of the neural network”. Note that Lotter teaches “[identifying] a region of the neural network that contains information of interest”.)

finding within the identified region a specific node or group of nodes that contain specific information of interest; and 
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3] “Finally we prune those channels with small factors, and fine-tune the pruned network.”; e.g., “Fine-tune the pruned network” may read on “finding within the identified region a specific node or group of nodes” since each node is fine-tuned. Note that Lotter teaches “[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest”.)

applying a manipulation application external to the neural network to operate on and alter the output of the specific node or group of nodes within the neural network; 
wherein the altered output of the specific node affects the output of the neural network without altering the input of the neural network.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3, p. 2739] “In our experiments, the fine-tuned narrow network can even achieve higher accuracy than the original unpruned network in many cases. Multi-pass Scheme. We can also extend the proposed method from single-pass learning scheme (training with sparsity regularization, pruning, and fine-tuning) to a multi-pass scheme.” [sec 4, pp. 2740-2741] “We empirically demonstrate the effectiveness of network slimming on several benchmark datasets. We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]. The code is available at https://github.com/liuzhuang13/slimming. ... From Table 1, we can observe that, on ResNet and DenseNet, typically when 40% channels are pruned, the fine-tuned network can achieve a lower test error than the original models.”; e.g., “fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network” along with multi-pass scheme may read on “alter the output of the specific node or group of nodes within the neural network” as well. In addition, e.g., “train”, “fine-tune” and “multi-pass” may read on “manipulation application” as well. 
Note that Lotter teaches “applying a manipulation application external to the neural network to operate on and alter the output of the [specific] node or group of nodes within the neural network; wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network”.)

Lotter and Liu are both in the same field of endeavor of processing input signals with a neural network system and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Lotter with the neural network region identification of Liu. 
Doing so would lead to providing a simple yet effective network scheme, which addresses challenges when deploying large CNNs under limited resources.
(Liu, [sec 1] “In this paper, we propose network slimming, a simple yet effective network training scheme, which addresses all the aforementioned challenges when deploying large CNNs under limited resources.”).

Regarding claim 2
Lotter and Liu teach claim 1. 

identifying a region comprises: (see claim 1)

Lotter further teaches 
obtaining data from a plurality of locations in the neural network, while the neural network is processing an input data stream from the sensors; and 
(Lotter, [figs 1 and 4] [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view” [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/” [sec 3.2] “We next sought to test the PredNet architecture on complex, real-world sequences. As a testbed, we chose car-mounted camera videos, since these videos span across a wide range of settings and are characterized by rich temporal dynamics, including both self-motion of the vehicle and the motion of other objects in the scene (Agrawal et al., 2015). Models were trained using the raw videos from the KITTI dataset (Geiger et al., 2013), which were captured by a roof-mounted camera on a car driving around an urban environment in Germany”.)

analyzing relevance of the data.
(Lotter, [figs 1 and 4] “Error” [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view” [sec 2] “The network takes the difference between Al and Aˆl and outputs an error representation, El, which is split into separate rectified positive and negative error populations.”; e.g., “error” and/or “the movement of the camera and the movement of objects” along with fig 4 may read on “analyzing relevance of the data”.)
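
For illustration only, the error representation described in the quoted passage of Lotter [sec 2] can be sketched in Python as follows. The function name `error_representation`, the flat-list data layout, and the ordering of the two populations (positive first, then negative) are assumptions made for this sketch; the sample values are made up.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def error_representation(target, prediction):
    """Difference between target A_l and prediction A^_l, split into
    rectified positive and rectified negative error populations."""
    positive = [relu(a - a_hat) for a, a_hat in zip(target, prediction)]
    negative = [relu(a_hat - a) for a, a_hat in zip(target, prediction)]
    return positive + negative   # concatenation order is an assumption


# Made-up target and prediction values for three units.
e = error_representation([0.25, 1.0, 0.5], [0.5, 0.25, 0.5])
```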

In the alternative, Liu can also be interpreted to teach the following limitation:
Liu further teaches
analyzing relevance of the data.
(Liu, [figs 1-2] [sec 3] “Our idea is introducing a scaling factor γ for each channel, which is multiplied to the output of that channel. Then we jointly train the network weights and these scaling factors, with sparsity regularization imposed on the latter. Finally we prune those channels with small factors, and fine-tune the pruned network.  … For instance, we prune 70% channels with lower scaling factors by choosing the percentile threshold as 70%”.)
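
For illustration only, the percentile-threshold selection described in the quoted passage of Liu [sec 3] can be sketched in Python as follows. The function name `select_channels`, the integer `prune_percent` parameter, and the sample scaling factors are hypothetical; the sketch covers only the selection of channels by scaling-factor magnitude, not training or fine-tuning.

```python
def select_channels(scaling_factors, prune_percent):
    """Return the indices of the channels kept after pruning the given
    percentage of channels with the smallest absolute scaling factors."""
    n = len(scaling_factors)
    n_prune = n * prune_percent // 100
    # Channel indices ordered from smallest to largest |gamma|.
    order = sorted(range(n), key=lambda i: abs(scaling_factors[i]))
    pruned = set(order[:n_prune])
    return [i for i in range(n) if i not in pruned]


# Made-up scaling factors for ten channels; prune 70% of them.
gammas = [0.9, 0.01, 0.3, 0.002, 0.05, 0.7, 0.001, 0.04, 0.6, 0.02]
kept = select_channels(gammas, 70)
```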

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 3
Lotter and Liu teach claim 1. 

Liu further teaches 

(Liu, [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 4] “We empirically demonstrate the effectiveness of network slimming on several benchmark datasets. We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]. The code is available at https://github.com/liuzhuang13/slimming.”; e.g., “We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]” and “code” may read on “receiving instructions via a communication network and/or user interface” since a user runs the “code” on a computer through a user interface. In addition, e.g., “compact models” along with “pruning” may read on “identifying the region of the neural network”.)

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 4
Lotter and Liu teach claim 1. 

Liu further teaches 
operating on includes extracting information from the specific node or group of nodes.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3, p. 2739] “In our experiments, the fine-tuned narrow network can even achieve higher accuracy than the original unpruned network in many cases” [sec 4, pp. 2740-2741] “We empirically demonstrate the effectiveness of network slimming on several benchmark datasets. We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]. The code is available at https://github.com/liuzhuang13/slimming. ... From Table 1, we can observe that, on ResNet and DenseNet, typically when 40% channels are pruned, the fine-tuned network can achieve a lower test error than the original models.”;)

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 5
Lotter and Liu teach claim 1. 

Liu further teaches 
operating on includes changing, replacing, or otherwise controlling the mathematical operators executed in the nodes.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 3] “Our idea is introducing a scaling factor γ for each channel, which is multiplied to the output of that channel. Then we jointly train the network weights and these scaling factors, with sparsity regularization imposed on the latter. Finally we prune those channels with small factors, and fine-tune the pruned network.  … For instance, we prune 70% channels with lower scaling factors by choosing the percentile threshold as 70%”. e.g., the training and fine-tuning may read on “controlling the mathematical operators executed in the nodes”.)
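
For illustration only, the per-channel scaling factor and the sparsity regularization described in the quoted passage of Liu [sec 3] can be sketched in Python as follows. The function names, the weight `lam`, and the numeric values are hypothetical, and the use of an absolute-value (L1) penalty follows the "L1 regularization" wording quoted from Liu [sec 1] with respect to claim 1; the sketch does not perform any training.

```python
def scale_channels(channel_outputs, gammas):
    """Multiply each channel's output by its scaling factor gamma."""
    return [g * c for g, c in zip(gammas, channel_outputs)]


def l1_penalty(gammas, lam):
    """Sparsity term added to the training loss: lam * sum(|gamma|)."""
    return lam * sum(abs(g) for g in gammas)


# Made-up channel outputs and scaling factors.
scaled = scale_channels([2.0, 4.0], [0.5, 0.0])   # second channel is zeroed
penalty = l1_penalty([0.5, -0.25], lam=2.0)
```

A scaling factor driven to zero removes that channel's contribution, which is why small factors mark candidate channels for pruning.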

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 6
Lotter and Liu teach claim 1. 

Liu further teaches 

(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3, p. 2739] “In our experiments, the fine-tuned narrow network can even achieve higher accuracy than the original unpruned network in many cases” [sec 4, pp. 2740-2741] “The Street View House Number (SVHN) dataset [27] consists of 32x32 colored digit images. … MNIST is a handwritten digit dataset containing 60,000 training images and 10,000 test images.”; e.g., “Street View House Number (SVHN) dataset” and “MNIST is a handwritten digit dataset” may read on “elements with a certain combination of properties”.)

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 7
Lotter and Liu teach claim 1. 

Lotter further teaches 

(Lotter, [figs 1 and 4] [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 3.2, pp. 7-8] “In the top sequence of Fig. 4, a car is passing in the opposite direction, and the model, while not perfect, is able to predict its trajectory, as well as fill in the ground it leaves behind. Similarly in Sequence 3, the model is able to predict the motion of a vehicle completing a left turn. Sequences 2 and 5 illustrate that the PredNet can judge its own movement, as it predicts the appearance of shadows and a stationary vehicle as they approach. The model makes reasonable predictions even in difficult scenarios, such as when the camera-mounted vehicle is turning. In Sequence 4, the model predicts the position of a tree, as the vehicle turns onto a road. The turning sequences also further illustrate the model’s ability to “fill-in”, as it is able to extrapolate sky and tree textures as unseen regions come into view”; e.g., “fill-in” may read on “image filling”. The examiner notes that this claim recites alternatives in Markush-type form, and the alternative “image filling” is selected for examination.)

Regarding claim 8
Lotter and Liu teach claim 1. 

Liu further teaches 
generating multiple instances of the identified region and selectively controlling the instances to obtain a desired output.
(Liu, [table 4] [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).” [sec 3] “Multi-pass Scheme. We can also extend the proposed method from single-pass learning scheme (training with sparsity regularization, pruning, and fine-tuning) to a multi-pass scheme. Specifically, a network slimming procedure results in a narrow network, on which we could again apply the whole training procedure to learn an even more compact model. This is illustrated by the dotted-line in Figure 2. Experimental results show that this multi-pass scheme can lead to even better results in terms of compression rate” [sec 4, pp. 2740-2741] “From Figure 5, it can be concluded that the classification performance of the pruned or fine-tuned models degrade only when the pruning ratio surpasses a threshold”; e.g., “compact models” along with “pruning” may read on “identified region”. In addition, e.g., “network slimming procedure results in a narrow network, on which we could again apply the whole training procedure to learn an even more compact model” along with table 4 may read on “selectively controlling the instances to obtain a desired output” since each instance is used for different testing as shown in table 4. Furthermore, e.g., “classification” may read on “desired output”.)

Lotter is combinable with Liu for the same rationale as set forth above with respect to claim 1.

Regarding claim 9
Lotter and Liu teach claim 1. 

Lotter further teaches 
operating on comprises calculating a gradient for each node representing the requirement for altering activity of the node to obtain a desired output of the neural network and applying the calculated gradients on the nodes.
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet). Left: Illustration of information flow within two layers. Each layer consists of representation neurons (Rl), which output a layer-specific prediction at each time step (Aˆl), which is compared against a target (Al) (Bengio, 2014) to produce an error term (El), which is then propagated laterally and vertically in the network. Right: Module operations for case of video sequences.”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 3.2, pp. 7-8] “A random hyperparameter search, with model selection based on the validation set, resulted in a 4 layer model with 3x3 convolutions and layer channel sizes of (3, 48, 96, 192). Models were again trained with Adam (Kingma & Ba, 2014) using a loss either solely computed on the lowest layer (L0) or with a weight of 1 on the lowest layer and 0.1 on the upper layers (Lall)” [sec 2] “The architecture described here is inspired by that originally proposed by (Rao & Ballard, 1999), but is formulated in a modern deep learning framework and trained end-to-end using gradient descent, with a loss function implicitly embedded in the network as the firing rates of the error neurons.”)

Regarding claim 10
Lotter and Liu teach claim 1.

Liu further teaches
operating on comprises setting a desired value in a specific node.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3] “Finally we prune those channels with small factors, and fine-tune the pruned network.”; e.g., “Fine-tune the pruned network” may read on “setting a desired value in a specific node” since each node is set with a desired value during the fine-tuning.)

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 12
Lotter teaches
A system for generating an alternative output of a neural network, the system comprising: 
a computer including a processor and memory; 
(Lotter, [figs 1 and 4]; [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/”; e.g., “code” may read on “computer including a processor and memory” since the code runs on a computer.)

one or more sensors for providing a data stream as input to the computer; 
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 3.2] “We next sought to test the PredNet architecture on complex, real-world sequences. As a testbed, we chose car-mounted camera videos, since these videos span across a wide range of settings and are characterized by rich temporal dynamics, including both self-motion of the vehicle and the motion of other objects in the scene (Agrawal et al., 2015). Models were trained using the raw videos from the KITTI dataset (Geiger et al., 2013), which were captured by a roof-mounted camera on a car driving around an urban environment in Germany.”)

a neural network application; 
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet)”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”;)

wherein the neural network application receives the input from the sensors and provides an output comprising predictions and/or decisions based on the input;
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/”; e.g., “code” may read on “neural network application” since the code runs on a computer.)

a manipulation application external to the neural network application; 
(Lotter, [figs 1 and 4]; [sec 3.2, pp. 7-8] “A random hyperparameter search, with model selection based on the validation set, resulted in a 4 layer model with 3x3 convolutions and layer channel sizes of (3, 48, 96, 192). Models were again trained with Adam (Kingma & Ba, 2014) using a loss either solely computed on the lowest layer (L0) or with a weight of 1 on the lowest layer and 0.1 on the upper layers (Lall). … In the top sequence of Fig. 4, a car is passing in the opposite direction, and the model, while not perfect, is able to predict its trajectory, as well as fill in the ground it leaves behind. Similarly in Sequence 3, the model is able to predict the motion of a vehicle completing a left turn. Sequences 2 and 5 illustrate that the PredNet can judge its own movement, as it predicts the appearance of shadows and a stationary vehicle as they approach. The model makes reasonable predictions even in difficult scenarios, such as when the camera-mounted vehicle is turning. In Sequence 4, the model predicts the position of a tree, as the vehicle turns onto a road. The turning sequences also further illustrate the model’s ability to “fill-in”, as it is able to extrapolate sky and tree textures as unseen regions come into view”; e.g., “random hyperparameter search, with model selection based on the validation set” and “Models were again trained with Adam” and prediction/image outputs may read on “manipulation application”.)

wherein the manipulation application is configured to perform: 
[identifying] a region of the neural network that contains information of interest; 
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; e.g., “objects in the camera’s view” along with “network” may read on “a region of the neural network that contains information of interest”.)

[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest; and 
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet). Left: Illustration of information flow within two layers. Each layer consists of representation neurons (Rl), which output a layer-specific prediction at each time step (Aˆl), which is compared against a target (Al) (Bengio, 2014) to produce an error term (El), which is then propagated laterally and vertically in the network. Right: Module operations for case of video sequences.”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; e.g., “objects in the camera’s view” along with “network” may read on “within the [identified] region a specific node or group of nodes that contain specific information of interest”.)

operating on and altering the output of the [specific] node or group of nodes within the neural network: 
wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network.
(Lotter, [figs 1 and 4] “Predictive Coding Network (PredNet). Left: Illustration of information flow within two layers. Each layer consists of representation neurons (Rl), which output a layer-specific prediction at each time step (Aˆl), which is compared against a target (Al) (Bengio, 2014) to produce an error term (El), which is then propagated laterally and vertically in the network. Right: Module operations for case of video sequences.”; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; [sec 3.2, pp. 7-8] “A random hyperparameter search, with model selection based on the validation set, resulted in a 4 layer model with 3x3 convolutions and layer channel sizes of (3, 48, 96, 192). Models were again trained with Adam (Kingma & Ba, 2014) using a loss either solely computed on the lowest layer (L0) or with a weight of 1 on the lowest layer and 0.1 on the upper layers (Lall). … In the top sequence of Fig. 4, a car is passing in the opposite direction, and the model, while not perfect, is able to predict its trajectory, as well as fill in the ground it leaves behind. Similarly in Sequence 3, the model is able to predict the motion of a vehicle completing a left turn. Sequences 2 and 5 illustrate that the PredNet can judge its own movement, as it predicts the appearance of shadows and a stationary vehicle as they approach. The model makes reasonable predictions even in difficult scenarios, such as when the camera-mounted vehicle is turning. In Sequence 4, the model predicts the position of a tree, as the vehicle turns onto a road. The turning sequences also further illustrate the model’s ability to “fill-in”, as it is able to extrapolate sky and tree textures as unseen regions come into view”; e.g., “random hyperparameter search, with model selection based on the validation set” and “Models were again trained with Adam” may read on “manipulation application”.)


However, Lotter does not teach
[identifying] a region of the neural network that contains information of interest; 
[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest; and 
operating on and altering the output of the [specific] node or group of nodes within the neural network: 
wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network.

Liu teaches 
identifying a region of the neural network that contains information of interest;
(Liu, [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; e.g., “compact models” along with “pruning” may read on “identifying a region of the neural network”. Note that Lotter teaches “[identifying] a region of the neural network that contains information of interest”.)

finding within the identified region a specific node or group of nodes that contain specific information of interest; and 
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3] “Finally we prune those channels with small factors, and fine-tune the pruned network.”; e.g., “Fine-tune the pruned network” may read on “finding within the identified region a specific node or group of nodes” since each node is fine-tuned. Note that Lotter teaches “[finding] within the [identified] region a [specific] node or group of nodes that contain specific information of interest”.)

operating on and altering the output of the specific node or group of nodes within the neural network;
wherein the altered output of the specific node affects the output of the neural network without altering the input of the neural network.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3, p. 2739] “In our experiments, the fine-tuned narrow network can even achieve higher accuracy than the original unpruned network in many cases. Multi-pass Scheme. We can also extend the proposed method from single-pass learning scheme (training with sparsity regularization, pruning, and fine-tuning) to a multi-pass scheme” [sec 4, pp. 2740-2741] “We empirically demonstrate the effectiveness of network slimming on several benchmark datasets. We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]. The code is available at https://github.com/liuzhuang13/slimming. ... From Table 1, we can observe that, on ResNet and DenseNet, typically when 40% channels are pruned, the fine-tuned network can achieve a lower test error than the original models.”; e.g., “fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network” along with multi-pass scheme may read on “alter the output of the specific node or group of nodes within the neural network” as well. In addition, e.g., “train”, “fine-tune” and “multi-pass” may read on “manipulation application” as well. 
Note that Lotter teaches “operating on and altering the output of the [specific] node or group of nodes within the neural network; wherein the altered output of the [specific] node affects the output of the neural network without altering the input of the neural network”.)

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 13
Lotter and Liu teach claim 12.
the manipulation application is further configured to: (see claim 1)

Lotter further teaches 
obtain data from a plurality of locations in the neural network, while the neural network is processing the input data stream; and 
(Lotter, [figs 1 and 4] [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view” [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/” [sec 3.2] “We next sought to test the PredNet architecture on complex, real-world sequences. As a testbed, we chose car-mounted camera videos, since these videos span across a wide range of settings and are characterized by rich temporal dynamics, including both self-motion of the vehicle and the motion of other objects in the scene (Agrawal et al., 2015). Models were trained using the raw videos from the KITTI dataset (Geiger et al., 2013), which were captured by a roof-mounted camera on a car driving around an urban environment in Germany”.)

[identify] based on the obtained information a region of the neural network that contains information of interest.
(Lotter, [figs 1 and 4]; [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view”; e.g., “objects in the camera’s view” along with “network” may read on “a region of the neural network that contains information of interest”.)


Liu teaches
identify based on the obtained information a region of the neural network that contains information of interest.
(Liu, [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3] “Our idea is introducing a scaling factor γ for each channel, which is multiplied to the output of that channel. Then we jointly train the network weights and these scaling factors, with sparsity regularization imposed on the latter. Finally we prune those channels with small factors, and fine-tune the pruned network. … For instance, we prune 70% channels with lower scaling factors by choosing the percentile threshold as 70%”; e.g., “compact models” along with “pruning” and “train” may read on “identify based on the obtained information a region of the neural network”. Note that Lotter teaches “[identify] based on the obtained information a region of the neural network that contains information of interest”.)
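For illustration only, the percentile-threshold pruning step quoted from Liu, sec 3, can be sketched as follows. This is a minimal sketch and not Liu's published code: the function name `channels_to_keep`, the integer `prune_percent` parameter, and the toy scaling-factor values are hypothetical, and the sketch assumes a single global threshold over all listed channels.

```python
def channels_to_keep(gammas, prune_percent):
    """Return indices of channels whose |scaling factor| exceeds the percentile threshold.

    gammas        -- per-channel scaling factors (hypothetical toy values below)
    prune_percent -- integer percentage of channels to prune, e.g. 70
    """
    if not 0 <= prune_percent < 100:
        raise ValueError("prune_percent must be in [0, 100)")
    ranked = sorted(abs(g) for g in gammas)
    n_prune = len(ranked) * prune_percent // 100
    if n_prune == 0:
        return list(range(len(gammas)))
    # The largest magnitude among the channels to be pruned acts as the threshold.
    # Channels tied with the threshold are pruned as well.
    threshold = ranked[n_prune - 1]
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]


# Hypothetical scaling factors for ten channels; 70% of them are pruned.
gammas = [0.9, 0.01, 0.5, 0.02, 0.7, 0.03, 0.04, 0.8, 0.05, 0.6]
kept = channels_to_keep(gammas, 70)
print(kept)  # indices of the three channels with the largest scaling factors
```

In this sketch the surviving indices identify the channels that remain in the compact model; fine-tuning of the pruned network is not shown.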

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 14
Lotter and Liu teach claim 12.

Liu further teaches 
the manipulation application is further configured to receive instructions via a communication network and/or user interface and based on the instructions dynamically identifying the region of the neural network that contains information of interest.
(Liu, [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 4] “We empirically demonstrate the effectiveness of network slimming on several benchmark datasets. We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]. The code is available at https://github.com/liuzhuang13/slimming.”; e.g., “We implement our method based on the publicly available Torch [5] implementation for ResNets by [10]” and “code” may read on “receiving instructions via a communication network and/or user interface” since a user runs the “code” on a computer through a user interface. In addition, e.g., “compact models” along with “pruning” may read on “identifying the region of the neural network”. Note that Lotter teaches “manipulation application”.)

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 15
Lotter and Liu teach claim 12.

Claim 15 is a system claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 4.

Regarding claim 16
Lotter and Liu teach claim 12.

Claim 16 is a system claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 5.

Regarding claim 17
Lotter and Liu teach claim 12.

Liu further teaches 
the manipulation application is further configured to operate on a combination of found nodes to extract information from or manipulate elements with a certain combination of properties.
(Liu, [fig 1] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [fig 2] “Fine-tune the pruned network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).”; [sec 3, p. 2739] “In our experiments, the fine-tuned narrow network can even achieve higher accuracy than the original unpruned network in many cases” [sec 4, pp. 2740-2741] “The Street View House Number (SVHN) dataset [27] consists of 32x32 colored digit images. … MNIST is a handwritten digit dataset containing 60,000 training images and 10,000 test images.”; e.g., “Street View House Number (SVHN) dataset” and “MNIST is a handwritten digit dataset” may read on “elements with a certain combination of properties”. Note that Lotter teaches “manipulation application”.)

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 18
Lotter and Liu teach claim 12.
Claim 18 is a system claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 7.

Regarding claim 19
Lotter and Liu teach claim 12.

Liu further teaches 

(Liu, [table 4] [figs 1-2] “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers. Sparsity regularization is imposed on these scaling factors during training to automatically identify unimportant channels. The channels with small scaling factor values (in orange color) will be pruned (left side). After pruning, we obtain compact models (right side), which are then fine-tuned to achieve comparable (or even higher) accuracy as normally trained full network”; [sec 1] “Pushing the values of BN scaling factors towards zero with L1 regularization enables us to identify insignificant channels (or neurons), as each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully-connected layer).” [sec 3] “Multi-pass Scheme. We can also extend the proposed method from single-pass learning scheme (training with sparsity regularization, pruning, and fine-tuning) to a multi-pass scheme. Specifically, a network slimming procedure results in a narrow network, on which we could again apply the whole training procedure to learn an even more compact model. This is illustrated by the dotted-line in Figure 2. Experimental results show that this multi-pass scheme can lead to even better results in terms of compression rate” [sec 4, pp. 2740-2741] “From Figure 5, it can be concluded that the classification performance of the pruned or fine-tuned models degrade only when the pruning ratio surpasses a threshold”; e.g., “compact models” along with “pruning” may read on “identified region”. In addition, e.g., “network slimming procedure results in a narrow network, on which we could again apply the whole training procedure to learn an even more compact model” along with table 4 may read on “selectively controlling the instances to obtain a desired output” since each instance is used for different testing as shown in table 4. Furthermore, e.g., “classification” may read on “desired output”. 
Note that Lotter teaches “manipulation application”.)

Lotter and Liu are combinable for the same rationale as set forth above with respect to claim 1.

Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lotter et al. (DEEP PREDICTIVE CODING NETWORKS FOR VIDEO PREDICTION AND UNSUPERVISED LEARNING) in view of Liu et al. (Learning Efficient Convolutional Networks through Network Slimming), and further in view of Li et al. (Concurrent Activity Recognition with Multimodal CNN-LSTM Structure).

Regarding claim 11
Lotter and Liu teach claim 1.

Lotter further teaches 
a node is implemented by an electronic circuit [with a forget gate deciding whether to keep or forget history information] and operating on comprises changing activation [of the forget gate].
(Lotter, [figs 1 and 4] “LSTM” [sec 1] “We also find that our architecture can scale effectively to natural image sequences, by training using car-mounted camera videos. The network is able to successfully learn to predict both the movement of the camera and the movement of objects in the camera’s view” [sec 1, p. 1] “Code and video examples can be found at: https://coxlab.github.io/prednet/” [sec 2] “For the representation neurons, we specifically use convolutional LSTM units (Hochreiter & Schmidhuber, 1997; Shi et al., 2015).” [sec 3.2, pp. 7-8] “A random hyperparameter search, with model selection based on the validation set, resulted in a 4 layer model with 3x3 convolutions and layer channel sizes of (3, 48, 96, 192). Models were again trained with Adam (Kingma & Ba, 2014) using a loss either solely computed on the lowest layer (L0) or with a weight of 1 on the lowest layer and 0.1 on the upper layers (Lall).”; e.g., “code” may read on “electronic circuit” since the code runs on a computer which has an electronic circuit.)

However, Lotter and Liu do not teach
a node is implemented by an electronic circuit [with a forget gate deciding whether to keep or forget history information] and operating on comprises changing activation [of the forget gate].

Li teaches
a node is implemented by an electronic circuit with a forget gate deciding whether to keep or forget history information and operating on comprises changing activation of the forget gate.
(Li, [figs 4 and 6-9] “The structure of a single LSTM neuron” and “Forget Gate” [sec 4.4] “The forget gate decides whether previous memories should be considered in the current time instance. For time instance t, we use xt to denote the neuron input, Ct to denote its memory, and ht to denote its output. The output of forget gate at time t is: 𝑓𝑡 = 𝜎(𝑊𝑓[𝑥𝑡 , ℎ𝑡−1 ] + 𝑏𝑓) where σ(∙) denotes the sigmoid activation function, Wf denotes the weights, and bf denotes the bias. The memory gate produces the current memory 𝐶𝑡 by generating a new candidate memory 𝐶𝑡 ̃ and combining it with the old memory passed from the forget gate 𝑓𝑡: 
[Equation image (media_image1.png) not reproduced]
 where it is the input gate activation, ft is the forget gate activation, and W and b terms are the weights and biases.” [sec 5] “We used Keras [34] with the TensorFlow as backend, and a GTX 1080 GPU with 8GB VRAM for training.”;)
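For illustration only, the forget-gate expression quoted from Li, sec 4.4, can be sketched as follows. The function names and toy values are hypothetical. The cell-state update in `cell_update` is the standard LSTM formulation, which is an assumption here and is not quoted from Li; the sketch is not Li's Keras implementation.

```python
import math


def sigmoid(z):
    """Logistic function, the sigma in the quoted forget-gate expression."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, x_t, h_prev):
    """f_t = sigmoid(W_f [x_t, h_{t-1}] + b_f), one activation per memory cell."""
    concat = list(x_t) + list(h_prev)  # the concatenation [x_t, h_{t-1}]
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_f, b_f)]


def cell_update(f_t, c_prev, i_t, c_cand):
    """Assumed standard LSTM update: C_t = f_t * C_{t-1} + i_t * candidate."""
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]


# Hypothetical toy values: one memory cell, one input feature, one hidden unit.
f_t = forget_gate(W_f=[[0.0, 0.0]], b_f=[0.0], x_t=[1.0], h_prev=[1.0])
kept = cell_update(f_t, c_prev=[2.0], i_t=[0.0], c_cand=[0.0])

# Overriding the forget-gate activation with 0.0 discards the previous memory
# while the input x_t is left unchanged.
forgotten = cell_update([0.0], c_prev=[2.0], i_t=[0.0], c_cand=[0.0])
```

With zero weights and bias the gate activation is 0.5, so half of the previous memory is retained; forcing the activation to 0.0 removes it entirely.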

Lotter, Liu and Li are all in the same field of endeavor of processing input signals with a neural network system and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Lotter and Liu with the forget gate of Li.
Doing so would lead to addressing the challenge of recognizing concurrent activities, which are common in real-world scenarios.
(Li, [sec 1] “In this paper, we propose network slimming, a simple yet effective network training scheme, which addresses all the aforementioned challenges when deploying large CNNs under limited resources.”).

Regarding claim 20
Lotter and Liu teach claim 12.

Claim 20 is a system claim corresponding to the method claim 11, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 11.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129