Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
calculating divergence for a set of layers of the neural network model, the set of layers comprising at least one batch norm layer (mathematical calculation),
analyzing, based on the calculated divergence, a stability of each of the set of layers (observation, evaluation, and judgement), and
removing, based on the analysis determining a subset of the set of layers fails to meet a threshold stability, the subset of the set of layers of the neural network model (mathematical relationships and calculations).
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 recites additional elements “training a neural network model with a first set of training data” which is well-understood, routine, and conventional (Wei, “The Improvements of BP Neural Network Learning Algorithm”, 2000, [Abstract] "In this paper a new method in BP algorithm to avoid local minimum was proposed by means of adding gradually training data and hidden units" , [p. 1649 §5] "The back-propagation algorithm(BP) is a well-known method of training a multilayer Feed-Forward Artificial Neural Networks").  As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 7 and 15, which recite a system and a computer program product, respectively, as well as to dependent claims 2-6, 8-14, and 16-20.  Independent claims 7 and 15 also recite generic computer components “computer usable program product”, “processor”, “computer readable memories”, “instructions”, and “storage devices”.  The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 2, 10, and 16 recite additional mathematical calculations “calculating a cosine distance between weight vectors of a layer at separate iterations.”
Dependent claims 3, 11, and 17 recite additional mathematical calculations “re-training the neural network model with the first set of training data” which is well-understood, routine, and conventional.  See Setiono (“Neural-Network Feature Selector”, 1997) ([p. 654 §1] "The network is trained with the complete set of attributes as input. For each attribute in the network, we compute the accuracy of the network with all the weights of the connections associated with this attribute set to zero. The attribute that gives the smallest decrease in the network accuracy is removed. The network is then retrained and the process is repeated." [p. 656 §III] ", we use a very simple criterion to determine which attribute is to be excluded from the network. This criterion is the network accuracy on the training dataset")
Dependent claims 4, 12, and 18 recite additional limitations “further comprising re-training the neural network model with a different set of training data.” which is well-understood, routine, and conventional.  See Dai (“Automatic picking of seismic arrivals in local earthquake data using an artificial neural network”, 1995) ([p. 770 §5] "The method is adaptive, and training sets can be altered to enhance particular features of different data sets. Adding new training data sets and retraining the network is easy and quick, and can improve the performance of the network.")
Dependent claims 5, 13, and 19 recite additional mathematical relationships and calculations “removing, based on the calculated divergence determining a second subset of the set of layers fails to meet a threshold divergence, the second subset of the set of layers of the neural network model.”
Dependent claims 6, 14, and 20 recite additional mathematical relationships and calculations “the divergence of a layer of the set of layers is proportional to a depth of the layer.”
Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


	Claims 1-3, 5-11, 13-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (WO2019107900A1) in view of Wang (“An multi-scale learning network with depthwise separable convolutions”, 2018).

	Regarding claim 1, Lee teaches A method comprising: training a neural network model with a first set of training data; ([¶0046] "Fig. 1 shows the similarity distribution between the filters of each convolutional layer (between filter pairs) in the modified version of the Vgg-16 network trained on the CIFAR-10 dataset")
	calculating divergence for a set of layers of the neural network model, the set of layers comprising at least one batch norm layer; ([¶0070] "The types of similarity measures may include a cosine similarity type and a Pearson correlation coefficient type" With respect to the instant specification a layer is simply a collection of nodes in an artificial neural network.  Therefore, a convolutional neural network filter is interpreted as a layer in a convolutional neural network.)
	analyzing, based on the calculated divergence, a stability of each of the set of layers; and ([¶0084] "the pruning unit 12 considers the similarity calculated by the calculation unit 11 (in other words, considers the similarity matrix), and considers the threshold similarity among the filters included in any one convolutional layer (Convolutional layer 1). For at least one similar filter pair having a similarity greater than , any one of two filters included in each similar filter pair may be selectively pruned" Considering the similarity interpreted as synonymous with analyzing the stability of each of the set of layers.  Filters in convolutional layer 1 interpreted as synonymous with the set of layers.).
	removing, based on the analysis determining a subset of the set of layers fails to meet a threshold stability, the subset of the set of layers of the neural network model. ([¶0084] "the pruning unit 12 considers the similarity calculated by the calculation unit 11 (in other words, considers the similarity matrix), and considers the threshold similarity among the filters included in any one convolutional layer (Convolutional layer 1). For at least one similar filter pair having a similarity greater than , any one of two filters included in each similar filter pair may be selectively pruned" threshold similarity interpreted as synonymous with threshold stability.  Pruning interpreted as synonymous with removing. Pruned filter interpreted as synonymous with subset of the set of layers.).
	With respect to the published instant specification a neural network layer is simply a way of organizing nodes.  Therefore, one of ordinary skill in the art would readily interpret a convolutional filter as a layer.  In the case of Mobilenets a convolutional layer may comprise only a single separable filter, such that pruning said filter would be synonymous with removing a layer.  While not relied upon to teach the claim limitations, the disclosure of Wang further reinforces the obviousness.  
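
For illustration only, the threshold-based pruning described in the quoted passages of Lee ([¶0084], [¶0101]) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not Lee's implementation: the function names, the representation of each filter as a flattened list of weights, and the rule that the smaller-norm filter of a similar pair is the one pruned are all assumptions made for the example.  Filters are assumed to be non-zero vectors.

```python
import math
from itertools import combinations


def l2_norm(vector):
    """Euclidean norm of a flattened filter weight vector."""
    return math.sqrt(sum(x * x for x in vector))


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (l2_norm(u) * l2_norm(v))


def prune_similar_filters(filters, threshold):
    """Return the indices of the filters that remain after pruning.

    For every pair of filters whose cosine similarity exceeds `threshold`,
    one filter of the pair is pruned. Assumption for this sketch: the filter
    with the smaller L2 norm is the one removed.
    """
    pruned = set()
    for i, j in combinations(range(len(filters)), 2):
        if i in pruned or j in pruned:
            continue  # a filter that is already pruned forms no further pairs
        if cosine_similarity(filters[i], filters[j]) > threshold:
            pruned.add(i if l2_norm(filters[i]) < l2_norm(filters[j]) else j)
    return [k for k in range(len(filters)) if k not in pruned]


# Hypothetical layer with three filters; the first two are nearly parallel.
layer_filters = [[1.0, 0.0], [2.0, 0.01], [0.0, 1.0]]
kept = prune_similar_filters(layer_filters, threshold=0.9)
```

In this hypothetical example the first two filters form a similar pair, so the smaller-norm filter (index 0) is pruned and the filters at indices 1 and 2 are kept.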

Wang, in the same field of endeavor, teaches that the convolutional layers may be separated into single filter convolutions ([p. 1 §2] "Depthwise separable convolutions divide standard convolution into a depthwise convolution and a 1 × 1 pointwise convolution [18]. The depthwise convolution applies a single filter to each input channel, given the feature map is expressed by DF × DF × M. Depthwise convolution with one filter per input is as follows (Eq. 1))" It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that MobileNets typically rely on separable convolutions such that it would be obvious to have a single filter representative of an entire layer, such that pruning a filter as taught in Lee would be equivalent to removing a layer.). 
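
As an illustration of the quoted statement that the depthwise convolution "applies a single filter to each input channel", the operation can be sketched as follows.  This is a minimal sketch, not Wang's code: the function name, the nested-list data layout, stride 1, and the absence of padding are assumptions made for the example, and the 1 × 1 pointwise convolution is omitted.

```python
def depthwise_conv2d(feature_map, filters):
    """Apply one K x K filter to each input channel independently.

    feature_map: list of M channels, each an H x W list of lists.
    filters:     list of M kernels, each a K x K list of lists.
    Returns M output channels, each of size (H-K+1) x (W-K+1).
    Assumptions: stride 1, no padding, cross-correlation form.
    """
    if len(feature_map) != len(filters):
        raise ValueError("exactly one filter per input channel is required")
    outputs = []
    for channel, kernel in zip(feature_map, filters):
        k = len(kernel)
        height, width = len(channel), len(channel[0])
        out_channel = [
            [
                sum(
                    kernel[i][j] * channel[r + i][c + j]
                    for i in range(k)
                    for j in range(k)
                )
                for c in range(width - k + 1)
            ]
            for r in range(height - k + 1)
        ]
        outputs.append(out_channel)
    return outputs


# Hypothetical two-channel 2 x 2 input with one 2 x 2 filter per channel.
x = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
w = [[[1, 0], [0, 1]], [[2, 2], [2, 2]]]
y = depthwise_conv2d(x, w)
```

Because each output channel depends on exactly one filter, removing that filter removes the corresponding output channel in its entirety.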

Lee and Wang are both directed towards accelerating Mobilenets.  Therefore, Lee and Wang are analogous art in the same field of endeavor.  It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with the teachings of Wang by ensuring that the neural network of Lee is one with separable convolutions with a single filter per input channel.  Wang provides as an additional motivation for combination ([Abstract] “The proposed method has two advantages: (1) It uses the multi-scale block with depthwise separable convolutions, which forms multiple sub-networks by increasing the width of the network while keeping the computational resources constant. (2) It combines the multi-scale block with residual connections and that accelerates the training of networks significantly.”).  This motivation for combination also applies to the remaining claims dependent on this combination. 

	Regarding claim 2, the combination of Lee and Wang teaches The method of claim 1, calculating divergence further comprising: calculating a cosine distance between weight vectors of a layer at separate iterations. (Lee [¶0070] "The types of similarity measures may include a cosine similarity type and a Pearson correlation coefficient type" [¶0103] "Thereafter, the pruning unit 12 is configured to prevent a filter pair having a similarity exceeding a threshold similarity among the similarities between filters included in any one convolutional layer, so that a similar filter pair having a similarity exceeding the threshold similarity exists. Algorithm 1 can be iteratively performed only in this case." Cosine similarity interpreted as synonymous with cosine distance.)
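
For reference, cosine distance is conventionally computed as one minus cosine similarity.  The following minimal sketch shows that computation applied to snapshots of one layer's weight vector taken at separate training iterations.  The function names and the representation of each snapshot as a flattened list of non-zero weights are assumptions made for the example; the sketch is drawn from neither Lee nor the instant specification.

```python
import math


def cosine_distance(w_a, w_b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(w_a, w_b))
    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))
    return 1.0 - dot / (norm_a * norm_b)


def layer_divergence(snapshots):
    """Cosine distance between each pair of consecutive weight snapshots.

    snapshots: list of flattened weight vectors for a single layer, one per
    recorded iteration (hypothetical data layout for this sketch).
    """
    return [cosine_distance(a, b) for a, b in zip(snapshots, snapshots[1:])]


# Hypothetical snapshots of one layer's weights at three iterations.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
divergence = layer_divergence(history)
```

An unchanged weight vector yields a distance of 0, and an orthogonal weight vector yields a distance of 1.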

	Regarding claim 3, the combination of Lee and Wang teaches The method of claim 1, further comprising re-training the neural network model with the first set of training data. (Lee [¶0128] "FIG. 5 is a diagram showing the classification accuracy of the CIFAR-10 dataset that has been retrained after removing a 10% filter from each convolutional layer of Vgg-16").

	Regarding claim 5, the combination of Lee and Wang teaches The method of claim 1, further comprising removing, based on the calculated divergence determining a second subset of the set of layers fails to meet a threshold divergence, the second subset of the set of layers of the neural network model. (Lee [¶0084] "the pruning unit 12 considers the similarity calculated by the calculation unit 11 (in other words, considers the similarity matrix), and considers the threshold similarity among the filters included in any one convolutional layer (Convolutional layer 1). For at least one similar filter pair having a similarity greater than , any one of two filters included in each similar filter pair may be selectively pruned" Threshold similarity interpreted as synonymous with threshold stability.  Pruning interpreted as synonymous with removing.  Pruned filter interpreted as synonymous with subset of the set of layers.  Lee explicitly teaches that either the first or second filter may be pruned such that the first and second filters are interpreted as synonymous with the first and second subset, respectively.)

	Regarding claim 6, the combination of Lee and Wang teaches The method of claim 1, wherein the divergence of a layer of the set of layers is proportional to a depth of the layer. (Lee [¶0101] "is a filter having a relatively small norm size (the size of a filter vector) among the two filters" [¶0157] "For filters having similarity, this is because low similar filters having a size smaller than a preset norm size are removed" Filter size interpreted as synonymous with layer depth.  Low similarity interpreted as synonymous with divergence.)

	Regarding claim 7, claim 7 is directed towards a computer program product for performing the methods of claim 1.  Therefore, the rejection applied to claim 1 also applies to claim 7.  Claim 7 also recites “A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices” which is taught by Lee ([¶0185] "The filter pruning method in the convolutional neural network according to an embodiment of the present application may be implemented in the form of a program instruction that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.").  

	Regarding claim 8, the combination of Lee and Wang teaches The computer usable program product of claim 7, wherein the computer usable code is stored in a computer readable storage device in a data processing system, (Lee [¶0185] "The filter pruning method in the convolutional neural network according to an embodiment of the present application may be implemented in the form of a program instruction that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.")
	and wherein the computer usable code is transferred over a network from a remote data processing system. (Lee [¶0186] "the present application may be implemented in the form of an application stored in a recording medium included in a server owned by a mobile device such as a smartphone or tablet, or an app store that provides an application to such a mobile device" App store interpreted as synonymous with remote data processing system.  App store providing application to a mobile device interpreted as synonymous with transferring over a network from a remote data processing system.). 

	Regarding claim 9, the combination of Lee and Wang teaches The computer usable program product of claim 7, wherein the computer usable code is stored in a computer readable storage device in a server data processing system, (Lee [¶0186] "the present application may be implemented in the form of an application stored in a recording medium included in a server owned by a mobile device such as a smartphone or tablet, or an app store that provides an application to such a mobile device")
	and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system. (Lee [¶0186] "the present application may be implemented in the form of an application stored in a recording medium included in a server owned by a mobile device such as a smartphone or tablet, or an app store that provides an application to such a mobile device" mobile device interpreted as synonymous with remote data processing system.  App store providing application to a mobile device interpreted as synonymous with downloading over a network to a remote data processing system.). 

	Regarding claims 10-11 and 13-14, claims 10-11 and 13-14 are directed towards a computer program product for performing the methods of claims 2-3 and 5-6, respectively.  Therefore, the rejections applied to claims 2-3 and 5-6 also apply to claims 10-11 and 13-14.  Claims 10-11 and 13-14 also recite “A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices” which is taught by Lee ([¶0185] "The filter pruning method in the convolutional neural network according to an embodiment of the present application may be implemented in the form of a program instruction that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.").

	Regarding claims 15-17 and 19-20, claims 15-17 and 19-20 are substantially similar to claims 7, 10-11, and 13-14, respectively.  Therefore, the rejections applied to claims 7, 10-11, and 13-14 also apply to claims 15-17 and 19-20.  

	Claims 4, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lee and Wang, and in further view of Rajagopalan (US10834128B1).

	Regarding claim 4, the combination of Lee and Wang teaches The method of claim 1.
	However, the combination of Lee and Wang does not explicitly teach re-training the neural network model with a different set of training data.

Rajagopalan, in the same field of endeavor, teaches re-training the neural network model with a different set of training data. ([Col. 11 l. 10-20] "pre-trained CNN 400A can be re-trained at each periodic interval of time and learn to identify webpages, URLS, logos, logotypes, logomarks or other suitable type of web elements that are associated with different cyber-attacks. Likewise, one or more CNNs such as 400B and/or pre-trained CNN 400A can be re-trained at each periodic interval of time and learn to identify webpages, URLS, logos, logotypes, logomarks or other suitable type of web elements that are associated with different trusted-entities."). 

	The combination of Lee and Wang and Rajagopalan are both directed towards pruning entire layers of convolution neural networks and retraining.  Therefore, the combination of Lee and Wang and Rajagopalan are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lee and Wang with the teachings of Rajagopalan by retraining the network on a second training set.  Rajagopalan teaches as a motivation for combination that retraining on a different data set may improve network prediction ([Col. 11 l. 10-20] "the automated collection of datasets from multiple and diverse sources enables phishing detector 101 to improve the detection of zero-days attacks because the CNN 400B described with reference to FIG. 4B can be recurrently or iteratively retrained with different or subsequent training datasets having new web elements not included in training datasets used in previous training phases.")

	Regarding claim 12, claim 12 is directed towards a computer program product for performing the method of claim 4.  Therefore, the rejection applied to claim 4 also applies to claim 12.  Claim 12 also recites “A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices” which is taught by Lee ([¶0185] "The filter pruning method in the convolutional neural network according to an embodiment of the present application may be implemented in the form of a program instruction that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.").  

	Regarding claim 18, claim 18 is substantially similar to claim 12.  Therefore, the rejection applied to claim 12 also applies to claim 18.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zomoya (“An algorithm for the automatic generation of neural network structures”, 1993) is directed towards automatically generating neural network structures including deleting layers.  Li (“PRUNING FILTERS FOR EFFICIENT CONVNETS”, 2017) is directed towards similarity based filter pruning of convolutional neural networks.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124