Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This is a second non-final rejection because the new grounds of rejection set forth below were not necessitated by Applicant’s amendment.  This Office Action is responsive to Applicant’s Amendment filed on November 11, 2022, in which claims 1-3, 9-13, and 19-20 are currently amended. Claims 1-20 are currently pending.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1, 4-11, and 14-20 under 35 U.S.C. 101 based on the amendment have been considered but are not persuasive.  Independent claims 1 and 11, for example, are directed entirely towards mental processes and mathematical calculations which can be performed entirely in the mind or with the assistance of a tool such as pen and paper.  As mentioned in the Non-Final Office Action mailed 5/04/2022, the neural network elements are recited at a high level of generality and do not integrate the judicial exception into a practical application.  Examiner further asserts that the mere recitation of a ‘computation speed’ does not necessarily invoke the need for a computer.  One of ordinary skill in the art could reasonably perform several convolution operations using different filter sizes on paper and then compare the time it took to perform said operations before determining which filter size to use.  For these reasons, further detailed in the 101 analysis below, Examiner asserts that it is appropriate to maintain the rejection.
Applicant’s arguments with respect to rejection of claims 1-20 under 35 U.S.C. 102/103 based on amendment have been considered and are persuasive. The argument is moot in view of a new ground of rejection set forth below.

Claim Objections
Claim 10 is objected to because of the following informalities:  “on responsive to” should read “responsive to”.  Appropriate correction is required.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4-11, and 14-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
determining a first computation speed of first filters having a first filter size in a layer of the CNN (mathematical calculation),
 determining a second computation speed of second filters having a second filter size in the layer of the CNN (mathematical calculation),
Responsive to the second computation speed being faster than the first computation speed: changing the size of at least one of the first filters to the second filter size (mathematical calculation/relationship)
“changing the size of at least one of the first filters to the second filter size” (mathematical calculation)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “convolutional neural network” and “filter”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 11, which recites a processor and circuitry for performing the method, as well as to dependent claims 4-10, and 14-20. The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 4 and 14 recite additional mathematical calculation/relationships “changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size” 
Dependent claims 5 and 15 recite additional mathematical calculation/relationship: “the upscaling comprises padding the at least one of the first filters with zero weights.” 
Dependent claims 6 and 16 recite additional mathematical calculation/relationship “changing the at least one of the first filters to the second filter size comprises downscaling the at least one of the first filters to a smaller filter size.” 
Dependent claims 7 and 17 recite additional mathematical calculation/relationship “the downscaling comprises max pooling, wherein the max pooling comprises selecting the maximum value of each of a plurality of pools of filter weights of the at least one of the first filters to represent a single filter weight in the downscaled filter.” 
Dependent claims 8 and 18 recite mathematical calculations “determining a norm of each of the first filters” as well as observation, evaluation, and judgement “ranking the first filters by their norms”.
Dependent claims 9 and 19 recite additional mathematical calculation/relationship “responsive to the second computation speed being slower than the first computation speed, changing the size of at least one of the first filters to a third filter size.” 
Dependent claims 10 and 20 recite additional mathematical calculation/relationship “responsive to the second computation speed being equal to the first computation speed, changing the size of at least one of the first filters to the second filter size.” 

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1, 4-11, and 14-20 are rejected under 35 U.S.C. § 101.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


	Claims 1, 6, 9, 10-11, 16, 19, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Xu (“GLOBALLY SOFT FILTER PRUNING FOR EFFICIENT CONVOLUTIONAL NEURAL NETWORKS”, 2018) and Yao (“HARDWARE-FRIENDLY CONVOLUTIONAL NEURAL NETWORK WITH EVEN-NUMBER FILTER SIZE”, 2016).

	 Regarding claim 1, Xu teaches A method for increasing inference speed of a trained convolutional neural network (CNN), the method comprising:([p. 1 §1] " Structured pruning (Wen et al. (2016);Mao et al. (2017);Luo et al. (2017);Zhu & Gupta (2018);He et al. (2018)) takes into account both the amount of computation and storage required. It performs regular model clipping based on filter, channel, and even a layer. Since the model pruned has good regularity, it can significantly reduce the scales of calculation and storage at the same time, which has recently received extensive attention. Essentially, the work of this paper is based on the idea of filter pruning.")
	determining a first computation speed of first filters having a first filter size in a layer of the CNN;([p. 4 §3.1] "We denote filters of the entire network as [See Eqn]. $C_l$ denotes the number of channels for the l-th convolution layer"  [p. 6 §4.1] "In Figure 3, we can clearly gauge the importance of each filter across the network. When the pruning rate is set to 90%, CONV1 can only retain 3 filters after global measurement, but as can be seen from the figure 3, there are 5 filters that are obviously saliency, we need drop two important filters, so the classification accuracy drop about 0.32%. Therefore, visual saliency scores can help us find the most appropriate pruning rate" See also Table 3.  VGG-16 Orgin Conv2d-2 MFLOPs are interpreted as synonymous with a first computation speed of first filters having a first filter size in a layer of the CNN.)
	determining a second computation speed of second filters having a second filter size in the layer of the CNN; and([p. 4 §3.1] "We denote filters of the entire network as [See Eqn]. $C_l$ denotes the number of channels for the l-th convolution layer" [p. 5 §3.2] "Note that if we set the value of selected $N_l P_l$ filters to zero, the channel corresponding to the next layer of filters will be set zero simultaneously" Setting a channel in the next layer to zero is interpreted as synonymous with pruning the channel, which is interpreted as synonymous with changing a size of the filter to obtain a second filter size. [p. 6 §4.1] "In Figure 3, we can clearly gauge the importance of each filter across the network. When the pruning rate is set to 90%, CONV1 can only retain 3 filters after global measurement, but as can be seen from the figure 3, there are 5 filters that are obviously saliency, we need drop two important filters, so the classification accuracy drop about 0.32%. Therefore, visual saliency scores can help us find the most appropriate pruning rate" See also Table 3.  GSFP Conv2d-2 MFLOPs are interpreted as synonymous with a second computation speed of second filters having a second filter size in a layer of the CNN.)
	responsive to the second computation speed being faster than the first computation speed:(Table 3 shows that the pruned Conv2d-2 which is entirely comprised of filters has a substantially faster computation speed than the original (8.72 MFLOPs vs. 37.75). [p. 1 §1] "Since the model pruned has good regularity, it can significantly reduce the scales of calculation and storage at the same time, which has recently received extensive attention. Essentially, the work of this paper is based on the idea of filter pruning." Xu shows that an explicit intent of pruning is to accelerate the network computations such that changing the filter size responsive to the second computation speed being faster than the first computation speed would lead to obvious and expected results.)
	changing the size of at least one of the first filters to the second filter size.(Figure 1 shows pruned channels in filters of a second filter size after pruning filters in the previous layer which is interpreted as synonymous with changing the size of at least one of the first filters to the second filter size.).
	While Xu explicitly teaches the claim limitations, Examiner asserts that there may be multiple rationales for changing the filter size in a convolutional neural network accelerator which anticipate the claims.  Although not necessarily relied upon, the secondary reference Yao is introduced to reinforce the obviousness of the claims. 

	Yao, in the same field of endeavor, teaches A method for increasing inference speed of a trained convolutional neural network (CNN), the method comprising:([Abstract] "In this paper, we analyze the influences of filter size on CNN accelerator performance and show that even-number filter size is much more hardware-friendly that can ensure high bandwidth and resource utilization")
	determining a first computation speed of first filters having a first filter size in a layer of the CNN;([p. 2 §1] "Figure 1: Influences of filter size on hardware design: Adder tree structure with (a) 3×3 filter and (b) 2×2 filter; Memory access pattern with (c) 3×3 filter and (d) 2×2 filter" See also Figure 2 which shows computational complexity (FLOPs) of various filter sizes.  5x5 filter in 2a or 3x3 filter in 2b interpreted as synonymous with a first filter size in a layer of the CNN.)
	determining a second computation speed of second filters having a second filter size in the layer of the CNN; and([p. 2 §1] "Figure 1: Influences of filter size on hardware design: Adder tree structure with (a) 3×3 filter and (b) 2×2 filter; Memory access pattern with (c) 3×3 filter and (d) 2×2 filter" See also Figure 2 which shows computational complexity (FLOPs) of various filter sizes.  2x2 filter interpreted as synonymous with a second filter size in a layer of the CNN.)
	responsive to the second computation speed being faster than the first computation speed: changing the size of at least one of the first filters to the second filter size.([p. 3 §3.2] "After replacing the 3×3 Conv filters with 2×2 ones, the size of feature maps in the network changes. We remove the padding in the later Conv layer in each pair of Conv layers to ensure the input feature map of each MP layer remains the same. As the middle columns in Figure 2 (b) show, the validation error rises to 8.67%, but the total computations is reduce to 49% of the original network").

	Xu as well as Yao are directed towards accelerating convolutional neural networks by manipulating filter size.  Therefore, Xu as well as Yao are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Xu with the teachings of Yao by reshaping the filter kernels to even valued number dimensions.  Yao provides as additional motivation for combination ([p. 3 §4] " on mnist on cifar-10 it reduced the computation by 1.4× to 2× with less than 0.1% loss of accuracy. On the other hand, shrinking the kernel from 3x3 to 2x2 at the same time of increasing the number of channels, such that the total number of computation remains the same, will result in better prediction accuracy. This will facilitate building hardware inference engine with higher efficiency.").  This motivation for combination also applies to the remaining claims which depend on this combination.
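For illustration only, the comparison recited in claim 1 can be sketched as follows. The sketch is not taken from Xu or Yao. It assumes that a multiply-accumulate count stands in for a measured computation speed (a lower count being treated as faster), and the function names `conv_macs` and `select_filter_size` as well as the layer dimensions are hypothetical.

```python
def conv_macs(out_h, out_w, kernel, in_ch, out_ch):
    """Multiply-accumulate count of one convolution layer.

    Assumption: the operation count is used as a proxy for computation speed.
    """
    return out_h * out_w * kernel * kernel * in_ch * out_ch


def select_filter_size(first_size, second_size, cost):
    """Return second_size only when it is strictly cheaper than first_size."""
    return second_size if cost(second_size) < cost(first_size) else first_size


# Hypothetical layer: 32x32 output map, 64 input channels, 64 output channels.
def layer_cost(kernel):
    return conv_macs(32, 32, kernel, 64, 64)


chosen = select_filter_size(3, 2, layer_cost)   # first size 3x3, second size 2x2
ratio = layer_cost(2) / layer_cost(3)           # relative cost of 2x2 versus 3x3
print(chosen, round(ratio, 3))
```

Under these assumptions, and holding the output map size fixed, a 2×2 filter needs 4/9 of the multiply-accumulates of a 3×3 filter, so the second filter size is selected.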

	 Regarding claim 6, the combination of Xu and Yao teaches The method of claim 1, wherein changing the at least one of the first filters to the second filter size comprises downscaling the at least one of the first filters to a smaller filter size.(Xu Figure 1 shows pruned channels in filters of a second filter size after pruning filters in the previous layer which is interpreted as synonymous with downscaling at least one of the first filters to a smaller filter size.).
	
	 Regarding claim 9, the combination of Xu and Yao teaches The method of claim 1, further comprising, responsive to the second computation speed being slower than the first computation speed, changing the size of at least one of the first filters to a third filter size.(Yao [Figure 2] "Figure 2: Test error and normalized computational complexities (FLOPs) of (a) LeNet5 on MNIST and (b) VGG11-Nagadomi on CIFAR-10. With comparable accuracy, even kernel can reduce the FLOP by 50% on cifar dataset and 30% on mnist dataset; with comparable FLOPs, even kernel can have higher accuracy than odd size kernel." [p. 3 §3.1] "As shown in the figure, replacing the 5×5 Conv filters in LeNet with 4×4 or other even-number ones does not introduce high error rate. Since smaller Conv filter demands few multiplications in one Conv operation, generally, the total number of operations can be reduce by using smaller even-number Conv filters" Figure 2 shows that the second filter size (4x4) actually performs slower (has greater computational complexity) than the 2x2 filter, so Yao explicitly teaches that the number of operations can be reduced by using a smaller even-number Conv filter, such as the 2x2 filter seen in Figure 1, which is interpreted as synonymous with a third filter size.).
	
	Regarding claim 10, the combination of Xu and Yao teaches The method of claim 1, further comprising, on responsive to the second computation speed being equal to the first computation speed, changing the size of at least one of the first filters to the second filter size.(Yao [Abstract] "With same FLOPs, even kernel can have even higher accuracy than odd size kernel." Yao teaches that even when the computation speed of a second filter size may be equal to that of the first the accuracy may be improved by changing the filter size to a second filter size.   Therefore, changing the filter size responsive to the second computation speed being equal to the first would lead to obvious and expected outcomes.).

	Regarding claims 11, 16, 19, and 20, claims 11, 16, 19, and 20 are directed towards a processor for implementing the method of claims 1, 6, 9, and 10, respectively.  Therefore, the rejections applied to claims 1, 6, 9, and 10 also apply to claims 11, 16, 19, and 20.  Claims 11, 16, 19, and 20 also recite additional elements processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).
	
	Claims 2-3 and 12-13 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Xu and Yao and Koivisto (US20190251442A1).

	 Regarding claim 2, the combination of Xu and Yao teaches The method of claim 1, further comprising: retraining the CNN, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN;(Xu Figure 1 shows that the CNN is retrained in each epoch after changing the filter size. [p. 4 §4] "For retraining of filter pruning, we use a constant learning rate 0.01 and retrained 100 epochs on MNIST. In CIFAR-10, we set the initial learning rate to 0.01, multiply by 0.1 per 50 epoch, and retrained 150 epochs. Finally, we retrained 100 epochs on ImageNet" With respect to the instant specification, training the intermediate pruned network is interpreted as synonymous with retraining the network after pruning.).
	However, the combination of Xu and Yao doesn't explicitly teach determining a key performance indicator (KPI) loss of the retrained CNN; and
	changing the size of a fewer number of the first filters to the second filter size responsive to the KPI loss exceeding a threshold.

	Koivisto, in the same field of endeavor, teaches determining a key performance indicator (KPI) loss of the retrained CNN; and([¶0028] "In one embodiment, the training engine 160 may implement any number of regularization techniques to reduce the average magnitude of one or more of the filters during training. For instance, in some embodiments, the training engine 160 modifies a typical loss term LD(x,y,W) by an additional regularization loss term R(W) to generate an overall loss term L(x,y,W) using the following equations (1) and (2):")
	changing the size of a fewer number of the first filters to the second filter size responsive to the KPI loss exceeding a threshold.([¶0030] "In one embodiment, the filter pruning engine 170 identifies one or more filters included in the intermediate neural network 162 having average magnitudes lower than a pruning threshold 164. The filter pruning engine 170 may compute an average magnitude for each of the filters in any technically feasible fashion. The average magnitude computed by the filter pruning engine 170 may or may not be consistent with the regularization loss term R(W) implemented in the training engine."  [¶0031] "If the average magnitude is lower than the pruning threshold 164, then the filter pruning engine 170 adds the filter to a pruning list (not shown). Otherwise, the filter pruning engine 170 omits the filter from the pruning list." Pruning only if below threshold is synonymous with the number of filters changed being zero which is guaranteed to be less than the number of filters.).

	The combination of Xu and Yao as well as Koivisto are directed towards accelerating convolutional neural networks by manipulating filter size.  Therefore, the combination of Xu and Yao as well as Koivisto are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Xu and Yao with the teachings of Koivisto by using the loss term to determine whether or not a filter is pruned. Koivisto teaches as motivation for combination ([¶0026] “Advantageously, as persons skilled in the art will recognize, increasing the level of regularization typically increases the aggressiveness with which the filter pruning engine 170 removes filters from the intermediate neural network 162. Consequently, by varying the regularization parameter for each convolutional layer, the complexity analysis engine 150 indirectly configures the filter pruning engine 170 to prune convolutional layers having higher computational complexities more aggressively than convolutional layers having lower computational complexities. As a result, performing training with layer-specific regularization parameters instead of a single regularization parameter can more effectively reduce the overall inference time associated with the trained neural network 190”).  This motivation for combination also applies to the remaining claims which depend on this combination.

	 Regarding claim 3, the combination of Xu, Yao, and Koivisto teaches The method of claim 2, further comprising: changing the size of a greater number of the first filters to the second filter size responsive to the KPI loss not exceeding the threshold.(Koivisto [¶0031] "If the average magnitude is lower than the pruning threshold 164, then the filter pruning engine 170 adds the filter to a pruning list (not shown). Otherwise, the filter pruning engine 170 omits the filter from the pruning list.").
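For illustration only, the threshold behavior recited in claims 2 and 3 can be sketched as follows. The sketch is not taken from Koivisto. The function name `adjust_resized_count`, the `step` and `total` parameters, and the numeric values are hypothetical and assume that the KPI loss is a single scalar computed after retraining.

```python
def adjust_resized_count(current_count, kpi_loss, threshold, step=1, total=None):
    """Return how many first filters to resize on the next pass.

    Fewer filters are resized when the KPI loss exceeds the threshold
    (claim 2); more filters are resized otherwise (claim 3).
    """
    if kpi_loss > threshold:
        return max(0, current_count - step)
    new_count = current_count + step
    # Never exceed the number of filters available in the layer, if known.
    return new_count if total is None else min(total, new_count)


fewer = adjust_resized_count(8, kpi_loss=0.05, threshold=0.02)   # loss too high
more = adjust_resized_count(8, kpi_loss=0.01, threshold=0.02)    # loss acceptable
print(fewer, more)
```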

Claims 12-13 are directed towards a processor for implementing the method of claims 2-3.  Therefore, the rejections applied to claims 2-3 also apply to claims 12-13.  Claims 12-13 also recite additional elements processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).

	Claims 4 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Xu and Yao and Han (“Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition”, 2018).

	 Regarding claim 4, the combination of Xu and Yao teaches The method of claim 1.
	However, the combination of Xu and Yao doesn't explicitly teach, wherein changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size.

	Han, in the same field of endeavor, teaches The method of claim 1, wherein changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size.([p. 5074 §3.3.2] "An illustration of the shrink and expand operations to change the filter size. The shrink operation sets zeros to the outside boundary; while the expand operation is to pad the outside boundary with the nearest neighbors from the original filter.").

	The combination of Xu and Yao as well as Han are directed towards accelerating convolutional neural networks by manipulating filter size.  Therefore, the combination of Xu and Yao as well as Han are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Xu and Yao with the teachings of Han by upscaling the filter to a larger size.  Han teaches as motivation for combination ([p. 5071 §1] “Experimental results on two benchmark AU-coded spontaneous databases, i.e., FERA2015 BP4D database [26] and Denver Intensity of Spontaneous Facial Action (DISFA) database [20] have demonstrated that the proposed OFS-CNN outperforms the traditional CNNs with the best filter size obtained by exhaustive search and achieves state-of-the-art performance for AU recognition. Furthermore, the OFS-CNN also beats a deep CNN using multiple filter sizes with a remarkable improvement in time efficiency during testing, which is highly desirable for real-time applications.  In addition, the OFS-CNN is capable of estimating optimal filter size for varying image resolution.”).

Claim 14 is directed towards a processor for implementing the method of claim 4.  Therefore, the rejection applied to claim 4 also applies to claim 14.  Claim 14 also recites additional elements processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).

	Claims 5 and 15 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Xu and Yao and Han and Lin (“Data and Hardware Efficient Design for Convolutional Neural Network”, 2018).

	 Regarding claim 5, the combination of Xu, Yao, and Han teaches The method of claim 4.
	However, the combination of Xu, Yao, and Han doesn't explicitly teach the upscaling comprises padding the at least one of the first filters with zero weights.

	Lin, in the same field of endeavor, teaches the upscaling comprises padding the at least one of the first filters with zero weights.([p. 1644 §IVA] "Thus, we use zero padding for the kernels and have a new formula for mapped kernel size").

	The combination of Xu, Yao, and Han as well as Lin are directed towards accelerating convolutional neural networks by manipulating filter sizes.  Therefore, the combination of Xu, Yao, and Han as well as Lin are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Xu, Yao, and Han with the teachings of Lin by zero padding expanded filter sizes.  Han explicitly teaches padding the larger filters but teaches using the nearest neighbor rather than zero padding.  While zero padding is well-known in the art and would be obvious to one of ordinary skill in the art, Lin explicitly teaches zero padding filters after expanding.  Lin provides as additional motivation for combination ([p. 1650 §6] "Our design has much higher throughput and lower area cost than that of Eyeriss [33]. Thus the area efficiency of our design is higher even after technology scaling. This is because our connection is more regular that makes lower hardware cost").
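For illustration only, the distinction drawn above between nearest-neighbor padding and zero padding of an upscaled filter can be sketched as follows. The sketch is not taken from Han or Lin. The function name `pad_filter`, its `mode` parameter, and the example weights are hypothetical.

```python
def pad_filter(weights, border=1, mode="zero"):
    """Upscale a square 2-D filter by adding `border` rows/columns on every side.

    mode="zero" fills the new border with zero weights;
    mode="edge" copies the nearest weight of the original filter.
    """
    n = len(weights)
    size = n + 2 * border
    padded = []
    for r in range(size):
        row = []
        for c in range(size):
            src_r, src_c = r - border, c - border
            if 0 <= src_r < n and 0 <= src_c < n:
                row.append(weights[src_r][src_c])       # original weight
            elif mode == "zero":
                row.append(0)                           # zero weight
            else:
                # Clamp to the nearest valid index of the original filter.
                near_r = min(max(src_r, 0), n - 1)
                near_c = min(max(src_c, 0), n - 1)
                row.append(weights[near_r][near_c])
        padded.append(row)
    return padded


small = [[1, 2], [3, 4]]                       # hypothetical 2x2 filter
zero_padded = pad_filter(small, mode="zero")   # 4x4, border of zeros
edge_padded = pad_filter(small, mode="edge")   # 4x4, border of nearest weights
```

In both modes the original weights are preserved in the center of the larger filter; only the values placed in the added border differ.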

Claim 15 is directed towards a processor for implementing the method of claim 5.  Therefore, the rejection applied to claim 5 also applies to claim 15.  Claim 15 also recites additional elements processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).

	Claims 7 and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Xu and Yao and Lin.

	 Regarding claim 7, the combination of Xu and Yao teaches The method of claim 6.
	However, the combination of Xu and Yao doesn't explicitly teach the downscaling comprises max pooling, wherein the max pooling comprises selecting the maximum value of each of a plurality of pools of filter weights of the at least one of the first filters to represent a single filter weight in the downscaled filter.

	Lin, in the same field of endeavor, teaches The method of claim 6, wherein the downscaling comprises max pooling, wherein the max pooling comprises selecting the maximum value of each of a plurality of pools of filter weights of the at least one of the first filters to represent a single filter weight in the downscaled filter.([p. 1643 §II] "The pooling layer executes the down-sampling operation that not only reduces the feature map size but also improves the translational invariance of features. The typical pooling types are maximum pooling and average pooling that calculate the maximum value and the average value from the corresponding kernels respectively. Fig. 6 shows an example of maximum pooling that has the kernel size 2×2 and the stride size two").

	The combination of Xu and Yao as well as Lin are directed towards accelerating neural networks by manipulating filter size.  Therefore, the combination of Xu and Yao as well as Lin are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Xu and Yao with the teachings of Lin by downscaling the filters using maximum pooling.  Lin provides as motivation for combination ([p. 1650 §6] "Our design has much higher throughput and lower area cost than that of Eyeriss [33]. Thus the area efficiency of our design is higher even after technology scaling. This is because our connection is more regular that makes lower hardware cost").
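For illustration only, the max pooling recited in claim 7 can be sketched as follows. Lin describes maximum pooling with a 2×2 kernel and a stride of two; the sketch applies that same operation to filter weights as recited in the claim. The function name `max_pool_filter` and the example weights are hypothetical.

```python
def max_pool_filter(weights, pool=2):
    """Downscale a square 2-D filter.

    Each non-overlapping pool x pool block of weights is represented by its
    maximum value in the downscaled filter.
    """
    n = len(weights)
    if n % pool != 0:
        raise ValueError("filter size must be divisible by the pool size")
    return [
        [
            max(weights[r + dr][c + dc] for dr in range(pool) for dc in range(pool))
            for c in range(0, n, pool)
        ]
        for r in range(0, n, pool)
    ]


filter_4x4 = [          # hypothetical 4x4 filter weights
    [1, 5, 2, 0],
    [3, 4, 7, 1],
    [0, 2, 9, 8],
    [6, 1, 3, 4],
]
filter_2x2 = max_pool_filter(filter_4x4)   # one weight per 2x2 pool
```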

Claim 17 is directed towards a processor for implementing the method of claim 7.  Therefore, the rejection applied to claim 7 also applies to claim 17.  Claim 17 also recites additional elements processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).

	Claims 8 and 18 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Xu and Yao and Li (“PRUNING FILTERS FOR EFFICIENT CONVNETS”, 2017).

	 Regarding claim 8, the combination of Xu and Yao teaches The method of claim 1, further comprising: determining a norm of each of the first filters, (Xu [p. 4 §4] "We all know that filter sizes are often different in different convolution layers. However, the size of the filter is not considered in the formula which is one of the reasons for the uneven pruning results. So we have modified the l2-norm formula as follows: [See Eqn. 3]").
	However, the combination of Xu and Yao doesn't explicitly teach and ranking the first filters by their norms;
	wherein a lowest normed filter of the first filters is scaled; and wherein a highest normed filter of the first filters is not scaled.

	Li, in the same field of endeavor, teaches and ranking the first filters by their norms; ([p. 7 §4.1] "Figure 5: Visualization of filters in the first convolutional layer of VGG-16 trained on CIFAR-10. Filters are ranked by $\ell_1$-norm")
	wherein a lowest normed filter of the first filters is scaled; and wherein a highest normed filter of the first filters is not scaled. ([p. 4 §3.1] "Recent work (Zhou et al. (2016); Wen et al. (2016)) apply group-sparse regularization ($\sum_{j=1}^{n_i} \|F_{i,j}\|_2$ or $\ell_{2,1}$-norm) on convolutional filters, which also favor to zero-out filters with small l2-norm" [p. 9 §4.4] "pruning the smallest filters outperforms pruning random filters for most of the layers at different pruning ratios. For example, smallest filter pruning has better accuracy than random filter pruning for all layers with the pruning ratio of 90%"  Zeroing out is interpreted as synonymous with scaling.  Li explicitly teaches that the filters with the lowest norm are zeroed out or pruned and that the largest, most important filters remain unchanged.).
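
	For illustration only, ranking filters by norm and zeroing out the lowest-normed filter while leaving the others unchanged can be sketched as follows. This is a minimal sketch and not code from Li or from the application; the names l1_norm and rank_and_scale, the scale parameter, and the sample weights are hypothetical. The scale factor of 0.0 reflects the interpretation above that zeroing out is a form of scaling.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights of a 2-D filter (hypothetical helper)."""
    return sum(abs(w) for row in filter_weights for w in row)


def rank_and_scale(filters, scale=0.0):
    """Rank filters by l1-norm and scale only the lowest-normed one.

    Returns the filter indices ordered from lowest to highest norm, plus a
    copy of the filters in which only the lowest-normed filter is multiplied
    by `scale`. All other filters, including the highest-normed, are untouched.
    """
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    lowest = order[0]
    scaled = [[row[:] for row in f] for f in filters]  # copy each filter's rows
    scaled[lowest] = [[w * scale for w in row] for row in filters[lowest]]
    return order, scaled


# Made-up 2x2 filters, chosen only to show the mechanics.
filters = [
    [[1, -2], [3, 0]],         # l1-norm 6
    [[0.1, 0.2], [0, -0.1]],   # l1-norm about 0.4 (lowest)
    [[5, 5], [-5, 5]],         # l1-norm 20 (highest)
]
order, scaled = rank_and_scale(filters)
```

	Under these assumptions the second filter is ranked lowest and is zeroed out, while the third, highest-normed filter is left unchanged.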

	The combination of Xu and Yao as well as Li are directed towards accelerating convolutional neural networks by manipulating filter size.  Therefore, the combination of Xu and Yao as well as Li are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Xu and Yao with the teachings of Li by scaling filters based on a ranking of the filters by their norms.  Li teaches as a motivation for combination ([p. 3 §2] “Similar to the above work, we use $\ell_1$-norm to select unimportant filters and physically prune them. Our fine-tuning process is the same as the conventional training procedure, without introducing additional regularization. Our approach does not introduce extra layer-wise meta-parameters for the regularizer except for the percentage of filters to be pruned, which is directly related to the desired speedup. By employing stage-wise pruning, we can set a single pruning rate for all layers in one stage”).

Claim 18 is directed towards a processor for implementing the method of claim 8.  Therefore, the rejections applied to claim 8 also apply to claim 18.  Claim 18 also recites the additional elements of a processor and circuitry (Yao [p. 2 §3] “The experiment platform consists of an Intel Xeon E5-2690 CPUs@2.90GHz and the 2 NVIDIA TITAN X GPUs”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Lin (“Accelerating Convolutional Networks via Global & Dynamic Filter Pruning”, 2018) is considered relevant as it is directed towards dynamically resizing filters in a CNN to improve performance. Wang (“Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR”, 2017) is also considered relevant as it is directed towards a neural network accelerator which takes advantage of filter pruning.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124