Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4-11, and 14-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer-implemented method of processing neural networks, which, under its broadest reasonable interpretation, is a series of mental processes and mathematical calculations.  For example, but for the generic computer components language, the limitations below, in the context of this claim, encompass neural network processing, including the following:
determining a first computation speed of first filters having a first filter size in a layer of the CNN (mathematical calculation),
 determining a second computation speed of second filters having a second filter size in the layer of the CNN (mathematical calculation),
on a condition that the second computation speed is faster than the first computation speed: changing the size of at least one of the first filters to the second filter size (mathematical calculation/relationship)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
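For illustration only, the conditional recited in the limitations above can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the names `measure_speed` and `choose_filter_size` are hypothetical, and the sketch assumes that "computation speed" is expressed as executions per second of some caller-supplied routine.

```python
import time
from typing import Callable


def measure_speed(run: Callable[[], None], repeats: int = 5) -> float:
    """Return executions per second of `run`, using the best of `repeats` timings.

    Hypothetical helper: the claim does not say how speed is determined.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return 1.0 / best if best > 0 else float("inf")


def choose_filter_size(first_size: int, second_size: int,
                       first_speed: float, second_speed: float) -> int:
    """Return the second filter size only when it is strictly faster."""
    if second_speed > first_speed:
        return second_size
    return first_size
```

With speeds of 100.0 and 120.0 for 3x3 and 5x5 filters respectively, `choose_filter_size(3, 5, 100.0, 120.0)` returns 5; with the speeds reversed it returns 3.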
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “convolutional neural network” and “filter”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under § 101. This rejection applies equally to independent claim 11, which recites a processor and circuitry for performing the method, as well as to dependent claims 4-10 and 14-20. The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 4 and 14 recite additional mathematical calculation/relationship: “changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size.”
Dependent claims 5 and 15 recite additional mathematical calculation/relationship: “the upscaling comprises padding the at least one of the first filters with zero weights.”
Dependent claims 6 and 16 recite additional mathematical calculation/relationship: “changing the at least one of the first filters to the second filter size comprises downscaling the at least one of the first filters to a smaller filter size.”
Dependent claims 7 and 17 recite additional mathematical calculation/relationship: “the downscaling comprises max pooling, wherein the max pooling comprises selecting the maximum value of each of a plurality of pools of filter weights of the at least one of the first filters to represent a single filter weight in the downscaled filter.”
Dependent claims 8 and 18 recite a mathematical calculation, “determining a norm of each of the first filters,” as well as observation, evaluation, and judgment, “ranking the first filters by their norms.”
Dependent claims 9 and 19 recite additional mathematical calculation/relationship: “on a condition that the second computation speed is slower than the first computation speed, changing the size of at least one of the first filters to a third filter size.”
Dependent claims 10 and 20 recite additional mathematical calculation/relationship: “on a condition that the second computation speed is equal to the first computation speed, changing the size of at least one of the first filters to the second filter size.”

Therefore, when the elements are considered separately and in combination, they do not add significantly more to the judicial exception. Accordingly, claims 1, 4-11, and 14-20 are rejected under 35 U.S.C. § 101.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 6-7, 9, 11, 16-17, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lin (“Data and Hardware Efficient Design for Convolutional Neural Network”, 2018).

	Regarding claim 1, Lin teaches A method for increasing inference speed of a trained convolutional neural network (CNN), the method comprising: ([p. 1646 §V] "For inference of a CNN, this accelerator is operated layer by layer.")
	determining a first computation speed of first filters having a first filter size in a layer of the CNN; ([p. 1644 §IVA] "For a 7x7 kernel with stride 2 as used in GoogleNet and ResNet" Filter interpreted as a set of kernels such that changing kernel size is interpreted as synonymous with changing filter size.  Stride 2 interpreted as synonymous with a first computation speed of the first filters having a first filter size.)
	determining a second computation speed of second filters having a second filter size in the layer of the CNN; and ([p. 1644 §IVA] "the new mapped kernel size will be (3x3, 3x4, 4x3, 4x4). This will result in a fully usable output. However, above nonsquare kernel size is bad for implementation. Thus, we use zero padding for the kernels and have a new formula for mapped kernel size as (K/S)×(K/S).  Thus, for a 7×7 kernel with stride 2, we will have four (4×4) kernels. Note this will result in slightly lower hardware utilization (roughly around 70%), but is still better than hardware utilization of the original stride one (roughly 1/S^2, usually <= 1/4)" New computation speed taught as 1/S^2)
	on a condition that the second computation speed is faster than the first computation speed: ([p. 1644 §IVA] "this will result in slightly lower hardware utilization (roughly around 70%), but is still better than hardware utilization of the original stride" Better hardware utilization interpreted as synonymous with faster than the first computation speed.)
	changing the size of at least one of the first filters to the second filter size. ([p. 1644 §IVA] "This will result in a fully usable output. However, above nonsquare kernel size is bad for implementation. Thus, we use zero padding for the kernels and have a new formula for mapped kernel size". Lin explicitly shows that they calculate the strides of two different filter sizes and change the size of the filter to the more optimized mapping.). 
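For illustration only, the kernel-size arithmetic in the Lin passages quoted above can be reproduced with a short sketch. It assumes that a K×K kernel with stride S is split by taking every S-th weight along each axis, and that the padded size "(K/S)×(K/S)" is rounded up; both are readings of the quoted text, not statements from Lin. The function names are hypothetical, and the ratio computed by `weight_utilization` is only the fraction of non-padding weights, which is not asserted to be the hardware utilization figure Lin reports.

```python
import math
from typing import List, Tuple


def decomposed_kernel_sizes(k: int, s: int) -> List[Tuple[int, int]]:
    """Sizes of the s*s sub-kernels obtained by sampling a k x k kernel every s taps."""
    def taps(offset: int) -> int:
        # Number of kernel indices offset, offset+s, ... that are below k.
        return len(range(offset, k, s))
    return [(taps(r), taps(c)) for r in range(s) for c in range(s)]


def padded_kernel_size(k: int, s: int) -> int:
    """Common square size after zero padding (assumed to be ceil(k / s))."""
    return math.ceil(k / s)


def weight_utilization(k: int, s: int) -> float:
    """Fraction of weights in the padded sub-kernels that come from the original kernel."""
    padded = padded_kernel_size(k, s)
    return (k * k) / (s * s * padded * padded)
```

For K = 7 and S = 2 the sketch yields sub-kernels of sizes 4×4, 4×3, 3×4, and 3×3, a padded size of 4, and a weight fraction of 49/64; for K = 6 and S = 2 it yields four 3×3 sub-kernels with no padding.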

	Regarding claim 6, Lin teaches The method of claim 1, wherein changing the at least one of the first filters to the second filter size comprises downscaling the at least one of the first filters to a smaller filter size. (See FIG. 8 [p. 1644 §IV] "Fig. 8 shows an example that maps one 6x6 convolutional kernel with stride two to four decomposed 3x3 convolutional kernels with stride one" Mapping filter to decomposed filter and down-sampling filter through max pooling both interpreted as synonymous with downscaling filter to a smaller filter size.). 

	Regarding claim 7, Lin teaches The method of claim 6, wherein the downscaling comprises max pooling, wherein the max pooling comprises selecting the maximum value of each of a plurality of pools of filter weights of the at least one of the first filters to represent a single filter weight in the downscaled filter. ([p. 1643 §II] "The pooling layer executes the down-sampling operation that not only reduces the feature map size but also improves the translational invariance of features. The typical pooling types are maximum pooling and average pooling that calculate the maximum value and the average value from the corresponding kernels respectively. Fig. 6 shows an example of maximum pooling that has the kernel size 2×2 and the stride size two").
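For illustration only, the max pooling operation as worded in claim 7 (selecting the maximum of each pool of filter weights to form one weight of the downscaled filter) can be sketched as below. The function name `max_pool_filter` is hypothetical, and the sketch assumes a square filter whose side is a multiple of the pool size, with non-overlapping pools.

```python
from typing import List

Filter = List[List[float]]


def max_pool_filter(weights: Filter, pool: int) -> Filter:
    """Downscale a square filter by taking the maximum of each pool x pool block."""
    n = len(weights)
    if pool <= 0 or n % pool != 0 or any(len(row) != n for row in weights):
        raise ValueError("filter must be square with side divisible by pool")
    out = n // pool
    return [
        [
            max(
                weights[i * pool + di][j * pool + dj]
                for di in range(pool)
                for dj in range(pool)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]
```

Applied with a pool size of 2 to a 4×4 filter holding the values 1 through 16 in row order, the sketch returns the 2×2 filter [[6, 8], [14, 16]].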

	Regarding claim 9, Lin teaches The method of claim 1, further comprising, on a condition that the second computation speed is slower than the first computation speed, changing the size of at least one of the first filters to a third filter size. ([p. 1644 §IVA] "the new mapped kernel size will be (3x3, 3x4, 4x3, 4x4). This will result in a fully usable output. However, above nonsquare kernel size is bad for implementation. Thus, we use zero padding for the kernels and have a new formula for mapped kernel size as (K/S)×(K/S).  Thus, for a 7×7 kernel with stride 2, we will have four (4×4) kernels. Note this will result in slightly lower hardware utilization (roughly around 70%), but is still better than hardware utilization of the original stride one (roughly 1/S^2, usually <= 1/4)" Lin explicitly teaches that the second kernel size is the first decomposed size (3x3, 3x4, 4x3, 4x4) and that it has worse stride than the original implementation, and therefore a third filter size is used.). 

Claims 11, 16-17, and 19 are directed towards a processor for implementing the method of claims 1, 6-7, and 9.  Therefore, the rejections applied to claims 1, 6-7, and 9 also apply to claims 11, 16-17, and 19.  Claims 11, 16-17, and 19 also recite the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Koivisto (US20190251442A1).

	Regarding claim 2, Lin teaches The method of claim 1.
	However, Lin does not explicitly teach retraining the CNN, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN;
	determining a key performance indicator (KPI) loss of the retrained CNN;
	changing the size of a fewer number of the first filters to the second filter size if the KPI loss exceeds a threshold.  

Koivisto, in the same field of endeavor, teaches retraining the CNN, after changing the size of at least one of the first filters to the second filter size, to generate a retrained CNN; ([¶0021] "In alternate embodiments, the training application 140 may execute the training engine 160 and the filter pruning engine 170 in an iterative fashion. During each non-initial iteration x, the training engine 160 trains (with regularization) the pruned neural network 172(x−1) generated by the filter pruning engine 170 during the previous iteration (x−1) to generate the intermediate neural network 162(x)." With respect to the instant specification, training the intermediate pruned network interpreted as synonymous with retraining the network after pruning.)
	determining a key performance indicator (KPI) loss of the retrained CNN; and ([¶0028] "In one embodiment, the training engine 160 may implement any number of regularization techniques to reduce the average magnitude of one or more of the filters during training. For instance, in some embodiments, the training engine 160 modifies a typical loss term LD(x,y,W) by an additional regularization loss term R(W) to generate an overall loss term L(x,y,W) using the following equations (1) and (2):")
	changing the size of a fewer number of the first filters to the second filter size if the KPI loss exceeds a threshold. ([¶0030] "In one embodiment, the filter pruning engine 170 identifies one or more filters included in the intermediate neural network 162 having average magnitudes lower than a pruning threshold 164. The filter pruning engine 170 may compute an average magnitude for each of the filters in any technically feasible fashion. The average magnitude computed by the filter pruning engine 170 may or may not be consistent with the regularization loss term R(W) implemented in the training engine." [¶0031] "If the average magnitude is lower than the pruning threshold 164, then the filter pruning engine 170 adds the filter to a pruning list (not shown). Otherwise, the filter pruning engine 170 omits the filter from the pruning list." Pruning only if below threshold is synonymous with the number of filters changed being zero if below threshold.  Zero is guaranteed to be less than the number of filters.).
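For illustration only, the pruning-list behavior described in the quoted Koivisto paragraphs can be sketched as follows. Koivisto states that the average magnitude may be computed "in any technically feasible fashion"; the sketch assumes the mean absolute weight, and the names `average_magnitude` and `build_pruning_list` are hypothetical.

```python
from typing import List

Filter = List[List[float]]


def average_magnitude(filt: Filter) -> float:
    """Mean absolute weight of a filter (one assumed way to compute the magnitude)."""
    flat = [abs(w) for row in filt for w in row]
    return sum(flat) / len(flat)


def build_pruning_list(filters: List[Filter], pruning_threshold: float) -> List[int]:
    """Indices of filters whose average magnitude is lower than the threshold."""
    return [
        index
        for index, filt in enumerate(filters)
        if average_magnitude(filt) < pruning_threshold
    ]
```

A filter whose average magnitude is not lower than the threshold is omitted from the list, matching the "Otherwise" branch in the quoted paragraph.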

	Lin and Koivisto are both directed towards pruning filters in a neural network for performance acceleration.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lin with the teachings of Koivisto by using the loss term to determine whether or not a filter is pruned. Koivisto teaches as motivation for combination ([¶0026] “Advantageously, as persons skilled in the art will recognize, increasing the level of regularization typically increases the aggressiveness with which the filter pruning engine 170 removes filters from the intermediate neural network 162. Consequently, by varying the regularization parameter for each convolutional layer, the complexity analysis engine 150 indirectly configures the filter pruning engine 170 to prune convolutional layers having higher computational complexities more aggressively than convolutional layers having lower computational complexities. As a result, performing training with layer-specific regularization parameters instead of a single regularization parameter can more effectively reduce the overall inference time associated with the trained neural network 190”).

	Regarding claim 3, the combination of Lin and Koivisto teaches The method of claim 2, further comprising: changing the size of a greater number of the first filters to the second filter size if the KPI loss does not exceed the threshold. (Koivisto [¶0031] "If the average magnitude is lower than the pruning threshold 164, then the filter pruning engine 170 adds the filter to a pruning list (not shown). Otherwise, the filter pruning engine 170 omits the filter from the pruning list." Changing the size of a greater number of filters interpreted as synonymous with continuing pruning.).
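For illustration only, the threshold logic recited in claims 2 and 3 (fewer filters resized when the KPI loss exceeds the threshold, more when it does not) can be sketched as a single update rule. The function name `next_resize_count` and the fixed `step` are hypothetical; the claims do not state by how much the number of filters changes.

```python
def next_resize_count(current: int, total: int, kpi_loss: float,
                      threshold: float, step: int = 1) -> int:
    """Return how many filters to resize on the next attempt.

    Fewer filters if the KPI loss exceeds the threshold, more otherwise,
    clamped to the range [0, total].
    """
    if kpi_loss > threshold:
        return max(0, current - step)
    return min(total, current + step)
```

For example, with 4 of 8 filters currently resized and a threshold of 0.02, a KPI loss of 0.05 gives 3 and a KPI loss of 0.01 gives 5.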
	
Claims 12-13 are directed towards a processor for implementing the method of claims 2-3.  Therefore, the rejections applied to claims 2-3 also apply to claims 12-13.  Claims 12-13 also recite the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Claims 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Han (“Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition”, 2018).

	Regarding claim 4, Lin teaches The method of claim 1.
	While Lin implicitly teaches upscaling the kernel size by zero padding, Lin does not explicitly teach changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size.  

Han, in the same field of endeavor, teaches changing the at least one of the first filters to the second filter size comprises upscaling the at least one of the first filters to a larger filter size. ([p. 5074 §3.3.2] "An illustration of the shrink and expand operations to change the filter size. The shrink operation sets zeros to the outside boundary; while the expand operation is to pad the outside boundary with the nearest neighbors from the original filter."). 
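For illustration only, the shrink and expand operations described in the quoted Han caption can be sketched as below. The sketch assumes square filters, a one-weight-wide boundary, that shrink keeps the array dimensions while zeroing the outer ring, and that expand grows each side by one using the nearest original weight; these details, and the names `shrink` and `expand`, are assumptions rather than Han's stated implementation.

```python
from typing import List

Filter = List[List[float]]


def shrink(filt: Filter) -> Filter:
    """Set the outside boundary of a square filter to zero."""
    n = len(filt)
    edge = (0, n - 1)
    return [
        [0.0 if i in edge or j in edge else filt[i][j] for j in range(n)]
        for i in range(n)
    ]


def expand(filt: Filter) -> Filter:
    """Grow a square filter by one weight per side, copying the nearest neighbor."""
    n = len(filt)

    def clamp(index: int) -> int:
        # Map an out-of-range index to the closest valid index.
        return min(max(index, 0), n - 1)

    return [
        [filt[clamp(i - 1)][clamp(j - 1)] for j in range(n + 2)]
        for i in range(n + 2)
    ]
```

Expanding the 2×2 filter [[1, 2], [3, 4]] gives a 4×4 filter whose rows are [1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], and [3, 3, 4, 4].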

	Lin and Han are both directed towards manipulating filter size to affect convolutional neural network performance.  Therefore, Lin and Han are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lin with the teachings of Han by upscaling the filter to a larger size.  Han teaches as motivation for combination ([p. 5071 §1] “Experimental results on two benchmark AU-coded spontaneous databases, i.e., FERA2015 BP4D database [26] and Denver Intensity of Spontaneous Facial Action (DISFA) database [20] have demonstrated that the proposed OFS-CNN outperforms the traditional CNNs with the best filter size obtained by exhaustive search and achieves state-of-the-art performance for AU recognition. Furthermore, the OFS-CNN also beats a deep CNN using multiple filter sizes with a remarkable improvement in time efficiency during testing, which is highly desirable for realtime applications.  In addition, the OFS-CNN is capable of estimating optimal filter size for varying image resolution.”).

	Regarding claim 5, the combination of Lin and Han teaches The method of claim 4, wherein the upscaling comprises padding the at least one of the first filters with zero weights. (Lin [p. 1644 §IVA] "Thus, we use zero padding for the kernels and have a new formula for mapped kernel size").
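For illustration only, upscaling a filter by padding it with zero weights, as worded in claim 5, can be sketched as follows. The sketch assumes a square filter and symmetric padding around the original weights; neither the claim nor the quoted Lin passage specifies where the zeros are placed, and the name `zero_pad_filter` is hypothetical.

```python
from typing import List

Filter = List[List[float]]


def zero_pad_filter(filt: Filter, target: int) -> Filter:
    """Embed a square filter in the center of a target x target filter of zeros."""
    n = len(filt)
    if target < n or (target - n) % 2 != 0:
        raise ValueError("target must be >= filter size and differ by an even amount")
    pad = (target - n) // 2
    out = [[0.0] * target for _ in range(target)]
    for i in range(n):
        for j in range(n):
            out[i + pad][j + pad] = filt[i][j]
    return out
```

Padding the 1×1 filter [[1.0]] to size 3 gives a 3×3 filter that is zero everywhere except for 1.0 at the center.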

Claim 14 is directed towards a processor for implementing the method of claim 4.  Therefore, the rejections applied to claim 4 also apply to claim 14.  Claim 14 also recites the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Claim 15 is directed towards a processor for implementing the method of claim 5.  Therefore, the rejections applied to claim 5 also apply to claim 15.  Claim 15 also recites the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Li (“PRUNING FILTERS FOR EFFICIENT CONVNETS”, 2017).

	Regarding claim 8, Lin teaches The method of claim 1.
	However, Lin does not explicitly teach determining a norm of each of the first filters, and ranking the first filters by their norms; 
	wherein a lowest normed filter of the first filters is scaled; and wherein a highest normed filter of the first filters is not scaled.  

Li, in the same field of endeavor, teaches determining a norm of each of the first filters, and ranking the first filters by their norms; ([p. 7 §4.1] "Figure 5: Visualization of filters in the first convolutional layer of VGG-16 trained on CIFAR-10. Filters are ranked by $\ell_1$-norm").
	wherein a lowest normed filter of the first filters is scaled; and wherein a highest normed filter of the first filters is not scaled. ([p. 4 §3.1] "Recent work (Zhou et al. (2016); Wen et al. (2016)) apply group-sparse regularization ($\sum_{j=1}^{n_i} \|F_{i,j}\|_2$ or $\ell_{2,1}$-norm) on convolutional filters, which also favor to zero-out filters with small $\ell_2$-norm" [p. 9 §4.4] "pruning the smallest filters outperforms pruning random filters for most of the layers at different pruning ratios. For example, smallest filter pruning has better accuracy than random filter pruning for all layers with the pruning ratio of 90%" Zeroing out interpreted as synonymous with scaling.  Li explicitly teaches that the filters with the lowest norm are zeroed out or pruned and that the largest most important filters remain unchanged.).
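For illustration only, determining a norm of each filter and ranking the filters by their norms, as recited in claim 8, can be sketched as follows. The sketch uses the sum of absolute weights as the norm, consistent with the Li caption quoted above; the names `l1_norm`, `rank_by_l1_norm`, and `split_lowest` are hypothetical, and how many of the lowest-ranked filters are selected is an assumed parameter.

```python
from typing import List, Tuple

Filter = List[List[float]]


def l1_norm(filt: Filter) -> float:
    """Sum of the absolute values of all weights in the filter."""
    return sum(abs(w) for row in filt for w in row)


def rank_by_l1_norm(filters: List[Filter]) -> List[int]:
    """Filter indices ordered from lowest to highest norm."""
    return sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))


def split_lowest(filters: List[Filter], count: int) -> Tuple[List[int], List[int]]:
    """Return (indices of the `count` lowest-norm filters, remaining indices)."""
    ranked = rank_by_l1_norm(filters)
    return ranked[:count], ranked[count:]
```

With three filters whose norms are 3.0, 0.2, and 1.0, the ranking is [1, 2, 0], so selecting one filter picks index 1 and leaves the highest-norm filter at index 0 untouched.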

	Lin and Li are both directed towards manipulating filter size to improve convolutional neural network performance.  Therefore, Lin and Li are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lin with the teachings of Li by scaling filters based on the result of a norm-based ranking. Li teaches as a motivation for combination ([p. 3 §2] “Similar to the above work, we use $\ell_1$-norm to select unimportant filters and physically prune them. Our fine-tuning process is the same as the conventional training procedure, without introducing additional regularization. Our approach does not introduce extra layer-wise meta-parameters for the regularizer except for the percentage of filters to be pruned, which is directly related to the desired speedup. By employing stage-wise pruning, we can set a single pruning rate for all layers in one stage”).

Claim 18 is directed towards a processor for implementing the method of claim 8.  Therefore, the rejections applied to claim 8 also apply to claim 18.  Claim 18 also recites the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Anwar (“Structured Pruning of Deep Convolutional Neural Networks”, 2017).

Regarding claim 10, Lin teaches The method of claim 1.
	However, Lin does not explicitly teach, on a condition that the second computation speed is equal to the first computation speed, changing the size of at least one of the first filters to the second filter size.  

Anwar, in the same field of endeavor, teaches on a condition that the second computation speed is equal to the first computation speed, changing the size of at least one of the first filters to the second filter size. ([p. 32:4] "If, during the gradual iterative pruning, the pruned network has similar or better performance when compared with the unpruned network, we use bigger strides and prune the connections more aggressively (use higher pruning ratios)" Changing the size of at least one of the first filters to the second filter size is interpreted as synonymous with pruning.  Stride interpreted as directly correlated with computation speed.).

	Lin and Anwar are both directed towards manipulating filter size to improve performance of a convolutional neural network.  Therefore, Lin and Anwar are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lin with the teachings of Anwar by changing the filter size under the condition that the computation speed of the network using the second filter size was equivalent to the computation speed of the network using the first filter size. It would be obvious to one of ordinary skill in the art that a greater than or equal to comparison could be used as the conditional comparison.  Anwar teaches as motivation for combination ([p. 32:2] “We introduce structured pruning at various granularities for maximum pruning benefits. The pruned networks are easily accelerated with very simple sparse representation. Feature map pruning reduces the width of a convolution layer and directly produces a low-complexity network.”).

Claim 20 is directed towards a processor for implementing the method of claim 10.  Therefore, the rejections applied to claim 10 also apply to claim 20.  Claim 20 also recites the additional elements of a processor and circuitry (Lin [p. 1649] “Our implementation is designed by Verilog and synthesized with TSMC 40nm CMOS technology process”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Lin (“Accelerating Convolutional Networks via Global & Dynamic Filter Pruning”, 2018) is considered relevant as it is directed towards dynamically resizing filters in a CNN to improve performance. Wang (“Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR”, 2017) is also considered relevant as it is directed towards a neural network accelerator which takes advantage of filter pruning.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/SB/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126