DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 USC 103 as being unpatentable over Modha (US 20190332925 A1) in view of David et al. (US 2019/0108436 A1).

Regarding claim 1. 
A computer-implemented method, comprising: obtaining an input tensor and a plurality of filters at a layer within a neural network (see ¶ 2, “The memory of each of the plurality of neural cores comprises at least a portion of a weight tensor. The weight tensor comprising a plurality of filters. Each neural core is adapted to retrieve locally or receive a portion of an input data tensor,” also see figure 1, and 16, “Each neural network layer is associated with a parameter tensor V, weight tensor W, input data tensor X, output data tensor Y, and intermediate data tensor Z.”, also ¶ 21); 
segmenting the input tensor into a plurality of sub-tensors (see ¶ 15 “A tensor is a multidimensional array of numerical values. A tensor block is a contiguous subarray of the elements in a tensor.” [i.e. tensor block corresponds to sub-tensor]); 
dividing a channel dimension of each of the plurality of filters into a plurality of channel groups (see ¶ 30 and figure 2, “Filter 201 comprises a plurality of weights w_1 ... w_9”, [i.e. where the weights of the filter correspond to the channel dimension], also ¶ 30, “Filter 201 is applied to each tile of image 202. In this example, two sequential 3×3 tiles are illustrated. The result of each tile is an element of feature map 203. The result of the first sequential tile is indicated by a first dot on the feature map”, [i.e. where each dot in the feature map corresponds to a channel group]); 
segmenting, according to the plurality of … channel groups, each of the plurality of filters into a plurality of sub-filters (see ¶ 43, “the filter tensor is similarly decomposed into blocks along its input feature dimension”, [i.e. the filter tensor is similarly decomposed into blocks corresponds to segmenting filters to sub filters]); 
and assigning the plurality of sub-tensors and the plurality of sub-filters to a plurality of processors for parallel convolution processing (see ¶ 58, “for each column in F, all non-zero values are read. In some embodiments using neural cores, the input data value is read in parallel with the weight value. Inputs with a zero value are ignored to exploit the sparsity of F. The product of each input with the corresponding weight is added to the partial sum as per Equation 4, above. the addition is performed in parallel”, [i.e. equation 4 shows parallel convolution processing]).
Modha teaches (see ¶ 58, “for each column in F, all non-zero values are read. In some embodiments using neural cores, the input data value is read in parallel with the weight value. Inputs with a zero value are ignored to exploit the sparsity of F”, [i.e. ignoring all zero values corresponds to pruning filters so that all channel groups comprise the same number of non-zero weights], also see figure 2 and ¶ 30, each channel group (dot in the feature map) has the same number of filters, where inputs with a zero value are ignored to exploit the sparsity, also see ¶ 54, “each column of F has at most RxSxT non-zero entries”, [i.e. each column having RxSxT non-zero entries corresponds to having the same number of non-zero weights]), but it does not explicitly teach pruning each of the plurality of filters so that each of the plurality of channel groups of each filter comprises a same number of non-zero weights.
David teaches for each of the plurality of filters, pruning each of the plurality of channel groups within the filter so that each channel group of the filter comprises a same number of non-zero weights (see figures 1-4, teaching pruning a dense NN to a sparse NN, which takes out empty filters as shown in figure 4, also see ¶ 43, “a new data structure 406 is provided which only stores non-zero filters 404.”, also see ¶¶ 29, 34-36, 45, 96, 104 and 108, which further teach and clarify pruning channels based on filters and non-zero weights).
Both Modha and David pertain to the problem of computation of an input tensor, and are thus analogous art. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Modha and David to prune the filters so that each channel group comprises the same number of non-zero weights. The motivation for doing so would be that the speed of running a neural network is proportional to the number of weights in the neural network; pruning or omitting connections in a sparse neural network may result in a direct prediction speed-up in proportion to the amount of sparsity (see David ¶¶ 37 and 45).
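
For illustration only, the claimed limitation of pruning each channel group to a same number of non-zero weights can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of either reference: the function name `prune_channel_groups`, the flat list layout, and the magnitude-based selection rule are all hypothetical.

```python
def prune_channel_groups(filter_weights, group_size, keep):
    """Zero all but the `keep` largest-magnitude weights in each channel group.

    Assumption: `filter_weights` is a flat list along the channel dimension,
    and consecutive runs of `group_size` entries form one channel group.
    """
    pruned = []
    for start in range(0, len(filter_weights), group_size):
        group = filter_weights[start:start + group_size]
        # Positions of the `keep` largest-magnitude weights within this group.
        top = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)[:keep]
        kept = set(top)
        pruned.extend(w if i in kept else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]
pruned = prune_channel_groups(weights, group_size=4, keep=2)
nonzero_per_group = [sum(1 for w in pruned[s:s + 4] if w != 0.0) for s in (0, 4)]
```

With these inputs each group of four retains exactly two non-zero weights. Note that the count is only guaranteed when every group holds at least `keep` non-zero weights before pruning.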
Regarding claim 2.
Modha and David teach the method of claim 1, 
Modha further teaches wherein the method further comprises performing the parallel convolution processing by: iterating, by each of the plurality of processors, each non-zero weight of a sub-filter assigned to the processor (see ¶ 58, “for each column in F, all non-zero values are read… the input data value is read in parallel with the weight value”); 
and identifying, by the processor, a corresponding input value in a sub-tensor assigned to the processor to perform a multiply-and-accumulate (MAC) operation (see ¶ 13, “A weighted sum is an intermediate result computed by multiplying each input with the corresponding weight and accumulating the products. A partial sum is a weighted sum of a subset of inputs. A weighted sum of all inputs may be computed in stages by accumulating one or more partial sums”, also see [image: media_image1.png]).
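
As a hedged illustration of the multiply-and-accumulate step discussed for claim 2, the sketch below iterates only the non-zero weights of a sub-filter and looks up the matching input value. The names `sparse_mac`, `sub_filter`, and `sub_tensor`, and the flat one-dimensional indexing, are assumptions made for this example and are not taken from Modha or the application.

```python
def sparse_mac(sub_filter, sub_tensor):
    """Accumulate weight * input over the non-zero weights of one sub-filter.

    Assumption: `sub_filter` is a list of (index, weight) pairs holding only
    non-zero weights, and `sub_tensor` is a flat list of input values.
    """
    partial_sum = 0.0
    for index, weight in sub_filter:
        # Identify the corresponding input value, then multiply and accumulate.
        partial_sum += weight * sub_tensor[index]
    return partial_sum


sub_filter = [(0, 2.0), (3, -1.0)]      # zero weights at positions 1 and 2 are skipped
sub_tensor = [1.0, 5.0, 7.0, 4.0]
result = sparse_mac(sub_filter, sub_tensor)  # 2.0 * 1.0 + (-1.0) * 4.0
```

The result is a partial sum in the sense quoted from ¶ 13: a weighted sum over a subset of the inputs.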

Regarding claim 3.
Modha and David teach the method of claim 1, 
Modha further teaches further comprising: storing the non-zero weights in each of the plurality of channel groups of each filter as index-value pairs, wherein each of the index-value pairs comprises a channel-dimension index, a width-dimension index, and a corresponding non-zero weight (see ¶ 56, Equation 4, “for each non-zero value in a filter input column, the filter input row and element value are …”).

Regarding claim 4.
Modha and David teach the method of claim 3, 
Modha further teaches wherein the method further comprises performing the parallel convolution processing by: for each of the non-zero weights stored as an index-value pair, identifying a corresponding input value in an assigned sub-tensor at a location identified by the channel-dimension index and the width-dimension index of the index-value pair representing each non-zero weight (see ¶ 36, “since all elements of the same output feature share the same filter weights that are replicated at each output location. The shared filter weights can be described more compactly by a dense 4-dimensional filter tensor F that contains all of the filters that compute output features of the layer, and is indexed by the output feature dimension (output feature k) and 3 filter input dimensions (filter row r, filter column s, filter feature t).”).
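
The index-value pair storage recited in claims 3 and 4 can be pictured with the following minimal sketch. It assumes a channel group laid out as `group[channel][width]`; the helper names `to_index_value_pairs` and `apply_pairs` and the `w_offset` parameter are hypothetical and serve only to show how a stored (channel index, width index, weight) triple locates an input value.

```python
def to_index_value_pairs(group):
    """Return (channel_index, width_index, weight) for every non-zero weight.

    Assumption: `group[c][w]` is the weight at channel c and width position w.
    """
    return [(c, w, v) for c, row in enumerate(group) for w, v in enumerate(row) if v != 0]


def apply_pairs(pairs, sub_tensor, w_offset=0):
    """Sum weight * input, locating each input by the stored channel and width indices."""
    return sum(v * sub_tensor[c][w + w_offset] for c, w, v in pairs)


group = [[0.0, 1.5, 0.0],
         [2.0, 0.0, 0.0]]
pairs = to_index_value_pairs(group)

sub_tensor = [[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]]
out0 = apply_pairs(pairs, sub_tensor)              # 1.5 * 2.0 + 2.0 * 5.0
out1 = apply_pairs(pairs, sub_tensor, w_offset=1)  # 1.5 * 3.0 + 2.0 * 6.0
```

Only two triples are stored for the six-element group, so the zero weights cost neither memory nor multiplications.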
Regarding claim 5.
Modha and David teach the method of claim 1, 
Modha further teaches further comprising: rotating the plurality of sub-filters among the plurality of processors (see ¶ 36, “since all elements of the same output feature share the same filter weights that are replicated at each output location. The shared filter weights can be described more compactly by a dense 4-dimensional filter …”).
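
One possible reading of "rotating the plurality of sub-filters among the plurality of processors" is a round-robin schedule, sketched below. This is an assumption for illustration; neither the claim language nor the cited paragraph specifies this schedule, and the name `rotate_assignments` is hypothetical.

```python
def rotate_assignments(sub_filters, num_processors, step):
    """Return the sub-filter held by each processor at rotation `step`.

    Assumption: a simple round-robin shift, so processor p holds sub-filter
    (p + step) mod N at each step.
    """
    return [sub_filters[(p + step) % len(sub_filters)] for p in range(num_processors)]


sub_filters = ["f0", "f1", "f2"]
schedule = [rotate_assignments(sub_filters, 3, step) for step in range(3)]
# Sub-filters seen by each processor across all rotation steps.
seen = [{schedule[step][p] for step in range(3)} for p in range(3)]
```

Under this schedule every processor keeps its own sub-tensor and, over three steps, is paired with each sub-filter exactly once.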

Regarding claim 6.
Modha and David teach the method of claim 1, 
Modha further teaches wherein segmenting each of the plurality of filters into the plurality of sub-filters according to the plurality of … channel groups comprises: segmenting the filter into a plurality of chunks according to the plurality of … channel groups (see ¶ 44, “FIG. 4. Input data 401 are divided into blocks 402. Data block 402 is provided to neural core 403 along with a corresponding portion 404 of the filter tensor.”); and segmenting each of the plurality of chunks into a plurality of horizontal planes (see ¶ 45, “Each output feature block of the filter tensor is sent to a different neural core, so each block of the input tensor is sent to as many neural cores as there are output feature blocks.”, also see figure 3, 304 and 303, horizontal planes).
David teaches pruning channel groups (see figures 1-4, teaching pruning a dense NN to a sparse NN, which takes out empty filters as shown in figure 4, also see ¶ 43, “a new data structure 406 is provided which only stores non-zero filters 404. Conventional CNNs represent each layer by a 4-dimensional matrix (not including the batch of training samples), where each filter is represented by a 2D matrix (e.g., 3×3 or 5×5) and is positioned in the other two dimensions of the matrix to …”, also see ¶¶ 29, 34-36, 45, 96, 104 and 108, which further teach and clarify pruning channels based on filters and non-zero weights).
Both Modha and David pertain to the problem of computation of an input tensor, and are thus analogous art. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Modha and David to prune the filters so that each channel group comprises the same number of non-zero weights. The motivation for doing so would be that the speed of running a neural network is proportional to the number of weights in the neural network; pruning or omitting connections in a sparse neural network may result in a direct prediction speed-up in proportion to the amount of sparsity (see David ¶¶ 37 and 45).
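
The two-level segmentation recited in claim 6 (chunks by channel group, then horizontal planes within each chunk) might look like the sketch below. The `filt[h][w][c]` layout, the interpretation of a "horizontal plane" as all weights at one height index, and the name `segment_filter` are assumptions made for this example only.

```python
def segment_filter(filt, group_size):
    """Split a filter into chunks by channel group; each chunk is a list of planes.

    Assumption: `filt[h][w][c]` indexes height, width, channel. The result is
    chunks[g][h], the horizontal plane at height h restricted to channel group g.
    """
    channels = len(filt[0][0])
    chunks = []
    for start in range(0, channels, group_size):
        # Slice the channel axis of every (h, w) position to form one chunk.
        chunk = [[pixel[start:start + group_size] for pixel in row] for row in filt]
        chunks.append(chunk)
    return chunks


# 2x2 filter with 4 channels; each value encodes its position as 100*h + 10*w + c.
filt = [[[100 * h + 10 * w + c for c in range(4)] for w in range(2)] for h in range(2)]
chunks = segment_filter(filt, group_size=2)
plane = chunks[1][0]  # channel group 1 (channels 2-3), height 0
```

Each `chunks[g][h]` is a unit that could be handed to one processor as a sub-filter.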

Regarding claim 7.
Modha and David teach the method of claim 6, 
Modha further teaches further comprising: pruning the plurality of horizontal planes so that each of the plurality of horizontal planes comprises the same number of non-zero weights (see ¶ 47, “By storing only the non-zero values, a sparse tensor can be compressed to use much less memory than a dense tensor with the same shape. Similarly, by skipping zero values, multiplication with a sparse tensor can use many fewer operations and thereby less energy than multiplication with a dense tensor”).

Regarding claim 8.
Modha and David teach the method of claim 1, 
Modha further teaches wherein after the parallel convolution processing, the plurality of processors generate a plurality of partial sums, and the method further comprises: accumulating the plurality of partial sums to obtain an output tensor (see ¶ 21, “To compute an output tensor block, a neural core multiplies an M×1 input tensor block 101 with an M×N weight tensor block 102 and accumulates the products into weighted sums that are stored in a 1×N intermediate tensor block 103.”); and feeding the output tensor as an input tensor for a next layer of the neural network (see ¶ 23, “Training is the process of modifying the neural network model to perform a desired function. Inference is the process of applying a neural network to an input to produce an output, without modifying the neural network model”).
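
The accumulation step of claim 8 can be sketched as an element-wise sum of the per-processor partial sums. This is a minimal illustration under the assumption that every processor returns a flat list of the same length; the name `accumulate` and the sample values are hypothetical.

```python
def accumulate(partial_sums):
    """Element-wise sum of per-processor partial sums into one output tensor.

    Assumption: each entry of `partial_sums` is a flat list of equal length,
    one list per processor.
    """
    return [sum(values) for values in zip(*partial_sums)]


partials = [[1.0, 2.0],    # processor 0
            [0.5, -1.0],   # processor 1
            [3.0, 0.0]]    # processor 2
output = accumulate(partials)
next_layer_input = output  # the output tensor is fed to the next layer as its input
```

Here the three partial sums combine into a two-element output tensor, which then serves as the input tensor of the following layer.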

Claim 9 recites a system comprising a plurality of processors to perform the method recited in claim 1. Therefore the rejection of claim 1 above applies equally here. Modha also teaches the additional elements of claim 9 not recited in claim 1, comprising a plurality of processors (see figure 7, element 16, processing unit); and one or more non-transitory computer-readable memories coupled to the plurality of processors and configured with instructions executable by the plurality of processors to cause the system to perform operations (see figure 7, element 28, memory).

Claims 10-15 recite a system comprising a plurality of processors to perform the method recited in claims 2-8. Therefore the rejection of claim 1 above applies equally here.
Claim 16 recites a non-transitory computer-readable storage medium to perform the method recited in claim 1. Therefore the rejection of claim 1 above applies equally here. Modha also teaches the additional elements of claim 16 not recited in claim 1, comprising a plurality of processors to cause the plurality of processors to perform operations (see figure 7, element 16, processing unit, and element 28, memory).

Claims 17-20 recite a non-transitory computer-readable storage medium to perform the method recited in claims 2-3 and 6-7. Therefore the rejection of claim 1 above applies equally here.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IMAD M KASSIM, whose telephone number is (571) 272-2958. The examiner can normally be reached Monday-Friday, 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/IMAD KASSIM/Examiner, Art Unit 2129                                                                                                                                                                                                        
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129