DETAILED ACTION
1.	This communication is in response to Application No. 16/449,009, filed on June 21, 2019, and to the response to the restriction requirement filed on June 13, 2022, in which claims 1-11 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
3.	The information disclosure statement submitted on 06/21/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Election/Restrictions
4.	Applicant's election with traverse of Invention Group I (Claims 1-11), made in response to the restriction requirement among Claims 1-11 (Invention Group I), Claims 12-17 (Invention Group II), and Claims 18-20 (Invention Group III), in the reply filed on 06/13/2022 is acknowledged. The traversal is on the ground(s) that simultaneous examination will not present an undue burden. This is not found persuasive because Invention Groups I, II, and III are distinct, such that Claims 1-11 (Invention Group I) do not require the inventions presented in Claims 12-17 (Invention Group II) or Claims 18-20 (Invention Group III), and vice versa. The inventions presented in Invention Groups II and III would require a different field of search, such that prior art applicable to those inventions would likely not be applicable to the invention presented in Invention Group I. Further, Applicant acknowledges that Inventions I, II, and III are considered distinct for the reasons set forth by the Examiner in the restriction requirement filed 06/09/2022 and does not provide further reasoning/explanation beyond the statement that “Applicant believes that simultaneous examination will not present an undue burden” filed in Applicant’s response to restriction requirement on 06/13/2022. As a result, Examiner has considered only Claims 1-11, as per the Applicant’s election of Invention Group I: Claims 1-11, drawn to a computing system for processing an ANN comprising a plurality of layers and a zero-skipping circuit to locate non-zero weights.
The requirement is still deemed proper and is therefore made FINAL.

Claim Rejections - 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-5 and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. (hereinafter David) (US PG-PUB 20190108436), in view of Martin et al. (hereinafter Martin) (US PG-PUB 20190147327).
Regarding Claim 1, David teaches a computing system for processing an artificial neural network (ANN), comprising: 
a processor comprising a zero-skipping circuit configured to locate non-zero weights (David, Par. [0107], “In operation 606, a processor, e.g., in prediction mode, may retrieve from memory and run the sparse neural network of operation 604 to compute an output based only on the non-zero weights (and not based on the zero weights) of the sparse neural network.”, thus, a processor is disclosed and is able to run the neural network based only on located non-zero weights); 
a memory (David, Par. [0100], “Memory unit(s) 558 and 515 may include, for example, a random access memory (RAM), a dynamic RAM (DRAM), a flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units.”, thus, memory is disclosed, as shown in Figure 5 labels 558 and 515); and 
an ANN stored within the memory (David, Par. [0097], “Local endpoint device(s) 550 may each include one or more processor(s) 556 for training, and/or executing prediction based on, the weights of the sparse neural network stored in memory 558.”, thus the neural network is stored within the memory), wherein the ANN comprises a plurality of layers (David, Par. [0042], “CNN 400 may include a plurality of layers 402, each layer 402 including one or more channels 403, each channel 403 including a plurality of artificial neurons.”, thus, the CNN (which is a class of an artificial neural network, briefly mentioned in Par. [0003]) comprises a plurality of layers) including one or more convolution layers, wherein each of the one or more convolution layers comprises a plurality of filters (David, Par. [0042], “CNN 400 may be represented by a plurality of convolution filters 404. Each filter 404 represents a group of a plurality of weights that are the convolution or transformation of regions of neurons (e.g., representing an N×N pixel image region) of one channel to neurons in a channel of an (adjacent or non-adjacent) convolution layer”, therefore, one or more convolution layers is disclosed and each of the one or more convolution layers comprises a plurality of filters), each filter comprises a plurality of channels (David, Par. [0029], “Some embodiments may generate a sparse convolutional neural network (CNN). A CNN is represented by a plurality of filters that connect a channel of an input layer to a channel of a convolutional layer.”, therefore, each filter comprises a plurality of channels), each channel comprises a plurality of rows, and each row comprises a plurality of weights (David, Par. [0042], “In general, convolution filter 404 may have various dimensions including one-dimensional (1D) (e.g., a 1×N row filter or N×1 column filter operating on a column or row of neurons), two-dimensional (2D) (e.g., a N×M filter operating on a 2D grid of neurons), three-dimensional (3D) (e.g., a N×M×P filter operating on a grid over multiple channels in a layer), . . . , or N-dimensional (ND) (e.g., operating on a grid over multiple channels and multiple layers). While only a few filters are shown for simplicity, often each layer 402 may be represented by hundreds or thousands of filters 404. Computing weights for hundreds or thousands of convolutions filters is a complex and time-consuming task.”, thus, a plurality of rows with a plurality of weights is disclosed), 
wherein the plurality of channels in the one or more convolution layers comprises one or more fully pruned channels (FPCs) and one or more mixed 2D channels (MCs) (David, Par. [0042], “The same single convolution filter 404 of N×N weights is used to convolve all N×N groups of neurons throughout the input channel. In general, convolution filter 404 may have various dimensions including one-dimensional (1D) (e.g., a 1×N row filter or N×1 column filter operating on a column or row of neurons), two-dimensional (2D) (e.g., a N×M filter operating on a 2D grid of neurons), three-dimensional (3D) (e.g., a N×M×P filter operating on a grid over multiple channels in a layer), . . . , or N-dimensional (ND) (e.g., operating on a grid over multiple channels and multiple layers).”, therefore, one or more mixed 2D channels is disclosed), wherein each of the one or more FPCs comprises only zero weights (David, Par. [0029], “Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels. An new CNN indexing is used that independently and uniquely identifies each filter in the CNN so that pruned filters are not stored, reducing convolution operations and memory usage.”, therefore, one or more fully pruned channels with all zero weights are disclosed) and each of the one or more MCs comprises at least one non-zero weight (David, Par. [0043], “According to embodiments of the invention, weak or near zero filters may be pruned and deleted to avoid their associated convolution operations and speed-up training and/or prediction of CNN 400. Whereas conventional CNNs store and operate on zero filters in the same way as non-zero filters, which yields no significant storage or processing benefit to pruning, according to embodiments of the invention, a new data structure 406 is provided which only stores non-zero filters 404.”, thus, one or more mixed 2D channels comprise at least one non-zero weight. Particularly, in Fig. 4, label 404 white filters represent those with non-zero weights, also shown in data structure 406, whereas black filters represent empty/all zero weights).

David does not teach wherein at least a portion of the one or more MCs satisfy a limited zero sequence (LZS) condition based on a number of weights the zero-skipping circuit is configured to process in a single cycle.
However, Martin teaches wherein at least a portion of the one or more MCs satisfy a limited zero sequence (LZS) condition (Martin, Par. [0039], “The weights of each group may be stored at the unpacked buffer such that any zero weights are at one end of the string of weights comprised in the group, the weights of the group otherwise being in sequence, and the sparsity data for the group indicates the position of the zero weights in the group.”, therefore, zero weights are stored at one end of the string of weights within a mixed channel and the position of the zero weights is also indicated. Further, Par. [0147] further illustrates how zero weight values are handled in terms of multiplication operations and zero skipping) based on a number of weights the zero-skipping circuit is configured to process in a single cycle (Martin, Par. [0153], “In this manner, each neuron engine would be configured to process a deterministic sequence of neurons, and the neuron engines collectively can process the total number of neurons available for processing. An arrangement in accordance with this scheme is illustrated in FIGS. 5 and 6.”, thus, the engine/circuit is configured to process a sequence of neurons and their according weights within each processing cycle – more information on the processing cycle of the neuron engine can be found in Par. [0131]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computing system for processing an artificial neural network (ANN) comprising a processor with a zero-skipping circuit, memory, and an ANN stored within the memory wherein the channels in the layers of the ANN comprise fully pruned (only zero weights) and mixed 2D (at least one non-zero weight) channels, as disclosed by David, to include the limited zero sequence (LZS) condition based on a number of weights the zero-skipping circuit is configured to process in a single cycle, as disclosed by Martin. One of ordinary skill in the art would have been motivated to make this modification to create a computing system for processing an artificial neural network that limits the sequence of zero weights within the neural network to implement zero skipping and improve overall performance and efficiency (Martin, Par. [0107], “Weights and input data are frequently zero in CNNs. Weights are often zero as a result of being inserted during a mapping process prior to operating the CNN on the input data. Weight and input data sets comprising a significant number of zeros can be said to be sparse. In the convolutional layer input values are multiplied by their respective weights. Consequently, a significant number of operations in the convolutional layer can result in a zero output. The performance of the system can be improved by skipping (i.e. not performing) these ‘multiply by zero’ operations.”).
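For illustration only, the "multiply by zero" skipping described in the quoted Martin passage can be sketched in software as follows. The function name and return shape are hypothetical and are not taken from either reference; the sketch merely shows that skipped operations do not change the accumulated result.

```python
def zero_skipping_dot(weights, inputs):
    """Multiply-accumulate that skips every zero weight.

    Hypothetical helper for illustration; returns the accumulated sum and
    the number of multiplications actually performed.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    total = 0
    multiplies = 0
    for w, x in zip(weights, inputs):
        if w == 0:
            continue  # a zero weight contributes nothing, so skip the multiply
        total += w * x
        multiplies += 1
    return total, multiplies


# Two of the four weights are zero, so only two multiplications are performed.
result, performed = zero_skipping_dot([0, 2, 0, 3], [5, 1, 7, 2])
```

In this software analogy the saving is the difference between the number of weights and the number of multiplications performed.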

Regarding Claim 2, David in view of Martin teaches the computing system of claim 1, wherein: the LZS condition comprises a bounded length zeroes (BZS) condition that imposes a maximum length on zero sequences (Martin, Par. [0197-0198], “An example of a weight buffer 240 is shown in FIG. 4. A weight buffer stores its weights in a compressed format (e.g. with the zeros removed and with a configurable reduced bit depth) in packed weights buffer 401. The compressed (packed) weights 409 are read in from external memory and stored in the compressed format at a packed weights buffer 401. This reduces the external memory bandwidth, and allows more weights to be stored in a given size of packed weights buffer 401. In order to provide weight data to the neuron engines an unpacker 402 of the weight buffer unpacks the weights into unpacked weight storage 404. Each set of unpacked weights 406 may be referred to as a word (which may or may not be considered to include the corresponding sparsity map 407 and/or index 405). The packed weight data may be only partially unpacked so as to decompress the weight data (e.g. for bit depth) but not for sparsity, i.e. zero value weights are not restored to the correct position in a sequence of weights in a word.”, therefore, the weight buffer, which is a predetermined size, would not unpack/decompress weight data for sparsity, meaning that zero weight values/zero sequences would be removed according to the zero pruning that occurs – hence, the zero sequence is bounded as they are removed from consideration and only non-zero weights are processed. Further information is provided in Par. [0198]).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
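As a minimal sketch of what a maximum length on zero sequences means, the check below measures the longest run of consecutive zero weights in a flattened weight sequence and compares it against a bound. The function names and the bound values are assumptions chosen for illustration; neither the claims as quoted here nor the cited references specify them.

```python
def longest_zero_run(weights):
    """Length of the longest run of consecutive zero weights."""
    longest = 0
    current = 0
    for w in weights:
        if w == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a non-zero weight ends the current run
    return longest


def satisfies_bounded_zero_sequence(weights, max_zero_run):
    """True when no run of zeros exceeds max_zero_run (hypothetical bound)."""
    return longest_zero_run(weights) <= max_zero_run


row = [1, 0, 0, 3, 0, 0, 0, 2]
within_three = satisfies_bounded_zero_sequence(row, 3)
within_two = satisfies_bounded_zero_sequence(row, 2)
```

Here the longest zero run in `row` is three, so the sequence meets a bound of three but not a bound of two.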

Regarding Claim 3, David in view of Martin teaches the computing system of claim 1, wherein: the LZS condition comprises a maximum number of rows over which zero sequences extend (David, Par. [0071], “A Modified Compressed Sparse Row: Improves CSR representation may replace the conventional matrix with two arrays: (1) The first data array holds the diagonal values first (e.g., including zeros, if there are any on the diagonal), then the remaining non-zero elements in row-major order (same way as the regular CSR). (2) The second (index) data array is of the same length as the first one. The elements matching the diagonal elements in the first array point to the first element of that row in the data array (so the first element is always the size of the diagonal plus one), while the elements matching the rest of the data specify the column index of that data element in the matrix. For example, a 4×4 matrix with the following values: [[1,2,0,3], [0,4,5,0], [0,0,0,6], [0,0,0,7]], would become the first data array: [1,4,0,7,2,3,5,6] and the second index array: [4,6,7,7,1,3,2,3].”, therefore, using the modified compressed sparse row representation, the rows over which zero sequences extend would be limited, as the diagonal is considered first even if zeroes are present, but then all non-zero elements are considered in row-major order. In this particular example of a 4x4 matrix, it is apparent that the zero sequence is limited throughout the rows, since non-zero elements are solely populated in the data array, after populating the diagonal – zero sequences are limited).

Regarding Claim 4, David in view of Martin teaches the computing system of claim 3, wherein: the maximum number of rows is two (David, Par. [0071], “A Modified Compressed Sparse Row: Improves CSR representation may replace the conventional matrix with two arrays: (1) The first data array holds the diagonal values first (e.g., including zeros, if there are any on the diagonal), then the remaining non-zero elements in row-major order (same way as the regular CSR). (2) The second (index) data array is of the same length as the first one. The elements matching the diagonal elements in the first array point to the first element of that row in the data array (so the first element is always the size of the diagonal plus one), while the elements matching the rest of the data specify the column index of that data element in the matrix. For example, a 4×4 matrix with the following values: [[1,2,0,3], [0,4,5,0], [0,0,0,6], [0,0,0,7]], would become the first data array: [1,4,0,7,2,3,5,6] and the second index array: [4,6,7,7,1,3,2,3].”, thus, the maximum number of rows in which zero sequences (two or more consecutive zeros in a row) appear is two).
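The modified compressed sparse row layout quoted from David, Par. [0071] can be sketched as below for a square matrix. The function name is hypothetical. One convention is assumed: a row with no off-diagonal non-zero elements receives the position where its elements would begin, which for the last row of the quoted 4×4 matrix is 8 (one past the end of the data array), whereas the quoted index array lists 7 in that position. All other entries produced by the sketch match the quoted arrays.

```python
def to_modified_csr(matrix):
    """Build the two arrays of a modified CSR layout for a square matrix.

    The data array holds the diagonal first (zeros included), then the
    off-diagonal non-zero values in row-major order. The index array holds,
    for each row, the position in the data array where that row's
    off-diagonal values start, followed by the column index of each
    off-diagonal value. Positions and columns are 0-based.
    """
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_values = []
    off_columns = []
    row_starts = []
    for i, row in enumerate(matrix):
        row_starts.append(n + len(off_values))
        for j, value in enumerate(row):
            if j != i and value != 0:
                off_values.append(value)
                off_columns.append(j)
    return diagonal + off_values, row_starts + off_columns


data, index = to_modified_csr(
    [[1, 2, 0, 3], [0, 4, 5, 0], [0, 0, 0, 6], [0, 0, 0, 7]]
)
```

Under the stated assumption, `data` is `[1, 4, 0, 7, 2, 3, 5, 6]` and `index` is `[4, 6, 7, 8, 1, 3, 2, 3]`.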

Regarding Claim 5, David in view of Martin teaches the computing system of claim 1, wherein: the LZS condition is based on a maximal number and a location of zero weights in a sequence of filter weights the zero-skipping circuit is configured to process and skip in a single cycle (Martin, Par. [0039], “The weights of each group may be stored at the unpacked buffer such that any zero weights are at one end of the string of weights comprised in the group, the weights of the group otherwise being in sequence, and the sparsity data for the group indicates the position of the zero weights in the group.”, thus, the sparsity data for the group would indicate the position/location of zero weights in the sequence of filter weights).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
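The storage arrangement quoted from Martin, Par. [0039] (zero weights moved to one end of the group, remaining weights kept in sequence, sparsity data recording where the zeros were) can be sketched as follows. The function names and the choice of a list of positions as the sparsity data are assumptions for illustration; Martin's actual buffer format is not reproduced here.

```python
def pack_group(weights):
    """Move zero weights to the end of the group and record their positions.

    Returns the reordered weights and the original positions of the zeros,
    which play the role of the sparsity data in this sketch.
    """
    nonzero = [w for w in weights if w != 0]
    zero_positions = [i for i, w in enumerate(weights) if w == 0]
    return nonzero + [0] * len(zero_positions), zero_positions


def unpack_group(packed, zero_positions):
    """Restore the original order from the packed group and sparsity data."""
    zeros = set(zero_positions)
    nonzero_values = iter(packed)  # non-zero weights come first, in sequence
    return [0 if i in zeros else next(nonzero_values) for i in range(len(packed))]


packed, zero_positions = pack_group([3, 0, 5, 0, 2])
restored = unpack_group(packed, zero_positions)
```

The round trip shows that the position information is sufficient to recover the original sequence, which is why both the number and the location of the zeros matter.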

Regarding Claim 7, David in view of Martin teaches the computing system of claim 1, wherein: the one or more FPCs comprise at least 20% of channels in the one or more convolution layers (David, Par. [0021], “Results in both prediction mode and training mode having a linear speed-up directly proportional to the amount of sparsity induced in the neural network. For example, a 50% sparse neural network (retaining less than 50% or a minority of its weights) results in two times (or 200%) faster prediction and training, and a 90% sparse neural network (retaining 10% of its weights) results in 10 times (or 1000%) faster prediction and training. In general, the greater the sparsity of the neural network, the faster the prediction and training times.”, therefore, a neural network that has 50% sparsity means that around 50% of weights have been fully pruned/removed).
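To make the claimed quantity concrete, the sketch below computes the share of fully pruned channels in a layer, where a channel counts as fully pruned only if every one of its weights is zero. The nested-list layout (filters, then channels, then rows, then weights) and the function names are assumptions for illustration.

```python
def is_fully_pruned(channel):
    """True when every weight in every row of the channel is zero."""
    return all(w == 0 for row in channel for w in row)


def fully_pruned_share(layer):
    """Fraction of channels, across all filters of a layer, that are fully pruned."""
    channels = [channel for filt in layer for channel in filt]
    if not channels:
        raise ValueError("layer has no channels")
    pruned = sum(1 for channel in channels if is_fully_pruned(channel))
    return pruned / len(channels)


# Two filters, each with two 2x2 channels; exactly one channel is all zeros.
layer = [
    [[[0, 0], [0, 0]], [[1, 0], [0, 2]]],
    [[[3, 0], [0, 0]], [[4, 5], [6, 7]]],
]
share = fully_pruned_share(layer)
```

In this toy layer one of four channels is fully pruned, so the share is 0.25, which would meet a 20% threshold.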

Regarding Claim 8, David in view of Martin teaches the computing system of claim 1, wherein: the portion of the one or more MCs comprise at least 95% of the one or more MCs (David, Par. [0043], “Because filters 404 are explicitly indexed in each data entry, the matrix position of the data entries no longer serves as their implicit index, and filters 404 entries may be shuffled, reordered or deleted with no loss of information. In particular, there is no reason to store a zero filter (a filter with all zero weights) as a placeholder to maintain indexing as in matrix representations. Accordingly, when channels of neurons are disconnected (by pruning) or not connected in the first place, data structure 406 simply deletes or omits an entry for the associated filter entirely (e.g., no record of any weight or any information is stored for that filter). In various embodiments, data structure 406 may omit 1D, 2D, 3D, or ND filters, e.g., as predefined or as the highest dimensionality that is fully zeroed.”, thus, 100% of mixed channels would satisfy a limited zero sequence, as all zero weights/zero filters are not stored and are pruned from the network – See claims 3 and 4 for an explanation on modified sparse row representation and how zeros are sequenced in this representation).

Regarding Claim 9, David in view of Martin teaches the computing system of claim 1, wherein: the LZS condition is based at least in part on a scanning order of the computing system (David, Par. [0029], “Some embodiments may generate a sparse convolutional neural network (CNN). A CNN is represented by a plurality of filters that connect a channel of an input layer to a channel of a convolutional layer. The filter scans the input channel, operating on each progressive region of neurons (e.g., representing a N×N pixel image region), and maps the convolution or other transformation of each region to a single neuron in the convolution channel. By connecting entire regions of multiple neurons to each single convolution neuron, filters form synapses having a many-to-one neuron connection, which reduces the number of synapses in CNNs as compared to the one-to-one neuron connections in standard NNs. Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels. An new CNN indexing is used that independently and uniquely identifies each filter in the CNN so that pruned filters are not stored, reducing convolution operations and memory usage.”, thus, the filter scans the input channel progressively and pruned filters are not stored to reduce convolution operations – thus, the limited zero sequence condition is based on how the inputted neurons are scanned/indexed by the filter).

Regarding Claim 10, David in view of Martin teaches the computing system of claim 9, wherein: the LZS condition applies to sequences of consecutive zero weights over more than one channel according to the scanning order of the computing system (David, Par. [0043], “Because filters 404 are explicitly indexed in each data entry, the matrix position of the data entries no longer serves as their implicit index, and filters 404 entries may be shuffled, reordered or deleted with no loss of information. In particular, there is no reason to store a zero filter (a filter with all zero weights) as a placeholder to maintain indexing as in matrix representations. Accordingly, when channels of neurons are disconnected (by pruning) or not connected in the first place, data structure 406 simply deletes or omits an entry for the associated filter entirely (e.g., no record of any weight or any information is stored for that filter). In various embodiments, data structure 406 may omit 1D, 2D, 3D, or ND filters, e.g., as predefined or as the highest dimensionality that is fully zeroed.”, thus, the limited zero sequence/zero pruning is applied to more than one channel according to the scanning order/indexing of the computing system).

Regarding Claim 11, David in view of Martin teaches the computing system of claim 1, wherein: the ANN further comprises one or more fully connected layers that satisfy the LZS condition (Martin, Par. [0224], “When the layers are fully connected, the weights may be streamed in from external memory constantly. Once an initial request for weight data has been sent, the weight buffer may be configured to provide a stream of weights and the respective sparsity maps, with each weight being used only once. The weights may be read in a filter interleaved order in order to allow multiple neuron engines to run simultaneously. For fully connected layers there is typically no benefit of having more neuron engines than filter buffers, since only one neuron engine can read from each filter buffer. If there are more neuron engines than filter buffers some of the neuron engines will be unused when operating on fully connected layers. However, for a given implementation, the performance is likely to be limited by the external memory read bandwidth for the weights rather than by the compute throughput.”, therefore, the neural network comprises one or more fully connected layers that receive streamed weight input from the weight buffer (see Claim 5 & Par. [0039]), such that the LZS condition is considered, since zero weights are sequenced at the end of the buffer and are not considered in multiplication/addition operations).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

6.	Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over David et al. (hereinafter David) (US PG-PUB 20190108436), in view of Martin et al. (hereinafter Martin) (US PG-PUB 20190147327), further in view of Hu et al. (“Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures”).
Regarding Claim 6, David in view of Martin teaches the computing system of claim 1. 
David in view of Martin does not teach wherein: the one or more convolution layers comprises at least 33% zero weights.
However, Hu teaches wherein: the one or more convolution layers comprises at least 33% zero weights (Hu, Pg. 3, Table 1, which depicts that the mean average percentage of zeros is greater than 33% in all layers).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computing system of Claim 1, as disclosed by David in view of Martin, to include wherein the one or more convolution layers comprise at least 33% zero weights, as disclosed by Hu. One of ordinary skill in the art would have been motivated to make this modification to create a computing system for processing an artificial neural network, such that the convolution layer comprises at least 33% zero weights to indicate greater redundancy, such that zero weights may be pruned to obtain a better designed network that is able to more efficiently and quickly process non-zero weights (Hu, Pg. 3, “Since the VGG-16 network has inverse pyramid shape, most redundancy occurs at the higher convolutional layers and the fully connected layers. The higher mean APoZ also indicates more redundancy in a layer. Detailed distributions of APoZ of 512 CONV5-3 neurons and 4096 FC6 neurons are shown in Figure 1, 2 respectively. Since a neural network has a multiplication-addition-activation computation process, a neuron which has its outputs mostly zeros will have very little contribution to the output of subsequent layers, as well as to the final results. Thus, we can remove those neurons without harming too much to the overall accuracy of the network. In this way, we can find the optimal number of neurons for each layer and thus obtain a better network without redesign and extensive human labor.”).
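For reference, the quantity recited in Claim 6, the share of zero weights in a convolution layer, can be computed as sketched below. The nested-list layout (filters, then channels, then rows, then weights) and the function name are assumptions for illustration and are not drawn from Hu, David, or Martin.

```python
def zero_weight_fraction(layer):
    """Fraction of all weights in a layer that are exactly zero.

    The layer is assumed to be nested lists: filters -> channels -> rows -> weights.
    """
    weights = [
        w
        for filt in layer
        for channel in filt
        for row in channel
        for w in row
    ]
    if not weights:
        raise ValueError("layer has no weights")
    return sum(1 for w in weights if w == 0) / len(weights)


# Two filters, each with two 2x2 channels: 9 of the 16 weights are zero.
layer_weights = [
    [[[0, 0], [0, 0]], [[1, 0], [0, 2]]],
    [[[3, 0], [0, 0]], [[4, 5], [6, 7]]],
]
fraction = zero_weight_fraction(layer_weights)
meets_threshold = fraction >= 0.33
```

With 9 zero weights out of 16 the fraction is 0.5625, so this toy layer would exceed a 33% threshold.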

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
He et al. (“Channel Pruning for Accelerating Very Deep Neural Networks”) disclosed channel pruning which accelerates convolutional layers, based on a two-step algorithm to effectively prune each layer.
Molchanov et al. (“Pruning Convolutional Neural Networks for Resource Efficient Inference”) disclosed methods for pruning convolutional layers in neural networks to enable efficient inference.
Parashar et al. (“SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks”) disclosed a sparse CNN accelerator architecture, which improves performance and energy efficiency by exploiting zero-valued weights from network pruning. 
Aghasi et al. (“Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee”) disclosed pruning/sparsifying a trained network layer-wise and removing connections of zero weights.
Anwar et al. (“Structured Pruning of Deep Convolutional Neural Networks”) disclosed structured sparsity at various scales for convolutional neural networks, using pruning.
Li et al. (“Pruning Filters for Efficient ConvNets”) disclosed an acceleration method for CNNs, in which filters are pruned from the CNN if they are identified as having a small effect on output accuracy.
Gao et al. (“Dynamic Channel Pruning: Feature Boosting and Suppression”) disclosed feature boosting and suppression within deep convolutional neural networks to predictively amplify salient convolutional channels and skip unimportant channels at runtime.
Wang et al. (US Patent 11200495) disclosed a convolutional neural network model which is trained and pruned multiple times, and also pruned at a pruning ratio that may be adjusted.
Wang et al. (US PG-PUB 20190303757) disclosed a deep learning accelerator with processing elements and a dispatcher which dispatches input data in the input activation and non-zero weights in the multi-dimensional weights to the processing elements.
Dally et al. (US PG-PUB 20180046916) disclosed methods, products, and systems for performing computations using a sparse convolutional neural network accelerator.
Wang et al. (US PG-PUB 20210097393) disclosed methods of pruning a convolutional neural network based on number of channels, lookup tables, and pruning filters.
Phan et al. (US PG-PUB 20200293876) disclosed compressing a neural network based on a compression ratio of the number of zero weights over the number of non-zero weights.
Gorokhov et al. (US PG-PUB 20210027166) disclosed systems, apparatuses, and methods for dynamic pruning of neurons on-the-fly to accelerate neural network inferences.

8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is 571-272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123