DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 9, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gibson (patent application publication No. 2017/0323197, which is a publication of application SN 15/585645 filed 05/03/2017) in view of Turakhia (patent application publication No. 2018/0164866, which is a publication of application SN 15/377858 filed December 13, 2016).

Gibson taught the invention substantially as claimed including (as to claim 1) Hardware for implementing a Deep Neural Network (DNN) [note the convolutional neural network provides a component of a Deep Neural Network] having a convolution layer (e.g., see paragraphs 0005, 0032-0034), the hardware comprising a plurality of convolution engines (240a-240n) (e.g., see fig. 2) each operable to perform a convolution operation by applying a filter to a data window (e.g., see figs. 1, 6c, 7 and paragraphs 0032-0033) [note the subset of data for a plane provides the data window], each filter comprising a set of weights for combination with respective data values of a data window (e.g., see figs. 6c, 7 and paragraph 0033), and each of the plurality of convolution engines (240a-240n) comprising: multiplication logic (310) operable to combine a weight of a filter with a respective data value of a data window (e.g., see fig. 3 and paragraph 0041); control logic configured to cause the multiplication logic to combine a weight with a respective data value (e.g., see fig. 3 and paragraph 0041); and accumulation logic (320) configured to accumulate the results of a plurality of combinations performed by the multiplication logic so as to form an output for a respective convolution operation (e.g., see figs. 2, 3 and paragraph 0041).
Gibson did not expressly detail control logic configured to cause the multiplication logic to combine a weight with a respective data value if the weight is non-zero, and otherwise not cause the multiplication logic to combine that weight with that data value. Turakhia however taught this limitation (e.g., see fig. 6 and paragraphs 0037, 0063-0064, 0067-0068) [note Turakhia taught preventing loading of the weight and activation data to the multiplier when either the weight or the activation was determined to be zero, and, when both were determined to be nonzero, loading the weight and activation and performing the multiplication; this includes preventing loading of the weight (and activation) when the weight was zero].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Gibson and Turakhia. Both references were directed toward performing multiplication accumulation operations in a neural network. One of ordinary skill would have been motivated to incorporate the Turakhia teachings of preventing loading of input values to the multiplier when at least one of the inputs was zero, to avoid unnecessary processing where the output would have been known to be zero without performing the arithmetic operation. This would reduce power consumption (e.g., see paragraph 0037 of Turakhia).
As to the system implementing a Deep convolutional network, Turakhia taught that Deep convolutional networks are networks of convolutional networks with additional pooling and normalization layers (e.g., see paragraph 0030), and Turakhia taught implementing the network of the invention, in which the zero valued weights and activations are detected, in a Deep convolutional network (e.g., see paragraph 0060).
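For illustration only, the zero-gating behavior relied upon in the combination above can be sketched in software. This is a minimal sketch and not circuitry taken from Gibson or Turakhia: the function name gated_mac, the skip_zero_data flag (corresponding to the additional data-value gating discussed for claim 3) and the multiply counter are hypothetical names introduced here.

```python
def gated_mac(weights, data, skip_zero_data=False):
    """Multiply-accumulate that skips pairs whose product is known to be zero.

    Hypothetical software model: 'continue' stands in for control logic
    that does not cause the multiplier to combine the operands.
    """
    acc = 0          # accumulation logic
    multiplies = 0   # number of multiplications actually performed
    for w, d in zip(weights, data):
        if w == 0:
            continue  # zero weight: do not combine with the data value
        if skip_zero_data and d == 0:
            continue  # optional: zero data value is also skipped
        acc += w * d
        multiplies += 1
    return acc, multiplies


if __name__ == "__main__":
    # Two of the four weights are zero, so only two multiplies occur.
    print(gated_mac([0, 2, 0, 3], [5, 1, 7, 2]))
```

The accumulated result is the same as an ungated multiply-accumulate; only the number of multiplications changes, which is the basis of the power-saving rationale cited from Turakhia.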

As to claim 2 Gibson and Turakhia taught Hardware as claimed in claim 1, Turakhia taught wherein the control logic is configured to identify zero weights in weights received at the convolution engine using sparsity data provided with those weights (e.g., see paragraphs 0008, 0037, 0041 and 0046) [the tags are used to identify zero valued weights and activations, and this provides the sparsity data provided with the weights; and the circuits described in paragraphs 0046 and 0053 and figs. 4 and 5 provide the control logic to identify zero weights].

As to claim 3 Gibson and Turakhia taught Hardware as claimed in claim 1, Turakhia taught wherein the control logic is further configured to not cause the multiplication logic to combine a weight with a respective data value if that data value is zero (e.g., see fig. 6 and paragraphs 0037, 0063-0064, 0067-0068) [Turakhia taught an activation, which provides the input data value; when the activation is zero, the control logic does not cause the multiplication logic to combine the input value with a weight, because loading of the weight and activation is prevented].

As to claim 4 Gibson and Turakhia taught Hardware as claimed in claim 3, Turakhia taught wherein the control logic is configured to identify zero data values in data values received at the convolution engine using sparsity data provided with those data values (e.g., see paragraphs 0008, 0037, 0041 and 0046) [the tags are used to identify zero valued weights and activations, and this provides the sparsity data provided with the data values (activations); and the circuits described in paragraphs 0046 and 0053 and figs. 4 and 5 provide the control logic to identify zero weights].

As to claim 9 Gibson and Turakhia taught Hardware as claimed in claim 1, Turakhia taught wherein the hardware further comprises one or more weight buffer modules (operand storage 308 and load units 304, 304a, 304b, 304c), each configured to provide weights of one or more filters to any of the plurality of convolution engines (e.g., see fig. 3 and paragraphs 0046-0047, 0067-0068 and 0070). Gibson also taught this limitation [note the input buffer controller along with the input buffers and multiplexers in fig. 2 of Gibson provides this limitation].


As to claim 17 Gibson and Turakhia taught hardware as claimed in claim 1, Gibson taught wherein the plurality of convolution engines are arranged to concurrently perform respective convolution operations and the hardware further comprises convolution output logic configured to combine the outputs from the plurality of convolution engines and make available those outputs for subsequent processing according to the DNN (e.g., see figs. 2, 6c, 3 and paragraphs 0075 and 0078).
As to claim 19 Gibson taught A method for implementing in hardware a Deep Neural Network (DNN) [note the convolutional neural network provides a component of a Deep Neural Network] having a convolution layer (e.g., see paragraphs 0005, 0032-0034), the hardware comprising a plurality of convolution engines (240a-240n) (e.g., see fig. 2) each operable to perform a convolution operation by applying a filter to a data window (e.g., see figs. 1, 6c, 7 and paragraphs 0032-0033) [note the subset of data for a plane provides the data window], and each filter comprising a set of weights for combination with respective data values of a data window (e.g., see figs. 6c, 7 and paragraph 0033), the method comprising, at each of the plurality of convolution engines (240a-240n) (e.g., see fig. 2): receiving weights and corresponding data values for a convolution operation (e.g., see figs. 6c, 7 and paragraphs 0033-0034, 0099); for each weight and its respective data value, multiplying the weight by the respective data value (e.g., see fig. 3 and paragraphs 0041 and 0100); and accumulating the results of the multiplying operations so as to form an output for the respective convolution operation (e.g., see figs. 2, 3 and paragraphs 0041-0042 and 0100).
Gibson did not expressly detail identifying zero weights in the received weights; for each weight and its respective data value, multiplying the weight by the respective data value only if the weight is non-zero.
Turakhia however taught this limitation (e.g., see fig. 6 and paragraphs 0037, 0063-0064, 0067-0068) [note Turakhia taught preventing loading the weight and activation data to the multiplier when either the weight or activation was determined to be zero, and when both were determined to be nonzero loading the weight and activation and performing multiplication; this includes preventing loading weight [and activation] when the weight was zero]. 
It would have been obvious to one of ordinary skill in the art to combine the teachings of Gibson and Turakhia. Both references were directed toward performing multiplication accumulation operations in a neural network. One of ordinary skill would have been motivated to incorporate the Turakhia teachings of preventing loading of input values to the multiplier when at least one of the inputs was zero, to avoid unnecessary processing where the output would have been known to be zero without performing the arithmetic operation. This would reduce power consumption (e.g., see paragraph 0037 of Turakhia).
Due to the similarities between claims 19 and 20, claim 20 is rejected for the same reasons as claim 19 above.



Claims 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Gibson and Turakhia as applied to claim 2 above, and further in view of Huang (patent application publication No. 2017/0293659).
As to claim 5 Gibson and Turakhia taught Hardware as claimed in claim 2, Huang taught wherein the sparsity data comprises a binary string, each bit of the binary string corresponding to a respective weight/data value of the set of weights/data values and indicating whether that weight/data value is zero (e.g., see figs. 3, 5a, 5b, 6 and paragraphs 0108-0109, 0135-0136) [a mask provides the binary string].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Gibson and Huang. Both references were directed toward performing multiplication and accumulation in a neural network (e.g., see fig. 11 of Huang). One of ordinary skill would have been motivated to use a binary string to implement the sparsity data at least to enable efficient storage for use in processing arrays, as a column or row of values of sparsity data could be accessed quickly by accessing the memory location containing the binary string (mask). This would increase throughput.
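For illustration only, sparsity data in the form of a binary string (mask) as discussed for claim 5 can be sketched as follows. This is a minimal sketch and not the encoding of Huang or of the application: the function names sparsity_mask and masked_dot are hypothetical, and the bit polarity ('1' for a non-zero value, '0' for a zero value) is an assumption, since the claim only requires that each bit indicate whether the corresponding value is zero.

```python
def sparsity_mask(values):
    """Return one bit per value: '1' if the value is non-zero, '0' if zero.

    The polarity is an assumption made for this sketch.
    """
    return "".join("1" if v != 0 else "0" for v in values)


def masked_dot(weights, data):
    """Dot product that consults the mask instead of testing each weight."""
    mask = sparsity_mask(weights)
    return sum(w * d for bit, w, d in zip(mask, weights, data) if bit == "1")


if __name__ == "__main__":
    weights = [0, 4, 0, 0, -1]
    print(sparsity_mask(weights))
    print(masked_dot(weights, [9, 2, 9, 9, 3]))
```

Because the whole mask occupies a single short string (or machine word), a row or column of sparsity indications can be read in one access, which is the storage and throughput rationale stated above.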

As to claim 6 Gibson and Turakhia taught Hardware as claimed in claim 1, Huang taught wherein the hardware further comprises input data logic configured to form the sparsity data on receiving data values of a data window for provision to one or more of the plurality of convolution engines (e.g., see figs. 3, 20, 21 and paragraph 0141).

As to claim 7 Gibson and Turakhia taught Hardware as claimed in claim 1, Huang taught wherein each of the plurality of convolution engines is arranged to independently perform a different convolution operation such that collectively the convolution engines apply a set of filters to each data window of a set of data windows (e.g., see figs. 11, 12 and paragraphs 0134-0135) [note different compressed blocks are input to the FIFO and masking input of the multipliers in parallel, and this provides for independent performance of convolution operations; this includes one convolution engine receiving an input that is zero, where the operation would be skipped, and another convolution engine receiving non-zero inputs and performing the multiplication of the inputs, which provides the different convolution operations].

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Gibson and Turakhia as applied to claim 1 above, and further in view of Brothers (patent application publication No. 2017/0344876).

As to claim 18 Gibson and Turakhia taught Hardware as claimed in claim 1, Brothers taught wherein the convolution engine comprises an input buffer (request assembly unit) for receiving a subset of weights of a filter and a weights register for receiving a subset of data values of a data window, the subsets of weights and data values being received at the respective registers in response to one or more requests from the control logic (e.g., see paragraph 0033). As to the buffering being performed by registers, one of ordinary skill would have been motivated to use registers at least to provide quick access to the data, so that the weights and data are provided to the multipliers in a properly timed manner in the cycle that is to be processed.
It would have been obvious to one of ordinary skill in the art to combine the teachings of Gibson and Brothers. Both references were directed toward the problem of neural network processing on a data processor. One of ordinary skill in the art would have been motivated to include a data request unit for buffering the input data and weights at least to enable the system to flexibly control when and which data is input to the multipliers of a convolution operation at a particular cycle, for a properly timed convolution without the convolution engines having to wait for data input, thereby improving throughput.


Allowable Subject Matter
Claims 8 and 10-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: The closest prior art includes Gibson, Turakhia, Huang and Brothers. These references taught limitations in the claims from which claims 8 and 10-16 depend, as detailed above. Claims 8 and 10-16 respectively require, among other things, the limitations shown below, which were not taught by the closest prior art:

Claim 8 Hardware as claimed in claim 1, wherein each convolution engine is configured to receive configuration information identifying a predefined sequence of convolution operations to perform and the control logic is configured to request weights and data values for combination at the multiplication logic in accordance with that predefined sequence. 
Claim 10 Hardware as claimed in claim 9, wherein the weight buffer modules are accessible to the convolution engines over an interconnect and the control logic of each convolution engine is configured to request weights from the weight buffer modules using an identifier of the filter to which the weights belong.

Claim 11  Hardware as claimed in claim 9, wherein each weight buffer module comprises: a packed buffer for receiving compressed data comprising a set of weights of a filter and corresponding sparsity data; an unpacked buffer for holding an uncompressed subset of the weights of the filter along with their corresponding sparsity data, the compressed data being unpacked into the unpacked buffer according to a predetermined sequence of weights; and weight control logic configured to, in response to a request from a convolution engine for weights available at the unpacked buffer, provide those weights to the convolution engine along with the corresponding sparsity data.

Claim 12 Hardware as claimed in claim 11, wherein the control logic is configured to: on receiving a request from a convolution engine for a first group of weights available at the unpacked buffer according to the predetermined sequence, add that convolution engine to a list of convolution engines applying the filter whose weights are stored at the weight buffer module; and replace each current group of weights at the unpacked buffer with a next group of weights according to the predetermined sequence only when all of the convolution engines on the list have received that current group of weights from the weight buffer module.

Claim 13 Hardware as claimed in claim 12, wherein the control logic is configured, on receiving a request from a convolution engine for a last group of weights available at the unpacked buffer according to the predetermined sequence, remove that convolution engine from the list of convolution engines applying the filter whose weights are stored at the weight buffer module.

Claim 14 Hardware as claimed in claim 11, wherein the control logic is configured to, if the requested weights are not available at the unpacked buffer, defer the request until the weights are available at the unpacked buffer.

Claim 15 Hardware as claimed in claim 11, wherein the unpacked buffer is configured to maintain a plurality of groups of weights, each group of weights being maintained with corresponding sparsity data.

Claim 16 Hardware as claimed in claim 15, wherein the weights of each group are stored at the unpacked buffer such that any zero weights are at one end of the string of weights comprised in the group, the weights of the group otherwise being in sequence, and the sparsity data for the group indicates the position of the zero weights in the group.
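
For illustration only, the storage arrangement recited in claim 16 can be sketched as follows. This is a minimal sketch under stated assumptions and not the applicant's implementation: the function names pack_group and unpack_group are hypothetical, the zero weights are placed at the trailing end of the group (the claim only requires "one end"), and the sparsity data is modeled as a list of booleans that is True at each original position holding a non-zero weight.

```python
def pack_group(weights):
    """Move zero weights to the end, keeping non-zero weights in sequence.

    Returns (packed, mask); mask[i] is True when the original position i
    held a non-zero weight, so the False entries give the zero positions.
    """
    mask = [w != 0 for w in weights]
    nonzero = [w for w in weights if w != 0]
    packed = nonzero + [0] * (len(weights) - len(nonzero))
    return packed, mask


def unpack_group(packed, mask):
    """Restore the original ordering from the packed weights and the mask."""
    nonzero = iter(packed)
    return [next(nonzero) if keep else 0 for keep in mask]


if __name__ == "__main__":
    packed, mask = pack_group([0, 3, 0, 5])
    print(packed, mask)
    print(unpack_group(packed, mask))
```

With the non-zero weights contiguous at the front of the group, a consumer can stop reading once the count of True entries in the mask has been reached, while the mask still allows each weight to be paired with its original data value.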
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Woo (patent No. 9,818,059) disclosed exploiting input data sparsity in neural network compute units (e.g. see abstract).
Paul (patent application publication No. 2017/0277628) disclosed technologies for memory management of neural networks with sparse connectivity (e.g., see abstract).
Han, S. et al., EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, IEEE, pp. 243-254.
Kim, D. et al., A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolution Neural Network, May 2017, IEEE, pp. 1462-1467. Kim disclosed storing weights, activations and partial sums in SRAM (e.g., see section IV Architecture, subsection A. Architecture Overview on page 1463 and fig. 2).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC . COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/Primary Examiner, Art Unit 2183