DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to submission of application on 4/5/2018.
Claims 1-19 are presented for examination.
Oath/Declaration
For the record, Examiner acknowledges that the Oaths/Declarations submitted on 5/15/2018 have been received.
Information Disclosure Statement
The information disclosure statements submitted on 4/5/2018 and 10/3/2019 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are considered by the examiner.
Drawings
The Drawings filed on 4/5/2018 are acceptable for examination purposes.
Specification
The Specification filed on 4/5/2018 is acceptable for examination purposes.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 10-19 are rejected under 35 U.S.C. 103 as being unpatentable over Young (US 9842293 B2, herein Young) and Kim et al (A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network, herein Kim).
Regarding claim 1,
	Young teaches a circuit for performing [convolutional neural network] computations for a neural network, the circuit comprising: (Young, Column 1, Paragraph 5, Line 36 “In general, this specification describes a special-purpose hardware circuit that computes neural network inferences.” In other words, hardware circuit is circuit, and computes neural network inferences is performing neural network computations.)
	a transposing buffer configured to receive actuation feature vectors along a first dimension of the transposing buffer and to output feature component vectors along a second dimension of the transposing buffer; (Examiner notes from the Specification of the instant application, paragraph [0020], line 2 “The circuit includes buffers 104 that include a transposing buffer and a weight buffer.”  In other words, buffers 104 from FIG. 1 of the instant application includes transposing buffers and weight buffers. Young, Column 5, paragraph 1, Line 1 “The value loaders 302 can receive the activation inputs from a unified buffer, e.g., the unified buffer 208 of FIG. 2.”  And, Column 5, Paragraph 5, Line 34 “The accumulated output can be passed along the same column as the weight input, e.g., towards the bottom of the column in the array 306.” In other words, value loader 302 is transposing buffer, receiving activation inputs is receive actuation feature vectors, and accumulated output can be passed along the same column as the weight is output feature component vectors along a second dimension.) 
	a weight buffer configured to store kernel weight vectors along a first dimension of the weight buffer and further configured to output kernel component vectors along a second dimension of the weight buffer; and (Young, Column 5, Paragraph 2, line 12 “The weight fetcher interface 308 can receive the weight input from a memory unit, e.g., the dynamic memory 210 of FIG. 2. The weight fetcher interface 308 can send a corresponding weight input to a distinct top-most cell of the array 306.” And, Column 3, Paragraph 4, Line 22 “In some implementations, both the sets of weight inputs and the sets of activation inputs can be received from the unified buffer.”  In other words, dynamic memory 210 can store kernel weight vectors, and weight fetcher interface 308 can output the corresponding weight input.  Also, in some implementations, the unified buffer can store both the activation inputs and the weight inputs.  In that case, it would mirror the function of the buffers (104) of the instant application.)
	a systolic array configured to receive the kernel weight vectors along a first dimension of the systolic array and to receive the feature component vectors along a second dimension of the systolic array, where the systolic array comprises an array of multiply and accumulate (MAC) processing cells. (Young, Column 4, Paragraph 4, Line 25 “The dynamic memory 210 and the unified buffer 208 can send the sets of weight inputs and the sets of activation inputs, respectively, to the matrix computation unit 212.  In some implementations, the matrix computation unit 212 is a two-dimensional systolic array.  The matrix computation unit 212 can also be a one-dimensional systolic array or other circuitry that can perform mathematical operations, e.g., multiplication and addition.  In some implementations, the matrix computation unit 212 is a general purpose matrix processor.” In other words two-dimensional systolic array is systolic array, sets of weight inputs and the sets of activation inputs, respectively, to the matrix computation unit is configured to receive the kernel weight vectors and feature component vectors, and can perform mathematical operations, e.g., multiplication and addition is an array of multiply and accumulate (MAC) processing cells.)



	Thus far, Young does not explicitly teach convolutional neural network.
	Kim teaches convolutional neural network (Kim, Page 1, Column 2, Paragraph 2, Line 1 “The proposed architecture is the first hardware accelerator that skips ineffectual computation caused by both zero weights and activations to improve the performance and energy consumptions of convolutional layers of CNNS.” In other words, CNN is convolutional neural network.)
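	Kim and Young are both directed to hardware implementations of neural networks and improving the performance of neural networks. In view of the teaching of Kim, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine Kim into the teaching of Young.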
	One of ordinary skill in the art would be motivated to do this in order to speed up the inference of a convolutional neural network by having it implemented in hardware.
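	For illustration, the dataflow recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of Young's or Kim's hardware: the function name mac_array_accumulate and the list-of-lists layout are hypothetical, and timing skew between cells is ignored. Vectors are stored along a first dimension (one vector per row), and on each cycle one component of every stored vector is read along the second dimension and fed to the grid of MAC cells.

```python
def mac_array_accumulate(feature_vectors, kernel_vectors):
    """Accumulate products in a grid of MAC cells, one component per cycle.

    feature_vectors: list of P input feature vectors, each with C components
    kernel_vectors:  list of N kernel weight vectors, each with C components
    Returns an N x P grid; cell (n, p) holds dot(kernel n, feature p).
    """
    num_components = len(feature_vectors[0])
    acc = [[0 for _ in feature_vectors] for _ in kernel_vectors]
    for c in range(num_components):  # one processing cycle per component
        # Second-dimension reads: component c of every stored vector.
        feature_component = [f[c] for f in feature_vectors]
        kernel_component = [k[c] for k in kernel_vectors]
        for n, w in enumerate(kernel_component):
            for p, x in enumerate(feature_component):
                acc[n][p] += w * x  # multiply and accumulate in cell (n, p)
    return acc


features = [[1, 2, 3], [4, 5, 6]]   # two patches, three components each
kernels = [[1, 0, -1], [2, 2, 2]]   # two kernels, three weights each
result = mac_array_accumulate(features, kernels)
print(result)
```

	After as many cycles as there are components, each cell holds the dot product of one kernel weight vector with one input feature vector, which is the per-cell output value discussed for claims 6 and 13.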
Regarding claim 2,
	The combination of Young and Kim teach the circuit of claim 1, 
	where the feature component vectors and the kernel component vectors are pipelined into the systolic array.  (Young, Column 3, Paragraph 8, Line 60 “The host interface 202 can send the instructions to a sequencer 206, which converts the instructions into low level control signals that control the circuit to perform the neural network computations.  In some implementations, the control signals regulate dataflow in the circuit, e.g., how the sets of weight inputs and the sets of activation inputs flow through the circuit.” And, Column 5, Paragraph 3, Line 19 “In some implementations, a host interface, e.g., the host interface 202 of FIG. 2, shifts activation inputs throughout the array 306 along one dimension, e.g., to the right, while shifting weight inputs throughout the array 306 along another dimension, e.g., to the bottom.  For example, over one clock cycle, the activation input at cell 314 can shift to an activation register in cell 316, which is to the right of cell 314.  Similarly, the weight input at cell 316 can shift to a weight register at cell 318, which is below cell 314.  On each clock cycle, each cell can process a given weight input and a given activation input to generate an accumulated output.”  In other words, control signals regulate dataflow in the circuit, e.g., how the sets of weight inputs and the sets of activation inputs flow through the circuit is feature component vectors are pipelined into the systolic array.)
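	The cycle-by-cycle shifting quoted above can be sketched as a small simulation. This is an illustrative model only, and it makes assumptions that are not in Young: the function name systolic_matmul is hypothetical, each cell keeps its own running sum (an output-stationary arrangement, whereas Young passes accumulated outputs down the column), and the edge inputs are skewed by one cycle per row and per column so that matching operands meet in each cell.

```python
def systolic_matmul(A, B):
    """Simulate an M x N grid of MAC cells computing A (M x K) times B (K x N).

    Activations enter at the left edge and shift right one cell per cycle;
    weights enter at the top edge and shift down one cell per cycle.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(M)]  # activation register in each cell
    b_reg = [[0] * N for _ in range(M)]  # weight register in each cell
    for t in range(K + M + N - 2):       # cycles until the last cell finishes
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:               # left edge: skewed activation input
                    k = t - i
                    a_in = A[i][k] if 0 <= k < K else 0
                else:                    # otherwise take the left neighbor's value
                    a_in = a_reg[i][j - 1]
                if i == 0:               # top edge: skewed weight input
                    k = t - j
                    b_in = B[k][j] if 0 <= k < K else 0
                else:                    # otherwise take the upper neighbor's value
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b      # all registers update together
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

	The registers are updated together at the end of each cycle, so a value written in one cycle is only visible to the neighboring cell in the next cycle, which is what makes the data pipelined rather than broadcast.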
Regarding claim 3,
	The combination of Young and Kim teach the circuit of claim 1, 
	Thus far, the combination of Young and Kim does not explicitly teach where the feature component vectors and the kernel component vectors are broadcast into the systolic array.  
	Kim teaches where the feature component vectors and the kernel component vectors are broadcast into the systolic array.  (Kim, Page 3, Column 1, Paragraph 1, Step 2. “Activations and weights are broadcast from on-chip SRAM to PEs.”  In other words, activations and weights are broadcast from on-chip SRAM to PEs is feature components and the kernel component vectors are broadcast into the systolic array.)
	It would be obvious before the effective filing date of the claimed invention to combine Kim into the teaching of Young and Kim.  This would result in the ability to broadcast the feature and weight data into the systolic array without having to pipeline the data.
	One of ordinary skill in the art would be motivated to do this in order to speed up the inference time of the neural network by eliminating the time required to input the data one cycle at a time.
Regarding claim 4,
	The combination of Young and Kim teach the circuit of claim 1, 
	where the actuation feature vectors are shifted into the transposing buffer along the first dimension of the transposing buffer and output feature component vectors are shifted out of the transposing buffer along the second dimension.  (Young, FIG. 2 and FIG. 3, the value loaders 302 have activation input (actuation feature vectors) shifted in from the unified buffer (208) in one dimension, and shifted out to the cells 306 in the second dimension.  Young, Column 5, Paragraph 5, Line 34 “The accumulated output can be passed on the same column as the weight input, e.g., towards the bottom of the column in the array 306.” In other words, activation inputs is actuation feature vectors, shift activation inputs throughout the array 306 along one dimension is shifted into the transposing buffer, and accumulated output can be passed on the same column as the weight input, e.g., towards the bottom for the column in the array 306 is output feature component vectors are shifted out of the transposing buffer along the second dimension.)
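	The shift-in and shift-out behavior recited in claim 4 can be sketched as follows. The class TransposingBuffer and its method names are hypothetical and serve only to illustrate the claim language; the sketch assumes a fixed-size buffer in which whole vectors enter as rows and component vectors leave as columns, and it does not model the value loaders of Young.

```python
class TransposingBuffer:
    """Hypothetical R x C buffer: rows are shifted in, columns are shifted out."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.data = [[0] * cols for _ in range(rows)]

    def shift_in(self, vector):
        """Load one feature vector along the first dimension (as a new row)."""
        if len(vector) != self.cols:
            raise ValueError("vector length must match buffer width")
        # Existing rows move one place; the oldest row is dropped.
        self.data = self.data[1:] + [list(vector)]

    def shift_out(self):
        """Emit one component vector along the second dimension (a column)."""
        column = [row[0] for row in self.data]
        # Remaining columns move toward the output edge; zeros fill behind.
        self.data = [row[1:] + [0] for row in self.data]
        return column


buf = TransposingBuffer(rows=2, cols=3)
buf.shift_in([1, 2, 3])
buf.shift_in([4, 5, 6])
columns = [buf.shift_out() for _ in range(3)]
print(columns)
```

	Each emitted column holds the same component from every stored feature vector, which is the transposition the claim describes.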
Regarding claim 5,
	The combination of Young and Kim teach the circuit of claim 1, 
	where systolic array is further configured to pass the kernel weight vectors to neighboring processing cells in the second dimension of the systolic array and to pass the feature component vectors to neighboring processing cells in the first dimension of the systolic array. (Young, Column 5, Paragraph 1, Line 7 “The value loader can also send the activation input to an adjacent value loader, and the activation input can be used at another left-most cell of the array 306. This allows activation inputs to be shifted for use in another particular cell of the array 306.”  In other words, allows activation inputs to be shifted is pass kernel weight vectors to neighboring processing cells in the second dimension of the systolic array and to pass the feature component vectors to neighboring processing cells in the first dimension of the systolic array. FIG. 3. The solid arrows coming out of the cells shows a path from top to bottom along one dimension, and a path from left to right on the other dimension.)

Regarding claim 6,
	The combination of Young and Kim teach the circuit of claim 1, 
	where systolic array is further configured to output values accumulated in the processing cells, where each processing cell is associated with an output value.  (Young, Column 5, Paragraph 4, Line 28 “On each clock cycle, each cell can process a given weight input and a given activation input to generate an accumulated output.” In other words, each cell can process a given weight input and a given activation input to generate an accumulated output is systolic array further configured to output values accumulated in the processing cells, where each cell is associated with an output value.)
Regarding claim 7,
	The combination of Young and Kim teach the circuit of claim 1, 
	further comprising an output layer configured to receive accumulated values from the MAC processing cells of the systolic array and to perform at least one non-linear, pooling or normalization operation on the received accumulated values.  (Young, Column 5, Paragraph 5, Line 34 “The accumulated output can be passed along the same column as the weight input, e.g., towards the bottom of the column in the array 306.” And, Line 44 “The accumulator units 310 can accumulate each accumulated output to generate a final accumulated value.  The final accumulated value can be transferred to a vector computation unit.” And, Column 4, Paragraph 5, Line 43 “For example, the vector computation unit 214 can apply a non-linear function to outputs of the matrix computation unit, e.g., a vector of accumulated values, to generate activated values.” In other words, vector computation unit is output layer, accumulated output is accumulated values, and apply a non-linear function is perform an at least one non-linear, pooling or normalization operation.)
Regarding claim 10,
	The combination of Young and Kim teach the circuit of claim 1, 
	further comprising an interface to a host data processing system, where the circuit is configured to receive data and commands from the host data processing system via the interface. (Young, Column 3, Paragraph 7, Line 48 “The host interface 202 can receive instructions that include configuration information for a neural network computation.” And Paragraph 8, Line 60 “The host interface 202 can send the instructions to a sequencer 206, which converts the instructions into low level control signals that control the circuit to perform the neural network computations.” In other words, host interface 202 is interface to a host data processing system, and send the instructions to a sequencer 206, which converts the instructions into low level control signals that control the circuit is the circuit is configured to receive data and commands from the host data processing system via the interface.)
Regarding claim 11,
	The combination of Young and Kim teach a non-transitory computer readable medium containing instructions of a hardware description language that define the circuit of claim 1. (Young, Column 9, Paragraph 3, Line 20 “For example, the processor can reuse a weight value some number of times while retrieving weights from memory, e.g., Dynamic Random Access Memory (DRAM).” And, Column 3, Paragraph 7, Line 48 “The host interface 202 can receive instructions that include configuration information for a neural network computation.”  In other words, DRAM is a non-transitory computer readable medium, and receive instructions that include configuration information is instructions that define the circuit.)
Regarding claim 12,
	The combination of Young and Kim teach a non-transitory computer readable medium comprising a netlist representative of the circuit of claim 1.  (Young, Column 3, Paragraph 4, Line 18 “The system receives sets of weight inputs (step 102) and sets of activation inputs (step 104) for the given layer.  The sets of weight inputs and the sets of activation inputs can be received from dynamic memory…” And, Column 3, Paragraph 7, Line 48 “The system 200 includes a host interface 202.  The host interface 202 can receive instructions that include configuration information for a neural network computation.  The configuration information can include at least one or more of the following: how many layers should be processed, corresponding sets of weight inputs for each layer of the layer, an initial set of activation inputs, i.e., the input to the neural network from which the inference is to be computed, corresponding input and output sizes of each layer, a stride value for the neural network computation, and a type of layer to be processed, e.g., a convolutional layer or a fully connected layer.” In other words, dynamic memory is a non-transitory computer readable medium, and configuration information is a netlist representative of the circuit of claim 1.)
Regarding claim 13,
	The combination of Young and Kim teach a method for performing convolution neural network computations for a neural network, the method comprising: (Young, Column 9, Paragraph 3, Line 16 “Although this method has been described to be implemented on a circuit processing a neural network, this method can also be implemented on a processor, e.g., a ...” In other words, method has been described to be implemented on a circuit processing a neural network is method for performing convolution neural network computation for a neural network.)
	loading input feature vectors into a transposing buffer along a first dimension of the transposing buffer; loading kernel weight vectors along a first dimension of a weight buffer; (Young, Column 5, paragraph 1, Line 1 “The value loaders 302 can receive the activation inputs from a unified buffer, e.g., the unified buffer 208 of FIG. 2.”  In other words, value loader 302 is transposing buffer 104. And, Column 5, Paragraph 2, line 12 “The weight fetcher interface 308 can receive the weight input from a memory unit, e.g., the dynamic memory 210 of FIG. 2.” In other words, weight fetcher interface is weight buffer.)
	for each of a plurality of processing cycles: outputting kernel component vectors from a second dimension of the weight buffer to a first dimension of a systolic array, where the second dimension is perpendicular to the first dimension; (Young, Column 5, Paragraph 2, line 12 “The weight fetcher interface 308 can send a corresponding weight input to a distinct top-most cell of the array 306.”  In other words, weight fetcher interface 308 can output the corresponding weight input, and array 306 is systolic array.  From FIG.3, Value Loaders (302) feed input feature vectors to the right along the first dimension. Weight Fetcher Interface feeds weight inputs downward along the second dimension. Accumulated output is shifted downward along the second dimension to Accumulators.)
	outputting feature component vectors from a second dimension of the transposing buffer to a second dimension of the systolic array, where the second dimension is perpendicular to the first dimension and where the first dimension is perpendicular to the second dimension; and (Young, Column 4, Paragraph 7, Line 62 “In the illustrated example, value loaders 302 send activation inputs to rows of the array 306 and a weight fetcher interface 308 sends weight inputs to columns of the array 306.” In other words, value loaders 302 send activation inputs to rows of the array 306 is output feature component vectors along a second dimension.)
Docket No: P04768US.family; Application No.: 15/945,952
	in each cell of the systolic array, accumulating a product of a feature component and a kernel component; and (Young, Column 5, Paragraph 4, Line 28 “On each clock cycle, each cell can process a given weight input and a given activation input to generate an accumulated output.” In other words, each cell is each cell of the systolic array, can process is accumulating a product, activation input is feature component, and weight input is kernel component.)
	outputting accumulated products of the cells of the systolic array to an output layer of the neural network. (Young, Column 5, Paragraph 5, Line 34 “The accumulated output can be passed along the same column as the weight input, e.g., towards the bottom of the column in the array 306.” And, Line 44 “The accumulator units 310 can accumulate each accumulated output to generate a final accumulated value.  The final accumulated value can be transferred to a vector computation unit.” And, Column 4, Paragraph 5, Line 43 “For example, the vector computation unit 214 can apply a non-linear function to outputs of the matrix computation unit, e.g., a vector of accumulated values, to generate activated values.” In other words, accumulated output is accumulated products, and vector computation unit is output layer.)
Regarding claim 14,
	The combination of Young and Kim teach the method of claim 13, further comprising, 
for each of the plurality of processing cycles: passing the kernel weight vectors to neighboring cells in the second dimension of the systolic array; and passing the feature component vectors to neighboring cells in the first dimension of the systolic array. (Young, Column 5, Paragraph 3, Line 23 “For example, over one clock cycle, the activation input at cell 314 can shift to an activation register in cell 316, which is to the right of cell 314. Similarly, the weight input at cell 316 can shift to a weight register at cell 318, which is below cell 314.” In other words, one clock cycle is for each of the plurality of processing cycles, and weight input at cell 316 can shift to a weight register input at cell 318 is passing the kernel weight vectors to neighboring cells in the second dimension of the systolic array.)
Regarding claim 15,
	The combination of Young and Kim teach the method of claim 13, 
	further comprising, for each of the plurality of processing cycles: broadcasting the kernel weight vectors to cells in the second dimension of the systolic array; and (Kim, Page 3, Column 1, Paragraph 1, Step 2. “Activations and weights are broadcast from on-chip SRAM to PEs.”  In other words, weights are broadcast is broadcasting the kernel weight vectors.)
	broadcasting the feature component vectors to cells in the first dimension of the systolic array.  (Kim, Page 3, Column 1, Paragraph 1, Step 2. “Activations and weights are broadcast from on-chip SRAM to PEs.”  In other words, activations are broadcast is broadcasting the feature component vectors.)
Regarding claim 16,
	The combination of Young and Kim teach the method of claim 13, 
where loading input feature vectors into the transposing buffer along a first dimension of the transposing buffer comprises: shifting data stored in the transposing buffer in the second dimension; and (Young, Column 5, Paragraph 1, Line 5 “For example, value loader 312 can send an activation input to cell 314.  The value loader can also send the activation input to an adjacent value loader, and the activation input can be used at another left-most cell of the array 306.  This allows activation inputs to be shifted for use in another particular cell of the array 306.”  In other words, value loader 312 is transposing buffer, value loader can also send the activation input to an adjacent value loader is shifting data stored in the transposing buffer in the second dimension.)
	loading an input feature vector along an edge of the transposing buffer in the first dimension.  (Young, Column 5, Paragraph 1, Line 3 “Each value loader can send a corresponding activation input to a distinct left-most cell of the array 306.  The left-most cell can be a cell along a left-most column of the array 306.” In other words, send a corresponding activation input to a distinct left-most cell is loading an input feature vector along an edge of the transposing buffer in the first dimension.)
Regarding claim 17,
The combination of Young and Kim teach the method of claim 13, 
	where outputting feature component vectors from the second dimension of the transposing buffer to the second dimension of the systolic array comprises: shifting data stored in the transposing buffer in the first dimension; and outputting a feature component vector along an edge of the transposing buffer in the second dimension.  (Young, Column 5, Paragraph 1, Line 4 “The left-most cell can be a cell along a left-most column of the array 306.”  In other words, shift one cell in dimension 1 (across a row) is shifting data stored in the transposing buffer in the first dimension, output can be passed along the same column (second dimension) is outputting a feature component vector along an edge of the transposing buffer in the second dimension.)
Regarding claim 18,
	The combination of Young and Kim teach the method of claim 13, 
	where a kernel weight vector is applied to a patch of pixels in an image, and where an input feature vector comprising color components of pixels in the patch: and a feature component vector comprises a color component of a corresponding pixel in each of a plurality of patches. (Examiner notes from the Specification of the instant application paragraph [0020], line 3 “The transposing buffer is two-dimensional buffer that receives actuation feature vectors (patch values) along a first dimension and outputs feature component vectors along a second dimension to systolic array 106.” And, paragraph [0022], Line 3, “actuation features comprise the image data”, and Line 3, “each pixel in the image comprises one or more values.  For example, an NxM image A may be written as an array of pixel feature vectors.” Young, Column 8, Paragraph 4, Line 30 “For each batch, the circuit can process the 8 distinct images in the batch by reusing a particular set of weight inputs for the layer.  The circuit can then either (1) process one or more batches at a subsequent layer or (2) process another ...” In other words, weight inputs is kernel weight vector, image is patch of pixels in an image.)
Regarding claim 19,
	The combination of Young and Kim teach the method of claim 13, 
	where outputting accumulated sum of products of the cells of the systolic array to an output layer of the neural network comprises passing accumulated sum of products between neighboring cells of the systolic array to an edge of the systolic array. (Young, Column 5, Paragraph 4, Line 28 “On each clock cycle, each cell can process a given weight input and a given activation input to generate an accumulated output.  The accumulated output can also be passed to an adjacent cell along the same dimension as the given weight input.” And, Column 5, Paragraph 5, Line 34 “The accumulated output can be passed along the same column as the weight input, e.g., towards the bottom of the column in the array 306.” In other words, the accumulated output can also be passed to an adjacent cell is passing accumulated sum of products between neighboring cells of the systolic array, and accumulated output can be passed along the same column as the weight input, e.g., towards the bottom of the column in the array 306 is to an edge of the systolic array.)
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Young (US 9842293 B2, herein Young), Kim et al (A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network, herein Kim), and Kim et al (ZeNA: Zero-Aware Neural Network Accelerator, herein Kim-A).
Regarding claim 8,
	The combination of Young and Kim teach the circuit of claim 1, 
	Thus far, the combination of Young and Kim does not explicitly teach where the values of the feature component vectors or the kernel component vectors are tagged with validity bits, indicative of data validity, and where an accumulator of a MAC processing cell is set to zero when data tagged as invalid is received.
	Kim-A teaches where the values of the feature component vectors or the kernel component vectors are tagged with validity bits, indicative of data validity, and where an accumulator of a MAC processing cell is set to zero when data tagged as invalid is received. (Kim-A, Page 3, Column 1, Paragraph 2, Line 1 “In order to skip computation associated with zero weights and activations, each PE receives a zero bit-vector (i.e., a bit-vector that stores the information about whether each weight/activation is zero or not).”  In other words, a zero bit-vector is vectors are tagged with validity bits, indicative of data validity, and where an accumulator of a MAC processing cell is set to zero when data tagged as invalid is received.)
	Both Kim-A and the combination of Young and Kim are directed to hardware implementations of neural networks and improving the performance and energy consumptions of neural networks. In view of the teaching of Kim-A, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine Kim-A into the teaching of Young and Kim.  This would result in being able to tag valid data.  By identifying data that is not valid, i.e. missing the tag, and setting the respective accumulator cell to zero, inference speed can be improved by not processing invalid data.
	One of ordinary skill in the art would be motivated to do this in order to speed up the inference of a convolutional neural network by only processing valid data.
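	The zero bit-vector mechanism quoted from Kim-A can be illustrated with a short sketch. It models only the quoted idea of flagging zero weights and activations so that the associated multiplications are skipped; it does not model the claimed validity bits or the resetting of an accumulator. The function names and the bit polarity (1 for a nonzero entry) are assumptions made for this example and are not taken from Kim-A.

```python
def zero_bit_vector(values):
    """Return 1 for each nonzero entry and 0 for each zero entry (assumed polarity)."""
    return [1 if v != 0 else 0 for v in values]


def mac_with_zero_skip(weights, activations):
    """Accumulate weight * activation, skipping pairs flagged as zero.

    Returns (accumulated value, number of multiplications performed).
    """
    w_bits = zero_bit_vector(weights)
    a_bits = zero_bit_vector(activations)
    acc = 0
    multiplies = 0
    for w, a, wb, ab in zip(weights, activations, w_bits, a_bits):
        if wb and ab:        # both operands nonzero: do the work
            acc += w * a
            multiplies += 1
        # otherwise the product is zero and the multiplication is skipped
    return acc, multiplies


total, work = mac_with_zero_skip([0, 2, 3, 0], [5, 0, 4, 7])
print(total, work)
```

	In this example only one of the four operand pairs has two nonzero values, so a single multiplication is performed while the accumulated result is unchanged from the full computation.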

Regarding claim 9,
	The combination of Young, Kim, and Kim-A teach the circuit of claim 1, 
	further comprising a control line coupled to the MAC processing cells, where an accumulator of a MAC processing cell is set to zero in response to a signal on the control line.  (Kim-A, Fig. 2, Page 3, Column 1, Paragraph 2, Line 1 “In order to skip computation associated with zero weights and activations, each PE receives a zero bit-vector (i.e., a bit-vector that stores the information about whether each weight/activation is zero or not). Bitvec module reads activations and weights from the on-chip SRAM and generates zero bit-vectors from them.  On-the-fly zero bit-vector generation can save on-chip SRAM resource otherwise required for zero bit-vector storage.”  In other words, Bitvec module reads activations and weights from the on-chip SRAM and generates zero bit-vectors from them is accumulator of a MAC processing cell is set to zero.)
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359.  The examiner can normally be reached on Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/B.I.R./Examiner, Art Unit 2124



/VINCENT GONZALES/Primary Examiner, Art Unit 2124