DETAILED ACTION
1.	This office action is in response to the Application No. 16696717 filed on 11/26/2019. Claims 1-17 are presented for examination and are currently pending.

Notice of Pre-AIA  or AIA  Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
19.	This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “a zero-value filter”, “a multiplier”, “a feature map extractor” in claim 1.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. Also, these limitations use generic placeholders modified by functional language and are not modified by sufficient structure.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3.	Claims 1-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed towards an abstract idea without significantly more.

	Step 1
	Independent claim 1 is directed to an apparatus, and falls into one of the four statutory categories.
	Step 2A, Prong 1
	Claim 1 recites the following abstract ideas:
	to filter a zero (0) value by applying a weight to an input feature, the input feature including a plurality of data elements (mental process directed towards a mathematical concept) and 
	generate compressed packet data by matching index information including relative coordinates and group boundary information with the data elements of the input feature (mental process directed towards comparing input data values)
	to produce result data by performing a multiplication operation on the input feature and the weight of the compressed packet data; (mental process directed towards a mathematical concept) and
	 to perform an addition operation between the result data based on the relative coordinates and the group boundary information (mental process directed towards a mathematical concept) and 
	generate an output feature map by rearranging result values of the addition operation in an original input feature form. (mental process directed towards changing the position of data values)
	Step 2A, Prong 2
	Claim 1 recites the following additional elements:
	a zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	a multiplier (The multiplier is directed towards a conventional routine to perform mathematical calculations and does not integrate the abstract idea into a practical application)
	a feature map extractor (The feature map extractor is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Step 2B
	Claim 1 recites the following additional elements:
	a zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	a multiplier (The multiplier is directed towards a conventional routine to perform mathematical calculations and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	a feature map extractor (The feature map extractor is directed towards a conventional routine to extract information from input data. This is a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

4.	Dependent claim 2 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 2 recites the following abstract ideas:
	to change the output feature map to nonlinear values by applying an activation function to the output feature map (mental process directed towards a mathematical concept) and
	generate a final output feature map by performing a pooling process (mental process directed towards a mathematical concept)
	Claim 2 recites the following additional limitations:
	output feature map generator (This limitation is directed towards a conventional routine of generating data and does not integrate the abstract idea into a practical application)
	a first memory, (This limitation is directed towards a generic computer component of storing information and does not integrate the abstract idea into a practical application)
	a second memory, (This limitation is directed towards a generic computer component of storing information and does not integrate the abstract idea into a practical application)
	the zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	transmit the final output feature map (data transmission is a well-understood, routine, and conventional function)

	Claim 2 recites the following additional limitations:
	output feature map generator (This limitation is directed towards a conventional routine of generating data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	a first memory, (This limitation is directed towards a generic computer component of storing information and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	a second memory, (This limitation is directed towards a generic computer component of storing information and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	the zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f)) and
	transmit the final output feature map (data transmission is a well-understood, routine, and conventional function. See MPEP 2106.05(d)(II), first list, example (i))

5.	Dependent claim 3 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 3 recites the following abstract ideas:
	performs the zero-value filtering using zero-value positions of the input feature, zero-value positions of the weight, and a stride value (mental process that is directed towards information retrieval)
	Claim 3 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Claim 3 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

6.	Dependent claim 4 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 4 recites the following abstract ideas:
	groups the data elements of the input feature according to a preset criterion, (mental process directed towards grouping data)	
	generates the relative coordinates between a plurality of groups, (mental process directed towards extracting information) and 
	matches the relative coordinates with data elements of each group. (mental process directed towards comparing information)
	Claim 4 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Claim 4 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

6.	Dependent claim 5 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 5 recites the following abstract ideas:
	wherein the group boundary information is 1-bit information for dividing the plurality of groups. (mental process directed towards a mathematical concept)
	Claim 5 does not recite any additional elements.

7.	Dependent claim 6 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 6 recites the following abstract ideas:
	converts the input feature and the weight to a one-dimensional (1D) vector, (mental process directed towards a mathematical concept)
	filters non-zero value positions of the input feature and the weight by performing a bitwise OR operation on the input feature and the weight, (mental process directed towards a mathematical concept) and 
	produces non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and weight. (mental process directed towards a mathematical concept)
	Claim 6 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Claim 6 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

8.	Dependent claim 7 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 7 recites the following abstract ideas:
	produces integrated boundary information by performing a bitwise OR operation on the non-zero position values for the target boundaries (mental process directed towards a mathematical concept)
	Claim 7 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Claim 7 recites the following additional elements:
	zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
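For illustration only, the bitwise operations recited in claims 6 and 7 can be sketched as follows. This is one possible reading of the claim language and is not taken from the specification: the function names, the use of an integer bitmask, and the treatment of the first OR step as a per-element non-zero test are all assumptions.

```python
def nonzero_mask(values):
    """Bitmask with bit i set when values[i] is non-zero (assumed reading of
    the first filtering step; equivalent to OR-reducing each element's bits)."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def window_masks(feature, weight, stride=1):
    """For each sliding-window position (assumed to be a 'target boundary'),
    AND the feature's non-zero bits under the window with the weight's."""
    f_mask = nonzero_mask(feature)
    w_mask = nonzero_mask(weight)
    k = len(weight)
    results = []
    for start in range(0, len(feature) - k + 1, stride):
        window = (f_mask >> start) & ((1 << k) - 1)
        results.append((start, window & w_mask))
    return results


def integrated_mask(results):
    """OR the per-window masks back into feature coordinates."""
    combined = 0
    for start, m in results:
        combined |= m << start
    return combined


feature = [0, 3, 0, 5, 7, 0]   # 1D input feature (hypothetical values)
weight = [1, 0, 2]             # 1D weight (hypothetical values)
per_window = window_masks(feature, weight)
print(per_window)                          # [(0, 0), (1, 5), (2, 4), (3, 1)]
print(bin(integrated_mask(per_window)))    # 0b11010
```

In this sketch a set bit in the integrated mask marks an input position that contributes a non-zero product to at least one window; changing `stride` changes which windows are combined.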

9.	Dependent claim 8 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 8 recites the following abstract ideas:
	changes the target boundaries on which the bitwise OR operation is to be performed according to a stride value when producing the integrated boundary information. (mental process directed towards changing data)
	Claim 8 recites the following additional elements:
	the zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not integrate the abstract idea into a practical application)
	Claim 8 recites the following additional elements:
	the zero-value filter (The filter is directed towards a conventional routine to extract information from input data and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

10.	Dependent claim 9 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 9 recites the following abstract ideas:
	each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector. (mental process directed towards a mathematical concept)
	Claim 9 does not recite any additional elements.

11.	Dependent claim 10 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 10 recites the following abstract ideas:
	wherein the multiplier skips the multiplication operation for the zero value-filtered compressed packet data with reference to the index information when performing the multiplication operation. (mental process directed towards a mathematical concept)
	Claim 10 does not recite any additional elements.


12.	Dependent claim 11 is directed to an apparatus, and falls into one of the four statutory categories.
	Claim 11 does not recite any abstract idea.
	Claim 11 recites the following additional elements:
	a first memory configured to store the input feature and the weight; (This limitation is directed towards a generic computer component of storing information and does not integrate the abstract idea into a practical application) and
	a second memory configured to store the compressed packet data including the index information transferred from the zero-value filter. (This limitation is directed towards a generic computer component of storing information and does not integrate the abstract idea into a practical application)
	Claim 11 recites the following additional elements:
	a first memory configured to store the input feature and the weight; (This limitation is directed towards a generic computer component of storing information and does not amount to significantly more than the judicial exception. See MPEP 2106.05(d)(II), first list, example (iv)) and
	a second memory configured to store the compressed packet data including the index information transferred from the zero-value filter. (This limitation is directed towards a generic computer component of storing information and does not amount to significantly more than the judicial exception. See MPEP 2106.05(d)(II), first list, example (iv))

13.	Independent claim 12 is directed to a method, and falls into one of the four statutory categories.
	Claim 12 recites the following abstract ideas:
	filtering a zero (0) value by applying the weight to the input feature (mental process directed towards a mathematical concept) and 
	generating compressed packet data by matching index information including relative coordinates and group boundary information for the data elements of the input feature; (mental process directed towards comparing input data values)
	producing result data by performing a multiplication operation on the input feature and the weight of the compressed packet data; (mental process directed towards a mathematical concept)
	performing an addition operation between multiplied result data based on the relative coordinates and the group boundary information of the result data and generating an output feature map by rearranging result values of the addition operation in an original input feature form; (mental process directed towards a mathematical concept) and
	changing the output feature map to nonlinear values by applying an activation function to the output feature map and generating a final output feature map by performing a pooling process. (mental process directed towards a mathematical concept)
	Claim 12 recites the following additional elements:
	receiving an input feature and a weight, the input feature including a plurality of data elements; (This limitation is directed towards information retrieval and does not integrate the abstract idea into a practical application)
	Claim 12 recites the following additional elements:
	receiving an input feature and a weight, the input feature including a plurality of data elements; (This limitation is directed towards information retrieval and does not amount to significantly more than the judicial exception. See MPEP 2106.05(d)(II), first list, example (iv))

14.	Dependent claim 13 is directed to a method, and falls into one of the four statutory categories.
	Claim 13 recites the following abstract idea:
	wherein the generating of the compressed packet data includes performing the zero-value filtering using zero-value positions of the input feature, zero-value positions of the weight, and a stride value (mental process directed towards comparing input data values)
	Claim 13 does not recite any additional elements.

15.	Dependent claim 14 is directed to a method, and falls into one of the four statutory categories.
	Claim 14 recites the following abstract ideas:
	wherein the generating of the compressed packet data includes grouping the data elements of the input feature according to a preset criterion, (mental process directed towards grouping data)
	generating the relative coordinates between a plurality of groups, (mental process directed towards extracting information) and
	matching the relative coordinates with data elements of each group (mental process directed towards comparing information)
	Claim 14 does not recite any additional elements.

16.	Dependent claim 15 is directed to a method, and falls into one of the four statutory categories.
	Claim 15 recites the following abstract ideas:
	converting the input feature and the weight in a one-dimensional (1D) vector (mental process directed towards a mathematical concept)
	filtering non-zero value positions of the input feature and the weight by performing a bitwise OR operation on the input feature and the weight; (mental process directed towards a mathematical concept)
	producing non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and the weight; (mental process directed towards a mathematical concept) and
	producing integrated boundary information by performing a bitwise OR operation on the non-zero position values for the target boundaries. (mental process directed towards a mathematical concept)
	Claim 15 does not recite any additional elements.

17.	Dependent claim 16 is directed to a method, and falls into one of the four statutory categories.
	Claim 16 recites the following abstract idea:
	wherein producing of the integrated boundary information includes changing the target boundaries on which the bitwise OR operation is to be performed according to a stride value. (mental process directed towards changing data)
	Claim 16 does not recite any additional elements.

18.	Dependent claim 17 is directed to a method, and falls into one of the four statutory categories.
	Claim 17 recites the following abstract idea:
	wherein each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector (mental process directed towards a mathematical concept)
	Claim 17 does not recite any additional elements.


			Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



20.	Claims 1, 2, 4, 11, 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US20180046900) in view of Ko et al. (US11250326, filed 12/06/2018).

	Regarding claim 1, Dally teaches a neural network accelerating apparatus (The SCNN accelerator architecture couples an algorithmic dataflow that eliminates all multiplications with a zero operand while employing a compressed representation of both weights and activations through almost the entire computation [0036]) comprising:
	a zero-value filter (FIG. 3B illustrates two 3×3 weight kernels and positions, in accordance with one embodiment. A first set of weights for k=1 includes the non-zero elements a, b, and c, and a second set of weights for k=2 includes the non-zero elements d, e, and f [0089]. Examiner notes the remaining weights are zero elements, Fig. 3b) configured 
	to filter a zero (0) value by applying a weight to an input feature, (The values of these filters are the weights that are trained using a training set for the network [0051]; the encode sparse data instruction is executed iteratively to remove all of the zeros from the operand and generate the vector of non-zero elements A and a vector of encoded indices AX [0187])
	the input feature including a plurality of data elements, and generate compressed packet data by matching index information including relative coordinates and group boundary information with the data elements of the input feature; (The specific format used to generate the compressed-sparse encoded data is orthogonal to the sparse architecture itself. What is key is that decoding a sparse format ultimately yields a non-zero data value and a position indicating the coordinates of the value in the weight or input activation matrices. In one embodiment, the position is defined by an index [0071]; The position portion of the compressed-sparse format includes zero-counts that are decoded into (r,s,k) for each weight and (x,y) for each input activation and then added to produce an (x,y,k) position for the corresponding product [0129])
	a multiplier configured to produce result data by performing a multiplication operation on the input feature and the weight of the compressed packet data; (The vectors are distributed into the F×I multiplier array 325 that computes a form of the cartesian product of the vectors [0081]; As previously explained, the F×I multiplier array 325 takes F weights and I input activations and produces P=F*I products [0129]) and
	a feature map extractor configured to perform an addition operation between the result data based on the relative coordinates and the group boundary information and generate an output feature map (Summing the products in the adders completes the convolution operation and generates the output activations (as output feature map) [0050]; Note that the input activation coordinate system is tied to the halo such that, for a 3×3 convolution kernel, the current input activations start at (1,1). Once the (r,s,k) positions of the weights are computed and the (x,y) positions of the input activations are computed by the destination calculation unit 330, the r and x coordinates are summed and the s and y coordinates are summed by the destination calculation unit 330 to compute the output activation positions  in (x,y,k) form [0139]. Examiner notes calculation unit 330 as feature map extractor and output activation as output feature map).
	Dally does not explicitly teach generate an output feature map by rearranging result values of the addition operation in an original input feature form.
	Ko teaches generate an output feature map by rearranging result values of the addition operation in an original input feature form (some embodiments also reorder the filters (such that the outputs will be reordered), col 45, lines 19-37).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Dally to incorporate the teachings of Ko for the benefit of computing neural network operations in an efficient, low-power manner (Ko, col 2, lines 10-11).
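For illustration only, the dataflow described in the cited passages of Dally (multiplying every non-zero weight with every non-zero input activation, then summing the weight and activation coordinates to locate each product) can be sketched in one dimension as follows. The function names and the 1D simplification are assumptions made for this sketch; they are not taken from Dally or from the application.

```python
def compress(dense):
    """Keep only non-zero entries as (value, position) pairs."""
    return [(v, i) for i, v in enumerate(dense) if v != 0]


def sparse_products_accumulate(weights, activations, out_len):
    """Multiply every non-zero weight with every non-zero activation
    (a Cartesian product) and accumulate each product at the summed
    coordinate r + x, as the quoted passages describe for (r,s) and (x,y)."""
    out = [0] * out_len
    for w_val, r in weights:
        for a_val, x in activations:
            out[r + x] += w_val * a_val
    return out


dense_w = [1, 0, 2]       # hypothetical weight vector
dense_a = [0, 3, 0, 5]    # hypothetical input activations
result = sparse_products_accumulate(
    compress(dense_w), compress(dense_a), len(dense_w) + len(dense_a) - 1
)
print(result)  # [0, 3, 0, 11, 0, 10]
```

Only four multiplications are performed here instead of the twelve a dense computation would need, because products with a zero operand are never formed.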

	Regarding claim 2, modified Dally teaches the neural network accelerating apparatus of claim 1. Dally further teaches an output feature map generator configured to change the output feature map to nonlinear values by applying an activation function to the output feature map, generate a final output feature map by performing a pooling process, and transmit the final output feature map to any one of a first memory, a second memory, and the zero-value filter. (…the post-processing unit 345 performs the following tasks: (1) exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations, (2) apply the non-linear activation (e.g. ReLU), pooling, and dropout functions, and (3) compress the output activations into the compressed-sparse form and write the compressed-sparse output activations into the output activations buffer 350 [0082]. Examiner notes post-processing unit 345 as output feature map generator and output activations buffer 350 as first memory).
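For illustration only, the post-processing steps named in the quoted passage (a non-linear activation such as ReLU followed by pooling) can be sketched as below. The choice of max pooling, the 1D layout, and the function names are assumptions made for this sketch; neither reference is quoted here as specifying them.

```python
def relu(values):
    """Replace negative values with zero (a common non-linear activation)."""
    return [max(0, v) for v in values]


def max_pool1d(values, size, stride=None):
    """Take the maximum over each window; stride defaults to the window size."""
    stride = size if stride is None else stride
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]


feature_map = [-2, 3, 0, -1, 5, 4]      # hypothetical output feature map
activated = relu(feature_map)           # [0, 3, 0, 0, 5, 4]
final_map = max_pool1d(activated, 2)    # [3, 0, 5]
print(final_map)
```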

	Regarding claim 4, modified Dally teaches the neural network accelerating apparatus of claim 1. Dally further teaches wherein the zero-value filter groups the data elements of the input feature according to a preset criterion, generates the relative coordinates between a plurality of groups, and matches the relative coordinates with data elements of each group. (coding scheme, Figure 4C [0018]; for grouping input activation zero-count values [0024]; Finally, a batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights [0055]; For example, a compressed-space encoding of the data shown in FIG. 3B is (a, b, c, d, e, f) and (2, 0, 3, 4, 1, 1) representing a data vector and a corresponding index vector, where each element in the index vector is a number of zeros preceding the corresponding non-zero element [0117]; Examiner notes (x,y) input activation coordinates [0022] and (r,s) weight coordinates [0021] are matched in Figure 4C. The Examiner notes the coding scheme is the preset criterion).
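For illustration only, the zero-count index encoding quoted from Dally [0117] (a data vector paired with an index vector giving the number of zeros preceding each non-zero element) can be sketched as follows. The flattened input below is a hypothetical layout chosen to reproduce the quoted index vector (2, 0, 3, 4, 1, 1); it is not asserted to match FIG. 3B, and the function names are assumptions.

```python
def encode_zero_counts(dense):
    """Return (values, zero_counts): the non-zero values and, for each,
    the number of zeros that precede it since the previous non-zero."""
    values, zero_counts = [], []
    run = 0
    for v in dense:
        if v == 0:
            run += 1
        else:
            values.append(v)
            zero_counts.append(run)
            run = 0
    return values, zero_counts


def decode_positions(zero_counts):
    """Recover the absolute position of each non-zero value."""
    positions = []
    pos = -1
    for z in zero_counts:
        pos += z + 1
        positions.append(pos)
    return positions


# Hypothetical flattened data; 1..6 stand in for a..f in the quoted example.
dense = [0, 0, 1, 2, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 5, 0, 6, 0]
values, counts = encode_zero_counts(dense)
print(values)                    # [1, 2, 3, 4, 5, 6]
print(counts)                    # [2, 0, 3, 4, 1, 1]
print(decode_positions(counts))  # [2, 3, 7, 12, 14, 16]
```

Decoding shows how relative zero counts are enough to recover each element's coordinate, which is the sense in which the quoted passage pairs data elements with index information.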

	Regarding claim 11, modified Dally teaches the neural network accelerating apparatus of claim 1. Dally further teaches: a first memory configured to store the input feature and the weight; (…the post-processing unit 345 performs the following tasks: (1) exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations, (2) apply the non-linear activation (e.g. ReLU), pooling, and dropout functions, and (3) compress the output activations into the compressed-sparse form and write the compressed-sparse output activations into the output activations buffer 350 [0082]. Examiner notes post-processing unit 345 as feature map extractor and output activation buffer 350 as first memory) and
	a second memory configured to store the compressed packet data including the index information transferred from the zero-value filter. (What is key is that decoding a sparse format ultimately yields a non-zero data value and a position indicating the coordinates of the value in the weight or input activation matrices. In one embodiment, the position is defined by an index or an address, such as an address corresponding to one of the accumulation buffers 250 or adder units 255 [0071]; write the output positions associated with the compressed-sparse output activations into the indices buffer 355 [0082])

	Regarding claim 12, Dally teaches an operating method of a neural network accelerating apparatus, the operating method (The SCNN accelerator architecture couples an algorithmic dataflow that eliminates all multiplications with a zero operand while employing a compressed representation of both weights and activations through almost the entire computation [0036]) comprising:
	receiving an input feature and a weight, the input feature including a plurality of data elements; (The memory interface 205 reads weight and activation data from a memory coupled to the SCNN 200. The memory interface 205 may also write weight and/or activation data from the SCNN 200 to the memory. …The weight and/or activation data may be stored in the memory in a compact format or an expanded format [0045]; The layer sequencer 215 controls the reading of the memory to obtain the compact input activations and compact weights. The compact input activations and compact weights may be stored within the memory interface 205 before being transmitted to the PEs 210 [0046])
	filtering a zero (0) value by applying the weight to the input feature (The values of these filters are the weights that are trained using a training set for the network [0051]; In one embodiment, any weight with an absolute value that is close to zero (e.g. below a defined threshold) is set to zero. The pruning process has the effect of removing weights from the filters [0052]; At step 115, each one of the non-zero weight values is multiplied with every one of the non-zero input activation values, within a multiplier array, to produce a third vector of products [0041])
	and generating compressed packet data by matching index information including relative coordinates and group boundary information for the data elements of the input feature; (The specific format used to generate the compressed-sparse encoded data is orthogonal to the sparse architecture itself. What is key is that decoding a sparse format ultimately yields a non-zero data value and a position indicating the coordinates of the value in the weight or input activation matrices. In one embodiment, the position is defined by an index [0071]; The position portion of the compressed-sparse format includes zero-counts that are decoded into (r,s,k) for each weight and (x,y) for each input activation and then added to produce an (x,y,k) position for the corresponding product [0129]; Note that xmax _halo and ymax _halo refer to the dimensions of the halo and (x,y,k) is the output activation position [0140]; exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations [0082])
	producing result data by performing a multiplication operation on the input feature and the weight of the compressed packet data; (The vectors are distributed into the F×I multiplier array 325 that computes a form of the cartesian product of the vectors [0081]; As previously explained, the F×I multiplier array 325 takes F weights and I input activations and produces P=F*I products [0129])
	performing an addition operation between multiplied result data based on the relative coordinates and the group boundary information of the result data and generating an output feature map (Summing the products in the adders completes the convolution operation and generates the output activations (as output feature map) [0050]; Note that the input activation coordinate system is tied to the halo such that, for a 3×3 convolution kernel, the current input activations start at (1,1). Once the (r,s,k) positions of the weights are computed and the (x,y) positions of the input activations are computed by the destination calculation unit 330, the r and x coordinates are summed and the s and y coordinates are summed by the destination calculation unit 330 to compute the output activation positions in (x,y,k) form [0139]. Examiner notes calculation unit 330 as feature map extractor and output activation as output feature map); and
	changing the output feature map to nonlinear values by applying an activation function to the output feature map and generating a final output feature map by performing a pooling process. (…the post-processing unit 345 performs the following tasks: (1) exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations, (2) apply the non-linear activation (e.g. ReLU), pooling, and dropout functions, and (3) compress the output activations into the compressed-sparse form and write the compressed-sparse output activations into the output activations buffer 350 [0082]. Examiner notes post-processing unit 345 as feature map extractor and output activation buffer 350 as first memory).
	Dally does not explicitly teach generating an output feature map by rearranging result values of the addition operation in an original input feature form.
	Ko teaches generating an output feature map by rearranging result values of the addition operation in an original input feature form (some embodiments also reorder the filters (such that the outputs will be reordered), col 45, lines 19-37).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Dally to incorporate the teachings of Ko for the benefit of computing neural network operations in an efficient, low-power manner (Ko, col 2, lines 10-11).

	Regarding claim 14, Modified Dally teaches the method of claim 12, Dally teaches wherein the generating of the compressed packet data includes grouping the data elements of the input feature according to a preset criterion, generating the relative coordinates between a plurality of groups, and matching the relative coordinates with data elements of each group. (Finally, a batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights [0055]; The position portion of the compressed-sparse format includes zero-counts that are decoded into (r,s,k) for each weight and (x,y) for each input activation and then added to produce an (x,y,k) position for the corresponding product. [0129])

21.	Claims 3, 10 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US20180046900) in view of Ko et al. (US11250326 filed 12/06/2018) and further in view of Albericio et al. ("Cnvlutin: Ineffectual-neuron-free deep neural network computing." ACM SIGARCH Computer Architecture News 44.3 (2016): 1-13.)

	Regarding claim 3, Modified Dally teaches the neural network accelerating apparatus of claim 1, Dally teaches wherein the zero-value filter performs the zero-value filtering using zero-value positions of the input feature, zero-value positions of the weight, (An index vector may be extracted from the compressed-sparse encoded data, where the index vector is a sequence of zero-counts (the number of zeros between each non-zero element). For example, a compressed-space encoding of the data shown in FIG. 3B is (a, b, c, d, e, f) and (2, 0, 3, 4, 1, 1) representing a data vector and a corresponding index vector, where each element in the index vector is a number of zeros preceding the corresponding non-zero element [0117]; In one embodiment, each non-zero weight and activation value is represented by a (value, position) pair [0036]; Each r,s,k position for a weight or (x,y) position for an input activation may be calculated using the position coordinates of the previous weight or input activation, respectively. The weight position calculation is shown in TABLE 11, where “value” is the zero-count [0124])
	Modified Dally does not explicitly teach the zero-value filter performs the zero-value filtering using a stride value
	Albericio teaches the zero-value filter performs the zero-value filtering using a stride value (The filters are applied repeatedly over different windows moving along the X and Y dimensions using a constant stride S to produce all the output neurons. Accordingly, the output neuron array dimensions are Ox = (Ix −Fx)/S+1, and Oy = (Iy −Fy)/S+1. Figure 2 shows an example with a 3×3×2 input neuron array, a single 2 × 2 × 2 filter and unit stride producing an output neuron array of 2×2×1. When an input neuron is zero the corresponding multiplication and addition can be eliminated to save time and energy without altering the output value, pg. 3, right col, last para.; Pruning is a common computation reduction technique in neural networks that removes ineffectual synapses or neurons, pg. 11, left col, second to the last para.; The leftmost point for each network corresponds to CNV in Figure 9 where only zero-valued neurons were removed, pg. 11, right col, second to the last para.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Albericio for the benefit of improving performance in image classification by 1.37x on average without any loss in accuracy by removing zero-valued operand multiplications (Albericio, abstract)
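	For illustration only, the zero-count index vector quoted from Dally [0117] and the output-dimension relation quoted from Albericio can be worked through as sketched below. The function names `decode_zero_counts` and `output_size`, and the flat one-dimensional layout, are assumptions made for this sketch and do not appear in either reference.

```python
def decode_zero_counts(values, zero_counts):
    """Turn (value, zero-count) pairs into (value, absolute position) pairs.

    Each zero-count is the number of zeros preceding the corresponding
    non-zero element, as in the passage quoted from Dally [0117].
    Hypothetical helper: a flat 1D vector is assumed.
    """
    decoded = []
    pos = -1  # position of the previously decoded non-zero element
    for value, zc in zip(values, zero_counts):
        pos += zc + 1  # skip zc zeros, then land on the non-zero element
        decoded.append((value, pos))
    return decoded


def output_size(in_size, filt_size, stride):
    """Output extent along one dimension: O = (I - F) / S + 1."""
    return (in_size - filt_size) // stride + 1


pairs = decode_zero_counts(list("abcdef"), [2, 0, 3, 4, 1, 1])
print(pairs)
print(output_size(3, 2, 1))
```

	With the quoted index vector (2, 0, 3, 4, 1, 1), the six non-zero elements land at flat positions 2, 3, 7, 12, 14 and 16, and a 3-wide input with a 2-wide filter at unit stride yields an output extent of 2, consistent with the 2×2×1 output array in the quoted passage.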

	Regarding claim 10, Modified Dally teaches the neural network accelerating apparatus of claim 1, Modified Dally does not explicitly teach wherein the multiplier skips the multiplication operation for the zero value-filtered compressed packet data with reference to the index information when performing the multiplication operation.
	Albericio teaches wherein the multiplier skips the multiplication operation for the zero value-filtered compressed packet data with reference to the index information when performing the multiplication operation. (Figure 7 shows the Zero-Free Neuron Array format (ZFNAf) that enables CNV to avoid computations with zero-valued neurons. As Section III-C explained, only the non-zero neurons are stored, each along with an offset indicating its original position, pg. 7, left col, last para.; CNV allows direct indexing at a finer granularity sacrificing any memory footprint savings. Specifically, ZFNAf encodes neurons as (value, offset) pairs in groups called bricks. Each brick corresponds to a fetch block of the DaDianNao design, that is an aligned, continuous along the input features dimension i group of 16 neurons, i.e., they all have the same x and y coordinates. Bricks are stored starting at the position their first neuron would have been stored in the conventional 3D array format adjusted to account for the offset fields and are zero padded, pg. 7, right col, first para.)
	The same motivation to combine as set forth above for dependent claim 3 applies here.
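	As a minimal sketch of the (value, offset) encoding quoted from Albericio, the following shows how storing only non-zero neurons with their offsets lets a multiplier skip zero operands. The names `encode_brick` and `brick_dot` are hypothetical, the zero padding of stored bricks is omitted, and the code is not taken from the reference.

```python
BRICK_SIZE = 16  # brick length given in the quoted passage


def encode_brick(neurons):
    """Keep only non-zero neurons, each paired with its offset in the brick.

    Hypothetical helper; padding the stored brick back to a fixed
    length is left out of this sketch.
    """
    if len(neurons) != BRICK_SIZE:
        raise ValueError("a brick holds exactly 16 neurons")
    return [(v, off) for off, v in enumerate(neurons) if v != 0]


def brick_dot(pairs, weights):
    """Multiply-accumulate over non-zero neurons only, indexing weights by offset."""
    return sum(v * weights[off] for v, off in pairs)


neurons = [0, 3, 0, 0, 5] + [0] * 11
weights = list(range(BRICK_SIZE))
pairs = encode_brick(neurons)
print(pairs, brick_dot(pairs, weights))
```

	Only two multiplications are performed for this brick, and the result equals the dense dot product over all 16 positions.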

	Regarding claim 13, Modified Dally teaches the method of claim 12, Dally teaches wherein the generating of the compressed packet data includes performing the zero-value filtering using zero-value positions of the input feature, zero-value positions of the weight, (An index vector may be extracted from the compressed-sparse encoded data, where the index vector is a sequence of zero-counts (the number of zeros between each non-zero element). For example, a compressed-space encoding of the data shown in FIG. 3B is (a, b, c, d, e, f) and (2, 0, 3, 4, 1, 1) representing a data vector and a corresponding index vector, where each element in the index vector is a number of zeros preceding the corresponding non-zero element [0117]; In one embodiment, each non-zero weight and activation value is represented by a (value, position) pair [0036]; Each r,s,k position for a weight or (x,y) position for an input activation may be calculated using the position coordinates of the previous weight or input activation, respectively. The weight position calculation is shown in TABLE 11, where “value” is the zero-count [0124]) and 
	Dally does not explicitly teach the zero-value filter performs the zero-value filtering using a stride value
	Albericio teaches the zero-value filter performs the zero-value filtering using a stride value (The filters are applied repeatedly over different windows moving along the X and Y dimensions using a constant stride S to produce all the output neurons. Accordingly, the output neuron array dimensions are Ox = (Ix −Fx)/S+1, and Oy = (Iy −Fy)/S+1. Figure 2 shows an example with a 3×3×2 input neuron array, a single 2 × 2 × 2 filter and unit stride producing an output neuron array of 2×2×1. When an input neuron is zero the corresponding multiplication and addition can be eliminated to save time and energy without altering the output value, pg. 3, right col, last para.; Pruning is a common computation reduction technique in neural networks that removes ineffectual synapses or neurons, pg. 11, left col, second to the last para.; The leftmost point for each network corresponds to CNV in Figure 9 where only zero-valued neurons were removed, pg. 11, right col, second to the last para.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Albericio for the benefit of improving performance in image classification by 1.37x on average without any loss in accuracy by removing zero-valued operand multiplications (Albericio, abstract)

22.	Claims 5-7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US20180046900) in view of Ko et al. (US11250326 filed 12/06/2018) and further in view of Chen et al. (US20190115933)

	Regarding claim 5, Modified Dally teaches the neural network accelerating apparatus of claim 4, Modified Dally does not explicitly teach wherein the group boundary information is 1-bit information for dividing the plurality of groups.
	Chen teaches wherein the group boundary information is 1-bit information for dividing the plurality of groups (The parser 412 receives the 1D NZP-w 30, parses the NZP-w 30 to divide the NZP-w 30 into its bitmap header 31 and its payload 32 with five NZ elements (wnz(0)˜wnz(4)) [0033]; if an element in the vector X/W is equal to 0, its corresponding bit in the bitmap header 31 is set to 0; otherwise, if the element has a value not equal to 0, its corresponding bit in the bitmap header 31 is set to 1 [0030])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Chen for the benefit of compressing the weight vector W and the data vector X into 1D NZP-w (non-zero packet for weight matrix W [0027]) 30 and 1D NZP-x 30 respectively. (Chen [0030])

	Regarding claim 6, Modified Dally teaches the neural network accelerating apparatus of claim 1, Dally teaches filters non-zero value positions of the input feature and the weight by performing a bitwise OR operation on the input feature and the weight, (A dense encoding of sparse weights and activations is used to reduce the bandwidth needed to transmit the weight and activation values from the memory to the SCNN 200, between different levels of the memory hierarchy, and between the different logic circuits within the SCNN 200. Input data, such as weights and activations with zeros can be represented in a compact form referred to as compressed-sparse format [0113]; The single-stage F*I arbitrated crossbar 335 …, multiplexer 366, and an OR-gate 370 [0097];  the F*I multiplier array 335 performs F×I multiplies (of values and positions) each processing cycle unless the weight buffer 305 goes empty or the F*I arbitrated crossbar 335 signals that it cannot accept inputs [0087]) and 
	Modified Dally does not explicitly teach wherein the zero-value filter converts the input feature and the weight to a one-dimensional (1D) vector; produces non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and weight
	Chen teaches wherein the zero-value filter converts the input feature and the weight to a one-dimensional (1D) vector, (the weight vector W and the data vector X are respectively compressed into the 1D NZP-w 30 and the 1D NZP-x 30.[0030])
	produces non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and weight (the AND gate array 425 performs a bitwise logical AND operation between the bitmap headers 31 of the NZP-w 30 and the NZP-x 30 in parallel to generate the output bitmap (i.e., o-bm) with two non-zero bits (i.e., bit 5 and bit 10) [0034])
	The same motivation to combine as set forth above for dependent claim 5 applies here.
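	For illustration, the bitmap-header scheme quoted from Chen [0030] and [0034] can be sketched as follows: each vector is compressed into a 1-bit-per-element header plus a payload of non-zero values, and a bitwise AND of the two headers marks the positions where both operands are non-zero. All function names here are assumptions for this sketch, and the code is a software analogy rather than the circuit described in the reference.

```python
def compress(vec):
    """Return (bitmap header, payload of non-zero values) for a 1D vector.

    A header bit is 1 where the element is non-zero and 0 otherwise.
    Hypothetical helper, not Chen's parser.
    """
    header = [1 if v != 0 else 0 for v in vec]
    payload = [v for v in vec if v != 0]
    return header, payload


def and_bitmaps(bm_w, bm_x):
    """Bitwise AND of two headers: 1 only where both operands are non-zero."""
    return [a & b for a, b in zip(bm_w, bm_x)]


def sparse_dot(w, x):
    """Dot product that multiplies only where the ANDed bitmap is 1."""
    bm_w, nz_w = compress(w)
    bm_x, nz_x = compress(x)
    o_bm = and_bitmaps(bm_w, bm_x)
    iw = ix = 0  # read cursors into the two payloads
    total = 0
    for bw, bx, bo in zip(bm_w, bm_x, o_bm):
        if bo:
            total += nz_w[iw] * nz_x[ix]
        iw += bw  # advance a cursor only when that operand stored a value
        ix += bx
    return total


w = [0, 2, 0, 3]
x = [1, 4, 0, 0]
print(and_bitmaps(compress(w)[0], compress(x)[0]), sparse_dot(w, x))
```

	In this example only position 1 survives the AND, so a single multiplication (2 × 4) is performed.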

	Regarding claim 7, Modified Dally teaches the neural network accelerating apparatus of claim 6, Dally teaches wherein the zero-value filter produces integrated boundary information by performing a bitwise OR operation on the non-zero position values for the target boundaries. (A dense encoding of sparse weights and activations is used to reduce the bandwidth needed to transmit the weight and activation values from the memory to the SCNN 200, between different levels of the memory hierarchy, and between the different logic circuits within the SCNN 200. Input data, such as weights and activations with zeros can be represented in a compact form referred to as compressed-sparse format [0113]; The single-stage F*I arbitrated crossbar 335 …, multiplexer 366, and an OR-gate 370 [0097];  the F*I multiplier array 335 performs F×I multiplies (of values and positions) each processing cycle unless the weight buffer 305 goes empty or the F*I arbitrated crossbar 335 signals that it cannot accept inputs [0087])

	Regarding claim 15, Modified Dally teaches the method of claim 12, Dally teaches wherein the generating of the compressed packet data includes: filtering non-zero value positions of the input feature and the weight by performing a bitwise OR operation on the input feature and the weight; (A dense encoding of sparse weights and activations is used to reduce the bandwidth needed to transmit the weight and activation values from the memory to the SCNN 200, between different levels of the memory hierarchy, and between the different logic circuits within the SCNN 200. Input data, such as weights and activations with zeros can be represented in a compact form referred to as compressed-sparse format [0113]; The single-stage F*I arbitrated crossbar 335 …, multiplexer 366, and an OR-gate 370 [0097];  the F*I multiplier array 335 performs F×I multiplies (of values and positions) each processing cycle unless the weight buffer 305 goes empty or the F*I arbitrated crossbar 335 signals that it cannot accept inputs [0087])
	producing integrated boundary information by performing a bitwise OR operation on the non-zero position values for the target boundaries (The single-stage F*I arbitrated crossbar 335 …, multiplexer 366, and an OR-gate 370 [0097];  the F*I multiplier array 335 performs F×I multiplies (of values and positions) each processing cycle unless the weight buffer 305 goes empty or the F*I arbitrated crossbar 335 signals that it cannot accept inputs [0087]; In one embodiment, the non-zero elements are multiplied within the F×I multiplier array 325 to produce result values that are products. At step 415, the corresponding multi-dimensional positions are processed in parallel to produce destination addresses for each result value in the plurality of result values. In one embodiment, the multi-dimensional positions are processed in the destination calculation unit 330 to produce a destination accumulator address associated with a location in the accumulator array 340 for each one of the result values [0116])
	Dally does not explicitly teach converting the input feature and the weight in a one-dimensional (1D) vector and producing non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and the weight; and
	Chen teaches converting the input feature and the weight in a one-dimensional (1D) vector (the weight vector W and the data vector X are respectively compressed into the 1D NZP-w 30 and the 1D NZP-x 30.[0030]) and
	producing non-zero position values according to weight positions for target boundaries by performing a bitwise AND operation on filtered non-zero position values of the input feature and the weight (the AND gate array 425 performs a bitwise logical AND operation between the bitmap headers 31 of the NZP-w 30 and the NZP-x 30 in parallel to generate the output bitmap (i.e., o-bm) with two non-zero bits (i.e., bit 5 and bit 10) [0034])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Chen for the benefit of compressing the weight vector W and the data vector X into 1D NZP-w (non-zero packet for weight matrix W [0027]) 30 and 1D NZP-x 30 respectively. (Chen [0030])

23.	Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US20180046900) in view of Ko et al. (US11250326 filed 12/06/2018) in view of Chen et al. (US20190115933) and further in view of Chang et al. ("An energy-efficient FPGA-based deconvolutional neural networks accelerator for single image super-resolution." IEEE Transactions on Circuits and Systems for Video Technology 30.1 (2018): 281-295.)

	Regarding claim 8, Modified Dally teaches the neural network accelerating apparatus of claim 7, Modified Dally does not explicitly teach wherein the zero-value filter changes the target boundaries on which the bitwise OR operation is to be performed according to a stride value when producing the integrated boundary information.
	Chang teaches wherein the zero-value filter changes the target boundaries on which the bitwise OR operation is to be performed according to a stride value when producing the integrated boundary information. (we obtain the relative position (xr, yr) using (KD − S × xi, KD − S × yi). This is because input pixels are shifted by the stride S in the output feature map to produce output blocks, Fig. 6, pg. 285, left col, last para.; We must design a 13 × 13-bit multiplier for convolution on low bit-width data, pg. 289, right col, first para.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Chang for the benefit of increasing feature maps and optimizing CNN dataflow so that super-resolution (SR) algorithm can be driven at low power in display applications (Chang, abstract)

	Regarding claim 16, Modified Dally teaches the method of claim 15, Modified Dally does not explicitly teach wherein producing of the integrated boundary information includes changing the target boundaries on which the bitwise OR operation is to be performed according to a stride value.
	Chang teaches wherein producing of the integrated boundary information includes changing the target boundaries on which the bitwise OR operation is to be performed according to a stride value (we obtain the relative position (xr, yr) using (KD − S × xi, KD − S × yi). This is because input pixels are shifted by the stride S in the output feature map to produce output blocks, Fig. 6, pg. 285, left col, last para.; We must design a 13 × 13-bit multiplier for convolution on low bit-width data, pg. 289, right col, first para.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Chang for the benefit of increasing feature maps and optimizing CNN dataflow so that super-resolution (SR) algorithm can be driven at low power in display applications (Chang, abstract)

24.	Claims 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US20180046900) in view of Ko et al. (US11250326 filed 12/06/2018) in view of Chen et al. (US20190115933) and further in view of Jaganathan et al. (US20190197401 filed 10/15/2018)

	Regarding claim 9, Modified Dally teaches the neural network accelerating apparatus of claim 6, Modified Dally does not explicitly teach wherein each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector
	Jaganathan teaches wherein each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector (The convolution operation includes sliding the kernel over the input image. For each position of the kernel, the overlapping values of the kernel and the input image are multiplied and the results are added [0108]; A convolution works by sliding these windows of size 3×3 or 5×5 over the 3D input feature map, stopping at every location, and extracting the 3D patch of surrounding features (shape (window_height, window_width, input_depth)). Each such 3D patch is then transformed (via a tensor product with the same learned weight matrix, called the convolution kernel) into a 1D vector of shape (output_depth) [0088])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Jaganathan for the benefit of making convolutional neural networks data efficient because they need fewer training samples to learn representations (Jaganathan, [0082]).

	Regarding claim 17, Modified Dally teaches the method of claim 15, Modified Dally does not explicitly teach wherein each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector.
	Jaganathan teaches wherein each target boundary corresponds to a respective position of a sliding window by which the weight as converted to the 1D vector is applied to the input feature as converted to the 1D vector (The convolution operation includes sliding the kernel over the input image. For each position of the kernel, the overlapping values of the kernel and the input image are multiplied and the results are added [0108]; A convolution works by sliding these windows of size 3×3 or 5×5 over the 3D input feature map, stopping at every location, and extracting the 3D patch of surrounding features (shape (window_height, window_width, input_depth)). Each such 3D patch is then transformed (via a tensor product with the same learned weight matrix, called the convolution kernel) into a 1D vector of shape (output_depth) [0088])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the device of Modified Dally to incorporate the teachings of Jaganathan for the benefit of making convolutional neural networks data efficient because they need fewer training samples to learn representations (Jaganathan, [0082]).

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121