DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 09/22/2022. Claims 1 and 4-8 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The response filed on 09/22/2022 has been accepted and considered in this Office Action. Claims 1 and 4-8 have been examined. Claims 2-3 have been canceled.
Applicant’s amendment to claim 1, adding the limitation “the speech feature reuse-based storing and calculating compression method resulting in reducing a power of the neural network circuit, saving energy and keeping a neural network to maintain an ultra-low-power running state while completing a keyword-spotting function in a normally- open state,” with support in paragraph [0005] of the Specification, does not overcome the rejections under 35 U.S.C. 101 previously set forth in the Non-Final Office Action mailed 06/22/2022. “[P]reamble language merely extolling benefits or features of the claimed invention does not limit the claim scope without clear reliance on those benefits or features as patentably significant.” MPEP 2111.02; see also MPEP 2111.04. Further, the Examiner interprets the newly added limitation as analogous to a “wherein” or “whereby” clause, and “a ‘whereby’ clause in a method claim is not given weight when it simply expresses the intended result of a process step positively recited.” MPEP 2111.04. The amendment to claim 1 indicates only the intended use of the method; it does not further limit or specify how the keyword is computed using the compression method recited in claim 1. Therefore, the rejections under 35 U.S.C. 101 are maintained and updated accordingly.

Response to Arguments
Applicant's arguments filed 09/22/2022 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 state that
“Zheng discloses three different banks (bank0, bank1, bank2) to cache different rows (for example, bank0 stores 3, 6, 9, bank1 [includes] 0, 2, 4) and steps to 3, so as to ensure that the output of the three banks each time is 3. There is no teaching or suggestion of how to adjust the address order in Zheng”

The Examiner respectfully disagrees. Zheng teaches: “The immediate data in 4 CONV layers generated from the starting input speech feature map (Fmap 1) are buffered as follows. First 3 layers buffer the last two rows of their output feature map, and the final layer buffers the whole output feature map except the oldest row. For the consequent speech inputs (Fmap 2), only the newest 3 frames (frame 10–12 in Fig. 6(c)) of the input feature maps are used to compute the non-overlapping output row. And this row is combined with 2 buffered rows of the first layer to generate the non-overlapping row for next layer. The buffered features are updated by adding the newly generated row and discarding the oldest row of each channel” (Zheng, pg. 4653, sect. IV). Frame-level reuse is thus performed by the BCNN taught by Zheng, and the address sequence is updated using a memory partitioning technique, as indicated by “The mapping of data in the memory is decided by its row index modulo 3.” Because the oldest row is discarded and replaced with the newly generated row, the row-index-modulo-3 mapping determines the order in which the buffered rows are addressed, ensuring that the updated rows are read in the correct order. Therefore, Zheng teaches that “part of rows of data of a previous frame of input data is replaced with updated row data of input data of a current frame, an addressing sequence of the updated input data is adjusted to perform an operation on the updated input data and a convolution kernel in an arrival sequence of the input data.”
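For illustration only, the row-index-modulo-3 mapping quoted from Zheng can be modeled in a few lines of Python. This is a minimal sketch, not Zheng's circuit; the names `NUM_BANKS`, `bank_of`, and `banks_for_window` are hypothetical. It shows that any 3 consecutive rows land in 3 distinct banks, that a new row r + 3 falls in the bank that held the oldest row r, and that the order in which the banks are read rotates by one position each time a new row arrives.

```python
# Illustrative model of Zheng's row-index-modulo-3 bank mapping (hypothetical names).
NUM_BANKS = 3  # assumed equal to the 3-row kernel height in Zheng's example


def bank_of(row: int) -> int:
    """Return the bank holding input row `row` (row index modulo 3)."""
    return row % NUM_BANKS


def banks_for_window(first_row: int) -> list:
    """Return the banks read, in order, for the 3 consecutive rows starting at first_row."""
    return [bank_of(r) for r in range(first_row, first_row + NUM_BANKS)]


print(bank_of(4))           # 1: the 4th row is stored in bank 4 % 3 = 1
print(banks_for_window(3))  # [0, 1, 2]: rows 3, 4, 5 come from banks 0, 1, 2
print(banks_for_window(4))  # [1, 2, 0]: after one new row, the bank order rotates by one

# Any 3 consecutive rows occupy 3 distinct banks, so they can be read in parallel.
assert all(len(set(banks_for_window(s))) == NUM_BANKS for s in range(100))
# Row r + 3 maps to the same bank as row r, so a new row overwrites the oldest one.
assert all(bank_of(r + NUM_BANKS) == bank_of(r) for r in range(100))
```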
Applicant’s further arguments with respect to claim 1 state that
“Instant invention discloses the case where the number of rows updated by the input data is equal to or not equal to the convolution stride which is not disclosed by Zheng.”

The Examiner respectfully disagrees. Zheng teaches: “In the convolution of speech feature maps, apart from spatial locality, temporal locality is also observed, as shown in Fig. 6 (a). Each frame (40-dimension feature vector) is combined with 10 contiguous frames into an 11×40 sized feature map. Therefore, two consecutive feature maps have 10 frames in common (shaded). Fig. 6 (b) illustrates that when they are convolved with 3×3 kernels of the first convolutional layer, the output feature maps will have 8 rows in common (out of 9 rows in total)” (Zheng, pg. 4653). An 11-row input convolved with a 3×3 kernel yields 9 output rows (11 - 3 + 1 = 9), which corresponds to a convolution step size of 1, and each consecutive feature map adds one new frame (row). Thus, in Zheng, the updated row number of the input data is equal to the convolution step size. Therefore, Zheng teaches that “intermediate data are updated to correspond to a convolution result of the updated row data of the input data under the two conditions that the updated row number of the input data is equal to a convolution step size.”
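The row counts quoted from Zheng can be checked with the usual output-size formula for an unpadded convolution, floor((H - k) / s) + 1. In this sketch, the step size s = 1 and the arrival of one new frame per feature map are inferred from the quoted 11-row and 9-row figures rather than stated explicitly in the quoted passage, and the helper name `conv_output_rows` is hypothetical.

```python
def conv_output_rows(in_rows: int, kernel: int, stride: int) -> int:
    """Number of output rows of an unpadded (valid) convolution."""
    return (in_rows - kernel) // stride + 1


FRAMES_PER_MAP = 11  # each feature map stacks 11 frames (11 x 40)
KERNEL = 3           # 3 x 3 kernels in the first convolutional layer
STRIDE = 1           # assumed: consistent with 11 input rows giving 9 output rows
NEW_FRAMES = 1       # assumed: consecutive maps share 10 of 11 frames

out_rows = conv_output_rows(FRAMES_PER_MAP, KERNEL, STRIDE)
shared_in = FRAMES_PER_MAP - NEW_FRAMES
# Output rows that depend only on the shared frames are common to both maps.
shared_out = conv_output_rows(shared_in, KERNEL, STRIDE)

print(out_rows, shared_in, shared_out)  # 9 10 8
print(NEW_FRAMES == STRIDE)             # True
```

Only the single non-shared output row has to be recomputed, and it needs the newest KERNEL = 3 frames, which matches the "newest 3 frames" language quoted from Zheng above.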
Applicant’s further argument with respect to claim 1 states that
“Ding implements parallel computing operations by simultaneously executing 10 tiles in the input matrix and the weight matrix in each cycle. PTO considers that this is a case where the input data is not equal to the number of updated rows.”

The Examiner respectfully disagrees. Ding teaches: “The depthwise convolution unit (DCU) is composed of the configurable line buffer and the MAC unit as shown in Fig. 7. The depthwise convolution is carried out by k × k convolution operations, which has the spatial proximity. To make fully use of the spatial proximity to reduce the access of input features, this paper employed a configurable line buffer. When input data goes through the buffer in row-major layout, the line buffer releases a sliding window selection on the input image and multiple rows of input pixels can be buffered for simultaneous access” (Ding, pg. 282, sect. 4.3). Ding thus teaches computing intermediate results using a line buffer that is configurable for different sizes or shifting strides. Therefore, Ding teaches that “intermediate data are updated to correspond to a convolution result of the updated row data of the input data under the conditions that the updated row number of the input data is not equal to the convolution step size.”
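The sliding-window behavior of a line buffer described by Ding can be approximated in software as shown below. This is a behavioral sketch under assumed parameters, not Ding's FPGA design: `line_buffer_windows`, the kernel size `k`, and the `stride` argument are hypothetical names chosen for illustration. Rows arrive in row-major order, only the latest k rows are retained, and k × k windows are emitted only for row bands aligned to the stride, so the arrival of one row does not necessarily correspond to one convolution step.

```python
from collections import deque
from typing import Iterable, Iterator, List

Row = List[int]


def line_buffer_windows(rows: Iterable[Row], k: int, stride: int) -> Iterator[List[Row]]:
    """Yield k x k windows from rows streamed in row-major order.

    Only the most recent k rows are held, mimicking a line buffer; `k` and
    `stride` are the configurable parameters (assumed, for illustration).
    """
    buf = deque(maxlen=k)  # appending a (k+1)-th row silently drops the oldest row
    for idx, row in enumerate(rows):
        buf.append(row)
        top = idx - k + 1  # index of the oldest row currently in the buffer
        if top < 0 or top % stride:
            continue  # fewer than k rows yet, or this row band is skipped by the stride
        band = list(buf)
        for col in range(0, len(row) - k + 1, stride):
            yield [r[col:col + k] for r in band]


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5 x 5 test input
print(len(list(line_buffer_windows(image, k=3, stride=1))))  # 9 windows (3 x 3 positions)
print(len(list(line_buffer_windows(image, k=3, stride=2))))  # 4 windows (2 x 2 positions)
print(next(line_buffer_windows(image, k=3, stride=1)))       # [[0, 1, 2], [5, 6, 7], [10, 11, 12]]
```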
Therefore, the rejection of claim 1 under 35 U.S.C. 103 is maintained and updated accordingly.
Regarding the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are argued for at least the same rationale presented in the Remarks filed 09/22/2022, the Examiner respectfully notes the following. Should those claims be traversed for reasons similar to those presented for independent claim 1, the Examiner directs Applicant to the reasons provided above in response to the arguments directed to claim 1. For at least those reasons, the Examiner respectfully disagrees; Applicant's arguments have been fully considered but are not persuasive.

Claim Objections
Claim 5 is objected to because of the following informality: it depends from canceled claim 3. Appropriate correction is required. For purposes of examination, claim 5 is interpreted as depending from claim 1.


Claim Rejections - 35 USC § 101
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1 and 4-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea, namely a mathematical calculation) without significantly more.
Claim 1 recites that “intermediate data are updated to correspond to a convolution result of the updated row data of the input data.”
The limitation “intermediate data are updated to correspond to a convolution result of the updated row data of the input data,” as drafted, is a process that, under its broadest reasonable interpretation, covers mathematical calculations and operations performed on “row data,” “input data,” and “intermediate data.” For example, “updated” in the context of this claim encompasses calculating the convolution result, i.e., the intermediate data, from the input data. Similarly, the limitation “an addressing sequence of the updated input data is adjusted to perform an operation on the updated input data,” as drafted, is a process that, under its broadest reasonable interpretation, covers mathematical calculations and operations on the “addressing” of the “input data.” For example, “adjusted to perform” in the context of this claim encompasses computing the addresses of the storage locations of the input data. If a claim limitation, under its broadest reasonable interpretation, covers mathematical computation, then it falls within the “mathematical formula or calculation” grouping of abstract ideas. Accordingly, the claim recites a mathematical concept.
This judicial exception is not integrated into a practical application. In particular, the claim recites only one additional element, “a keyword-spotting CNN” (convolutional neural network), and this additional element does not integrate the mathematical concept into a practical application because it does not impose any meaningful limits on practicing the mathematical computation. The amended limitation “the speech feature reuse-based storing and calculating compression method resulting in reducing a power of the neural network circuit, saving energy and keeping a neural network to maintain an ultra-low-power running state while completing a keyword-spotting function in a normally- open state” describes only an intended use of the method and does not further limit or specify how the keyword is computed using the compression method recited in claim 1. The claim is therefore directed to a mathematical concept. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Similarly, claims 4-7 recite limitations directed to whether the updated row number of the input data is equal or not equal to a convolution step size. Under their broadest reasonable interpretation, these limitations cover mathematical computation and therefore fall within the “mathematical formula or calculation” grouping of abstract ideas. Accordingly, the claims recite a mathematical concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are not patent eligible.
Claim 8 recites a speech feature reuse-based storing and calculating compression method for a keyword-spotting convolutional neural network pooling layer that is achieved by using the method of claim 1. Under its broadest reasonable interpretation, the claim covers mathematical computation and therefore falls within the “mathematical formula or calculation” grouping of abstract ideas. Accordingly, the claim recites a mathematical concept. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
	

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1 and 4-8 are rejected under 35 U.S.C. 103 as being unpatentable over S. Zheng et al., "An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 12, pp. 4648-4661, Dec. 2019 ("Zheng"), in view of W. Ding, Z. Huang, Z. Huang, L. Tian, H. Wang, and S. Feng, "Designing efficient accelerator of depthwise separable convolutional neural network on FPGA," Journal of Systems Architecture, vol. 97, pp. 278-286, 2019 ("Ding").
Regarding claim 1, Zheng teaches a speech feature reuse-based storing and calculating compression method for a keyword-spotting CNN, when each frame of input data arrives (see Zheng, pg. 4653, sect. IV A, Fig. 6(a), “Each frame (40-dimension feature vector) is combined with 10 contiguous frames into an 11×40 sized feature map. Therefore, two consecutive feature maps have 10 frames in common (shaded)”), wherein a part of rows of data of a previous frame of input data is replaced with updated row data of input data of a current frame (see Zheng, pg. 4653, sect. IV A, “We address this problem by exploiting frame-level data reuse. The immediate data in 4 CONV layers generated from the starting input speech feature map (Fmap 1) are buffered as follows. First 3 layers buffer the last two rows of their output feature map, and the final layer buffers the whole output feature map except the oldest row”), an addressing sequence of the updated input data is adjusted to perform an operation on the updated input data and a convolution kernel in an arrival sequence of the input data (see Zheng, pg. 4653, sect. IV A, “In cooperation with the above frame-level activation reuse, we propose a memory partitioning technique to ensure conflict-free data loading, as illustrated in Fig. 6(d). In the proposed computing data-flow, features from 3 rows of input feature maps (across 32/64 channels) are needed for XNOR computation in one cycle. Therefore, parallel memory access of multiple rows is demanded in BCNN computation. We partition the feature buffer into 3 banks, ensuring the potential to provide multiple data stream. The mapping of data in the memory is decided by its row index modulo 3, e.g. the 4th row is stored in bank 4%3 = 1. As shown in the Fig. 6(b), in this way, data from arbitrary 3 consecutive rows can be read out without conflict, e.g., rows 3,4,5 are loaded from bank 0,1,2, respectively”), and intermediate data are updated to correspond to a convolution result of the updated row data of the input data under the two conditions that the updated row number of the input data is equal to a convolution step size (see Zheng, pg. 4653, sect. IV A, “The immediate data in 4 CONV layers generated from the starting input speech feature map (Fmap 1) are buffered as follows. First 3 layers buffer the last two rows of their output feature map, and the final layer buffers the whole output feature map except the oldest row. For the consequent speech inputs (Fmap 2), only the newest 3 frames (frame 10–12 in Fig. 6(c)) of the input feature maps are used to compute the non-overlapping output row. And this row is combined with 2 buffered rows of the first layer to generate the non-overlapping row for next layer. The buffered features are updated by adding the newly generated row and discarding the oldest row of each channel”); the speech feature reuse-based storing and calculating compression method resulting in reducing a power of the neural network circuit, saving energy and keeping a neural network to maintain an ultra-low-power running state while completing a keyword-spotting function in a normally- open state (this limitation expresses an intended use of the method; the Examiner interprets the newly added limitation as analogous to a “wherein” or “whereby” clause, and “a ‘whereby’ clause in a method claim is not given weight when it simply expresses the intended result of a process step positively recited.” However, see also Zheng, pg. 4652, Fig. 4, “Architecture of proposed speech recognition processor. The processor works in 3 function modes: wakeup word detection, voice command recognition, and continuous speech (outputs phoneme scores)”); wherein adjusting the addressing sequence of the updated input data comprises circularly shifting the data addressing sequence down by m bits, m being the updated row number of the input data (see Zheng, pg. 4653, sect. IV A, “We partition the feature buffer into 3 banks, ensuring the potential to provide multiple data stream. The mapping of data in the memory is decided by its row index modulo 3, e.g. the 4th row is stored in bank 4%3 = 1. As shown in the Fig. 6(b), in this way, data from arbitrary 3 consecutive rows can be read out without conflict, e.g., rows 3,4,5 are loaded from bank 0,1,2, respectively”); wherein updating, when the updated row number of the input data is equal to the convolution step size, the intermediate data to correspond to the convolution result of the updated row data of the input data specifically comprises directly updating the intermediate data to be the convolution result obtained after the addressing sequence of the input data is adjusted (see Zheng, pg. 4653, sect. IV B, “In the convolution of speech feature maps, apart from spatial locality, temporal locality is also observed, as shown in Fig. 6 (a). Each frame (40-dimension feature vector) is combined with 10 contiguous frames into an 11×40 sized feature map. Therefore, two consecutive feature maps have 10 frames in common (shaded). In cooperation with the above frame-level activation reuse, we propose a memory partitioning technique to ensure conflict-free data loading, as illustrated in Fig. 6(d). In the proposed computing data-flow, features from 3 rows of input feature maps (across 32/64 channels) are needed for XNOR computation in one cycle. We partition the feature buffer into 3 banks, ensuring the potential to provide multiple data stream. The mapping of data in the memory is decided by its row index modulo 3, e.g. the 4th row is stored in bank 4%3 = 1. As shown in the Fig. 6(b), in this way, data from arbitrary 3 consecutive rows can be read out without conflict, e.g., rows 3,4,5 are loaded from bank 0,1,2, respectively”). However, Zheng fails to teach the condition in which the updated row number of the input data is not equal to the convolution step size.
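The claimed circular shift of the addressing sequence by m positions can be sketched as follows, assuming a buffer of H rows in which the m newest rows overwrite the m oldest rows in place. The class `RowBuffer`, the helper functions, and the sample values are hypothetical and are not taken from Zheng or from the application; the direction of the shift depends on how the sequence is ordered. In this sketch, with H = 3 and m = 1, the addressing sequence after n updates is [n % 3, (n + 1) % 3, (n + 2) % 3], the same bank order that Zheng's row-index-modulo-3 mapping produces for rows n, n + 1, n + 2.

```python
from typing import List


def rotate(seq: List[int], m: int) -> List[int]:
    """Circularly shift an addressing sequence by m positions (oldest entries move to the end)."""
    m %= len(seq)
    return seq[m:] + seq[:m]


class RowBuffer:
    """Hypothetical H-row input buffer that is updated in place.

    slots[i] holds a row; addr lists slot indices from oldest to newest,
    so reading slots in addr order yields the rows in arrival order.
    """

    def __init__(self, rows: List[int]):
        self.slots = list(rows)
        self.addr = list(range(len(rows)))

    def update(self, new_rows: List[int]) -> None:
        m = len(new_rows)  # m = updated row number
        # Overwrite only the m oldest slots; the other rows are not moved.
        for slot, row in zip(self.addr[:m], new_rows):
            self.slots[slot] = row
        # Shift the addressing sequence by m so the new rows are read last.
        self.addr = rotate(self.addr, m)

    def logical_rows(self) -> List[int]:
        return [self.slots[a] for a in self.addr]


def conv_column(rows: List[int], kernel: List[int]) -> int:
    """One output value of a 1-D convolution over rows taken in arrival order."""
    return sum(w * x for w, x in zip(kernel, rows))


kernel = [1, 2, 3]
stream = [10, 20, 30, 40, 50]
buf = RowBuffer(stream[:3])
for t in range(3, len(stream)):
    buf.update([stream[t]])  # m = 1 new row per step
    assert buf.logical_rows() == stream[t - 2:t + 1]  # matches the true sliding window
    print(buf.slots, buf.addr, conv_column(buf.logical_rows(), kernel))
# prints:
# [40, 20, 30] [1, 2, 0] 200
# [40, 50, 30] [2, 0, 1] 260
```

Because only the m oldest slots are rewritten, the retained rows are never moved; the rotated addressing sequence alone restores arrival order for the convolution. When m equals the convolution step size (here both are 1), each update produces exactly one new output value, which could directly replace the oldest intermediate result.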
	However, Ding teaches the condition in which the updated row number of the input data is not equal to the convolution step size (see Ding, pg. 282, sect. 4.3, “The depthwise convolution unit (DCU) is composed of the configurable line buffer and the MAC unit as shown in Fig. 7. The depthwise convolution is carried out by k × k convolution operations, which has the spatial proximity. To make fully use of the spatial proximity to reduce the access of input features, this paper employed a configurable line buffer. When input data goes through the buffer in row-major layout, the line buffer releases a sliding window selection on the input image and multiple rows of input pixels can be buffered for simultaneous access”; and see Ding, pg. 283, sect. 5.2 (1), “in CNN, because of the difference of computation and the number of channels of the feature maps for every layer, the parallelization parameter pn and pm for each layer is configured differently. Besides, we also adopt input data reuse in the design as shown in Fig. 12. Multiple filters are applied to the same feature map, so the input feature map activations are used multiple times across filters”; the reuse of data across multiple filters shown in Fig. 12 is interpreted as reserving all convolution calculation intermediate results of the input data between adjacent repeated input feature values, and the configurable line buffer and per-layer configuration are interpreted as the case in which the updated row number is not equal to the convolution step size).
Zheng and Ding are considered to be analogous to the claimed invention because they are in the same field of endeavor, namely efficient hardware computation of convolutional neural networks (Zheng: a binarized CNN speech recognition processor; Ding: a depthwise separable CNN accelerator, an approach described as orthogonal to model compression). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zheng's teachings on quantizing network parameters to a low bit width at the algorithmic level with Ding's teachings of a depthwise separable convolutional neural network accelerator in which all layers work concurrently in a pipelined fashion, in order to improve system throughput and performance (see Ding, pg. 279, sect. 1).
	Regarding claim 4, Zheng in view of Ding teaches the speech feature reuse-based storing and calculating compression method for the keyword-spotting CNN according to claim 1. Ding further teaches wherein updating, when the updated row number of the input data is not equal to the convolution step size, the intermediate data to correspond to the convolution result of the updated row data of the input data specifically comprises reserving all convolution calculation intermediate results of the input data between adjacent repeated input feature values (see Ding, pg. 283, sect. 5.2 (1), “in CNN, because of the difference of computation and the number of channels of the feature maps for every layer, the parallelization parameter pn and pm for each layer is configured differently. Besides, we also adopt input data reuse in the design as shown in Fig. 12. Multiple filters are applied to the same feature map, so the input feature map activations are used multiple times across filters”; the reuse of data across multiple filters shown in Fig. 12 is interpreted as reserving all convolution calculation intermediate results of the input data between adjacent repeated input feature values).
Regarding claim 5, Zheng in view of Ding teaches the speech feature reuse-based storing and calculating compression method for the keyword-spotting CNN according to claim 3 (interpreted as claim 1; see Claim Objections above). Zheng further teaches wherein when the updated row number of the input data is equal to the convolution step size, the stored row number of the input data is compressed into a size of a first dimension of a convolution kernel of this layer, and a convolution operation result of each step is compressed into the size of the first dimension of the convolution kernel of the layer (see Zheng, pg. 4653, sect. IV B, “Fig. 6 (b) illustrates that when they are convolved with 3×3 kernels of the first convolutional layer, the output feature maps will have 8 rows in common (out of 9 rows in total). Similar phenomenon can be observed on all the subsequent layers”; see also Zheng, pg. 4650, Table 1, “Statistics of Proposed BCNN,” which shows the compression to the first dimension of the convolution kernel).
Regarding claim 6, Zheng in view of Ding teaches the speech feature reuse-based storing and calculating compression method for the keyword-spotting CNN according to claim 4. Zheng further teaches wherein when the updated row number of the input data is not equal to the convolution step size, data storage of an input layer is compressed into a size of a first dimension of a convolution kernel of this layer (see Zheng, pg. 4653, sect. IV B, “For the sake of reducing energy consumption of weight accessing, we consider the compression of BCNN weights”), and the intermediate data of each convolution layer is stored as K times of the size of the first dimension of the convolution kernel of this layer, K being a ratio of the convolution step size to the updated row number of the input data (see Zheng, pg. 4624, sect. IV B, “A 2b flag table is designed to record the bank types and direct the accessing of these hybrid banks. Each hybrid bank owns a separate address generator. In each cycle, the 2-4 decoder indicates which bank to read, and the address generator provides the exact address”; the hybrid-bank address decoding is interpreted as corresponding to the ratio K).
Regarding claim 7, Zheng in view of Ding teaches the speech feature reuse-based storing and calculating compression method for the keyword-spotting CNN according to claim 6. Zheng further teaches wherein the convolution operation result of each step is stored into first to K-th intermediate data memories in sequence (see Zheng, pg. 4653, sect. IV A, “The immediate data in 4 CONV layers generated from the starting input speech feature map (Fmap 1) are buffered as follows. First 3 layers buffer the last two rows of their output feature map, and the final layer buffers the whole output feature map except the oldest row. For the consequent speech inputs (Fmap 2), only the newest 3 frames (frame 10–12 in Fig. 6(c)) of the input feature maps are used to compute the non-overlapping output row. And this row is combined with 2 buffered rows of the first layer to generate the non-overlapping row for next layer. The buffered features are updated by adding the newly generated row and discarding the oldest row of each channel.” “[P]arallel memory access of multiple rows is demanded in BCNN computation. We partition the feature buffer into 3 banks, ensuring the potential to provide multiple data stream”; the operation of Fig. 6 is interpreted as storing the intermediate data of each step in sequence).
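A hypothetical sketch of the storage arrangement recited in claims 6 and 7 follows, assuming the step size is an integer multiple of the updated row number m so that K = step size / m is an integer. `IntermediateMemories`, `k_ratio`, and the result labels are illustrative names, not taken from Zheng or the application. Each of the K memories holds as many rows as the first dimension of the kernel, giving K times that size in total, and successive step results are written to memories 1 to K in turn.

```python
from collections import deque


def k_ratio(stride: int, m: int) -> int:
    """K = convolution step size / updated row number (assumed to divide evenly)."""
    if m <= 0 or stride % m:
        raise ValueError("sketch assumes stride is a positive multiple of m")
    return stride // m


class IntermediateMemories:
    """Hypothetical K intermediate-data memories, each kernel_rows deep.

    Total capacity is K * kernel_rows rows, and the result of each step
    is written to memories 1..K in sequence (round robin).
    """

    def __init__(self, kernel_rows: int, stride: int, m: int):
        self.k = k_ratio(stride, m)
        self.mems = [deque(maxlen=kernel_rows) for _ in range(self.k)]
        self.step = 0

    def capacity(self) -> int:
        return sum(mem.maxlen for mem in self.mems)

    def store(self, result) -> int:
        idx = self.step % self.k       # 0-based index of the target memory
        self.mems[idx].append(result)  # a full memory drops its oldest entry
        self.step += 1
        return idx


mem = IntermediateMemories(kernel_rows=3, stride=2, m=1)
print(mem.k, mem.capacity())                    # 2 6
print([mem.store(f"y{i}") for i in range(5)])   # [0, 1, 0, 1, 0]
print([list(q) for q in mem.mems])              # [['y0', 'y2', 'y4'], ['y1', 'y3']]
print(IntermediateMemories(3, 1, 1).capacity()) # 3
```

When m equals the step size, K = 1 and the storage reduces to a single memory the size of the kernel's first dimension, which corresponds to the condition recited in claim 5.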
Regarding claim 8, Zheng in view of Ding teaches the method according to claim 1, as discussed above. Zheng further teaches a speech feature reuse-based storing and calculating compression method for a keyword-spotting convolutional neural network pooling layer, wherein it is achieved by using the method according to claim 1 (see Zheng, pg. 4650, Fig. 2; Zheng, pg. 4652, Fig. 4).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, "A high performance FPGA-based accelerator for large-scale convolutional neural networks," 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1-9, teaches, given the kernel size Size_kernel and the corresponding shifting stride, the numbers of multipliers, accumulators, and multiplexers required for 1-D and 2-D PEs to implement a CONV layer (see Li, pg. 4, sect. IV B).
Chen et al. (US Patent Application Publication 2019/0197083) teaches a lightweight neural network, MobileNet, which uses the idea of depthwise separable convolutions: instead of fusing channels when calculating convolutions (e.g., with a 3*3 or larger convolution kernel), it decomposes the convolution into a depthwise (also known as channel-wise) convolution and a 1*1 pointwise convolution, such that speed and model size are optimized while calculation accuracy is substantially maintained (see Chen, [0033]).
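The saving attributed to depthwise separable convolution can be illustrated with the commonly used multiply-accumulate (MAC) cost model for a layer with a D_K × D_K kernel, M input channels, N output channels, and a D_F × D_F output map. The formulas are this standard cost model, not figures from Chen, and the layer sizes below are arbitrary illustrative values; the function names are hypothetical. The ratio of the two costs is 1/N + 1/D_K².

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """MACs for a standard dk x dk convolution, m input / n output channels, df x df output."""
    return dk * dk * m * n * df * df


def depthwise_separable_macs(dk: int, m: int, n: int, df: int) -> int:
    """MACs for a dk x dk depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


std = standard_conv_macs(3, 32, 64, 10)
sep = depthwise_separable_macs(3, 32, 64, 10)
print(std, sep, round(sep / std, 4))  # 1843200 233600 0.1267
```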
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/            Examiner, Art Unit 2656                                                                                                                                                                                            
/BHAVESH M MEHTA/            Supervisory Patent Examiner, Art Unit 2656