DETAILED ACTION
Claims 1-25 are pending.
The Office acknowledges the following papers:
Oath filed on 11/13/2018.

	Priority
No claim for priority has been made in this application.

Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the limitations from claims 2 and 6-7 must be shown or the feature(s) canceled from the claim(s). This can be done by providing element labels for the RCLT, WCLT, and corresponding Client Buffers. No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures.

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Power et al. (U.S. 2020/0089506), in view of Diamant et al. (U.S. 10,592,250), and in view of Ikeda et al. (U.S. 2019/0287235).
As per claim 1:
Power disclosed a neural processing unit (NPU), comprising: 
an NPU direct memory access (NDMA) core (Power: Figure 5 elements 504-506 and 514, paragraph 53) comprising:
a read engine having a read buffer (Power: Figure 5 elements 504-506, paragraphs 59-60)(The data store holds input image data and reads upon the read buffer.); and 
a write engine having a write buffer (Power: Figure 5 elements 504 and 514, paragraph 64)(The output buffer holds processed results and reads upon the write buffer.); and 
a controller configured to direct the NDMA core to perform on blocks of a data stripe to process tensors in artificial neural networks (Power: Figure 5 elements 500-502, paragraphs 25-26 and 54)(The controller selects execution modes of the CNN accelerator. The CNN performs convolutions on input image data represented by 3D tensors.).
Power failed to teach a controller configured to direct the NDMA core to perform hardware pre-processing of NDMA data in the read buffer and post-processing of NDMA data in the write buffer.
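However, Diamant combined with Power disclosed a controller configured to direct the NDMA core to perform hardware post-processing of NDMA data in the write buffer (Diamant: Figure 1 elements 126-128, column 5 lines 49-67)(Power: Figures 5 and 15 elements 502, 514, and 1514-1518, paragraphs 63-64, 77, 83, 87, and 160-162)(Diamant disclosed a post-processor that performs post-processing on outputs of the computing engine. Power disclosed a DPE array that performs pooling operations and post-processing operations on convolution results. The combination allows for a post-processor in Power to perform pooling and post-processing operations on convolution results stored in the output buffer.).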
The advantage of having separate computing resources and post-processing resources is that processing can be performed in a pipelined manner, which results in increased performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the post-processor of Diamant into the CNN accelerator of Power for the above advantage.
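For illustration only, the pipelining rationale above can be expressed as a simple timing model. This sketch is hypothetical and is not drawn from Power or Diamant; the function names and the assumption that every block spends a fixed number of cycles in the computing engine (compute_cycles) and in the post-processor (post_cycles) are illustrative.

```python
def sequential_cycles(num_blocks: int, compute_cycles: int, post_cycles: int) -> int:
    """Hypothetical model: one resource both computes and post-processes each block."""
    return num_blocks * (compute_cycles + post_cycles)


def pipelined_cycles(num_blocks: int, compute_cycles: int, post_cycles: int) -> int:
    """Hypothetical model: a separate post-processor overlaps with the computing engine.

    The first block passes through both stages; after that, one block completes
    every max(compute_cycles, post_cycles) cycles, since the slower stage sets the rate.
    """
    if num_blocks == 0:
        return 0
    return compute_cycles + post_cycles + (num_blocks - 1) * max(compute_cycles, post_cycles)


# Assumed numbers: 8 blocks, 10 compute cycles and 4 post-processing cycles per block.
print(sequential_cycles(8, 10, 4))  # 112
print(pipelined_cycles(8, 10, 4))   # 84
```

Under these assumed numbers, overlapping the two stages hides the post-processing time of every block except the last.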
Power and Diamant failed to teach a controller configured to direct the NDMA core to perform hardware pre-processing of NDMA data in the read buffer.
However, Ikeda combined with Power and Diamant disclosed a controller configured to direct the NDMA core to perform hardware pre-processing of NDMA data in the read buffer (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations on input image data prior to execution on the CNN engine. The combination allows for Power to perform pre-processing on input image data in the data store prior to selection and inputting the data into the DPE array.).
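The advantage of pre-processing image data is that image data can be filtered to enhance feature detection (Ikeda: Paragraph 67). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the pre-processing functions of Ikeda into the CNN accelerator of Power for the above advantage.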
As per claim 9:
Power, Diamant, and Ikeda disclosed the NPU of claim 1, in which the controller is configured to direct the NDMA core to crop the NDMA data in the read buffer and/or the write buffer (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations (e.g., enlargement/contraction) on input image data prior to execution on the CNN engine. The combination allows for Power to perform pre-processing on input image data in the data store prior to selection and inputting the data into the DPE array.).
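Cropping, as recited in claim 9, can be pictured as keeping only a rectangular window of a buffered block. The sketch below is a hypothetical illustration, not drawn from Power or Ikeda; the function name crop_block and the representation of a buffer as a list of rows are assumptions.

```python
from typing import List

Block = List[List[int]]  # hypothetical: a buffered block as a list of rows


def crop_block(block: Block, top: int, left: int, height: int, width: int) -> Block:
    """Return the height x width window of `block` whose upper-left corner is (top, left).

    Raises ValueError if the requested window does not fit inside the block.
    """
    rows = len(block)
    cols = len(block[0]) if rows else 0
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("crop window exceeds block bounds")
    return [row[left:left + width] for row in block[top:top + height]]


# Example: keep the central 2x2 window of a 4x4 block.
block = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop_block(block, 1, 1, 2, 2))  # [[5, 6], [9, 10]]
```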

Claims 3-5, 8, and 10-25 are rejected under 35 U.S.C. 103 as being unpatentable over Power et al. (U.S. 2020/0089506), in view of Diamant et al. (U.S. 10,592,250), in view of Ikeda et al. (U.S. 2019/0287235), and further in view of Official Notice.
As per claim 3:
Power, Diamant, and Ikeda disclosed the NPU of claim 1, further comprising: 
a read arbiter coupled to the read engine (Power: Figure 5 elements 504 and 516, paragraph 59)(Official notice is given that load queues/buffers can be used for the advantage of storing multiple load memory requests and selecting priority load requests to output to memory first. Thus, it would have been obvious to one of ordinary skill in the art to implement the load queue/buffer within Power.); 
a write arbiter coupled to the write engine (Power: Figure 5 elements 504 and 516, 
an external memory coupled to the read arbiter and the write arbiter (Power: Figure 5 element 516, paragraph 59).
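The Official Notice regarding load queues/buffers can be illustrated with a minimal sketch of a request queue that holds several pending memory requests and issues the highest-priority one first. It is hypothetical and not taken from Power: the class name RequestQueue, the string request labels, and the convention that a smaller number means higher priority are assumptions, with ties resolved in arrival order.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class RequestQueue:
    """Hypothetical request queue with priority-based arbitration.

    Smaller priority values are issued first; requests of equal priority are
    issued in arrival order because a sequence counter breaks ties.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()

    def push(self, request: str, priority: int) -> None:
        """Buffer a pending memory request."""
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def issue(self) -> Optional[str]:
        """Remove and return the next request to send to memory, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = RequestQueue()
queue.push("read A", priority=2)
queue.push("read B", priority=0)
queue.push("read C", priority=2)
print([queue.issue() for _ in range(4)])  # ['read B', 'read A', 'read C', None]
```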
As per claim 4:
Power, Diamant, and Ikeda disclosed the NPU of claim 3, further comprising a bus bridge coupled between the external memory and the read arbiter and the write arbiter (Power: Figure 5 elements 504 and 516, paragraph 59)(Official notice is given that bus bridges can be implemented between processors and memory for the advantage of connecting multiple different devices to the processor. Thus, it would have been obvious to one of ordinary skill in the art to implement a bus bridge in Power.).
As per claim 5:
Power, Diamant, and Ikeda disclosed the NPU of claim 4, further comprising a network on chip (NoC) coupled between the external memory and the bus bridge (Power: Figure 5 elements 504 and 516, paragraph 59)(Official notice is given that NoCs can be implemented in SoCs for the advantages of reduced wire routing congestion and higher operating frequencies. Thus, it would have been obvious to one of ordinary skill in the art to implement a NoC in Power.).
As per claim 8:
Power, Diamant, and Ikeda disclosed the NPU of claim 1, in which the controller is configured to direct the NDMA core to pad the NDMA data in the read buffer and/or the 
As per claim 10:
Power, Diamant, and Ikeda disclosed the NPU of claim 9, in which the controller is configured to direct the NDMA core to sign extend the NDMA data in the read buffer and/or the write buffer (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations (e.g., enlargement/contraction) on input image data prior to execution on the CNN engine. Official notice is given that increasing data value sizes can be done on signed data, which involves sign extending data values, for the advantage of operating on larger data values. Thus, it would have been obvious to one of ordinary skill in the art to implement sign-extending input data values in Power.).
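The sign-extension rationale can be shown with a short sketch that widens a two's-complement field. It is illustrative only and not drawn from Power or Ikeda; the helper names sign_extend and widen and the 8-bit to 16-bit widths in the example are assumptions.

```python
def sign_extend(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `value` as a two's-complement number."""
    value &= (1 << from_bits) - 1      # keep only the field being extended
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def widen(value: int, from_bits: int, to_bits: int) -> int:
    """Return the to_bits-wide bit pattern of a sign-extended from_bits field."""
    return sign_extend(value, from_bits) & ((1 << to_bits) - 1)


# Hypothetical example: 8-bit signed values widened to 16 bits.
print(sign_extend(0xFF, 8))        # -1
print(hex(widen(0x80, 8, 16)))     # 0xff80
print(hex(widen(0x7F, 8, 16)))     # 0x7f
```

Copying the sign bit into the new upper bits preserves each value's numeric meaning while giving later operations a wider range to work in.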
As per claim 11:
Power disclosed a method for hardware pre-processing and post-processing of direct memory access (DMA) data in artificial neural networks, comprising: 
programming configuration registers of a neural processing unit (NPU) direct memory access (NDMA) core for a read client and/or a write client (Power: Figure 5 elements 504-506 and 514, paragraphs 59-60 and 64)(The data store holds input image data and reads upon the read client. The output buffer holds processed results and reads upon the write client. Official Notice is given that DMAs include programmable 
streaming data blocks of a data stripe to/from an external memory of the NDMA core (Power: Figures 5, 11-12 elements 504 and 516, paragraphs 59, 115, and 118)(Rows/columns of image data are loaded/stored from/to main memory by the DMA controller.); and 
Power failed to teach pre-processing and post-processing the data blocks in a buffer of the NDMA core during streaming of the data blocks.
However, Diamant combined with Power disclosed post-processing the data blocks in a buffer of the NDMA core during streaming of the data blocks (Diamant: Figure 1 elements 126-128, column 5 lines 49-67)(Power: Figures 5 and 15 elements 502, 514, and 1514-1518, paragraphs 63-64, 77, 83, 87, and 160-162)(Diamant disclosed a post-processor that performs post-processing on outputs of the computing engine. Power disclosed a DPE array that performs pooling operations and post-processing operations on convolution results. The combination allows for a post-processor in Power to perform pooling and post-processing operations on convolution results stored in the output buffer.).
The advantage of having separate computing resources and post-processing resources is that processing can be performed in a pipelined manner, which results in increased performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the post-processor of Diamant into the CNN accelerator of Power for the above advantage.
Power and Diamant failed to teach pre-processing the data blocks in a buffer of the NDMA core during streaming of the data blocks.
However, Ikeda combined with Power and Diamant disclosed pre-processing the data blocks in a buffer of the NDMA core during streaming of the data blocks (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations on input image data prior to execution on the CNN engine. The combination allows for Power to perform pre-processing on input image data in the data store prior to selection and inputting the data into the DPE array.).
The advantage of pre-processing image data is that image data can be filtered to enhance feature detection (Ikeda: Paragraph 67). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the pre-processing functions of Ikeda into the CNN accelerator of Power for the above advantage.
As per claim 12:
Power, Diamant, and Ikeda disclosed the method of claim 11, in which pre-processing and post-processing comprises padding NDMA data in a read buffer and/or a write buffer during streaming of the data blocks (Diamant: Figure 1 elements 126-128, column 5 lines 49-67)(Power: Figure 15 element 1516, paragraphs 77, 83, and 162)(The combination allows for a post-processor in Power to perform pooling and post-processing operations on convolution results stored in the output buffer. Official notice is given that MAX/AVG pooling operations use padded values along first/last rows/columns of results 
As per claim 13:
Power, Diamant, and Ikeda disclosed the method of claim 11, in which pre-processing and post-processing comprises cropping NDMA data in a read buffer and/or a write buffer during the streaming of the data blocks (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations (e.g., enlargement/contraction) on input image data prior to execution on the CNN engine. The combination allows for Power to perform pre-processing on input image data in the data store prior to selection and inputting the data into the DPE array.).
As per claim 14:
Power, Diamant, and Ikeda disclosed the method of claim 11, in which pre-processing and post-processing comprises sign extending NDMA data in a read buffer and/or a write buffer during streaming of the data blocks (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations (e.g., enlargement/contraction) on input image data prior to execution on the CNN engine. Official notice is given that increasing data value sizes can be done on signed data, which involves sign extending data values, for the advantage of operating on larger data values. Thus, it would have been obvious to one of ordinary skill in the art to implement sign-extending input data values in Power.).
As per claim 15:
Power, Diamant, and Ikeda disclosed the method of claim 11, further comprising: 
unpacking NDMA data during the streaming of data blocks of the data stripe to/from the external memory (Ikeda: Figure 3 elements 152-154, paragraphs 65 and 67-68)(Power: Figure 5 element 506, paragraphs 58-60 and 62)(Ikeda disclosed performing pre-processing operations (e.g., enlargement/contraction) on input image data prior to execution on the CNN engine. Official notice is given that image processing systems can implement unpacker modules for the advantage of organizing pixel values prior to processing. Thus, it would have been obvious to one of ordinary skill in the art to implement unpacking pixel values during pre-processing operations in Power.); and 
repacking NDMA data prior to the streaming of data blocks of the data stripe to the external memory (Diamant: Figure 1 elements 126-128, column 5 lines 49-67)(Power: Figure 15 element 1516, paragraphs 77, 83, and 162)(The combination allows for a post-processor in Power to perform pooling and post-processing operations on convolution results stored in the output buffer. Official notice is given that image processing systems can implement packer modules for the advantage of organizing pixel values prior to storage. Thus, it would have been obvious to one of ordinary skill in the art to implement packing pixel values during post-processing operations in Power.).
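The unpacking and repacking steps in the Official Notice can be pictured with the following sketch. It assumes, hypothetically, four unsigned 8-bit pixel values packed into one 32-bit word with the first pixel in the least-significant byte; the lane layout and the function names unpack_word and pack_word are illustrative assumptions, not details of Power, Diamant, or Ikeda.

```python
from typing import List


def unpack_word(word: int, lanes: int = 4, lane_bits: int = 8) -> List[int]:
    """Split a packed word into `lanes` unsigned fields, least-significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(lanes)]


def pack_word(values: List[int], lane_bits: int = 8) -> int:
    """Pack unsigned fields into one word, placing values[0] in the least-significant lane."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {lane_bits} bits")
        word |= v << (i * lane_bits)
    return word


print([hex(v) for v in unpack_word(0x44332211)])   # ['0x11', '0x22', '0x33', '0x44']
print(hex(pack_word([0x11, 0x22, 0x33, 0x44])))    # 0x44332211
```

Because the two functions use the same assumed lane order, repacking the unpacked values restores the original word.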
As per claim 16:
Claim 16 essentially recites the same limitations of claim 11. Therefore, claim 16 is rejected for the same reasons as claim 11.
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 12. Therefore, claim 17 is rejected for the same reason(s) as claim 12.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 13. Therefore, claim 18 is rejected for the same reason(s) as claim 13.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 14. Therefore, claim 19 is rejected for the same reason(s) as claim 14.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 15. Therefore, claim 20 is rejected for the same reason(s) as claim 15.
As per claim 21:
Claim 21 essentially recites the same limitations of claim 11. Therefore, claim 21 is rejected for the same reasons as claim 11.
As per claim 22:
The additional limitation(s) of claim 22 basically recite the additional limitation(s) of claim 12. Therefore, claim 22 is rejected for the same reason(s) as claim 12.
As per claim 23:
The additional limitation(s) of claim 23 basically recite the additional limitation(s) of claim 13. Therefore, claim 23 is rejected for the same reason(s) as claim 13.
As per claim 24:
The additional limitation(s) of claim 24 basically recite the additional limitation(s) of claim 14. Therefore, claim 24 is rejected for the same reason(s) as claim 14.
As per claim 25:
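The additional limitation(s) of claim 25 basically recite the additional limitation(s) of claim 15. Therefore, claim 25 is rejected for the same reason(s) as claim 15.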

Claims 2 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Power et al. (U.S. 2020/0089506), in view of Diamant et al. (U.S. 10,592,250), in view of Ikeda et al. (U.S. 2019/0287235), and further in view of Norden et al. (U.S. 2019/0340491).
As per claim 2:
Power, Diamant, and Ikeda disclosed the NPU of claim 1.
Power, Diamant, and Ikeda failed to teach a read client coupled to an interface of the read engine of the NDMA core and a write client coupled to an interface of the write engine of the NDMA core.
However, Norden combined with Power, Diamant, and Ikeda disclosed a read client coupled to an interface of the read engine of the NDMA core and a write client coupled to an interface of the write engine of the NDMA core (Norden: Figures 3-4 elements 314, 402, and 424, paragraphs 52-53, 61, and 67)(Power: Figure 5 element 520, paragraph 63)(Norden disclosed an input buffer circuit and an output within individual neural engines. The combination allows for implementing an input buffer circuit (i.e. read client) and an output (i.e. write client) within each DPE of Power.).
The advantage of input and output buffer storage elements within computing elements is that data can be stored closer to the array for faster access times and data reuse for complex operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the input and output buffers of Norden into the DPEs of Power for the above advantage.
As per claim 6:
Power, Diamant, and Ikeda disclosed the NPU of claim 1. 
Power, Diamant, and Ikeda failed to teach a write client coupled to a first memory interface of the NDMA core; and a read client coupled to a second memory interface of the NDMA core.
However, Norden combined with Power, Diamant, and Ikeda disclosed a write client coupled to a first memory interface of the NDMA core (Norden: Figures 3-4 elements 314 and 424, paragraphs 52-53 and 67)(Power: Figure 5 element 520, paragraph 63)(Norden disclosed an output within individual neural engines. The combination allows for implementing an output (i.e. write client) within each DPE of Power.); and 
a read client coupled to a second memory interface of the NDMA core (Norden: Figures 3-4 elements 314 and 402, paragraphs 52-53 and 61)(Power: Figure 5 element 520, paragraph 63)(Norden disclosed an input buffer circuit within individual neural engines. The combination allows for implementing an input buffer circuit (i.e. read client) within each DPE of Power.).
The advantage of input and output buffer storage elements within computing elements is that data can be stored closer to the array for faster access times and data reuse for complex operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the input and output buffers of Norden into the DPEs of Power for the above advantage.
As per claim 7:
Power, Diamant, and Ikeda disclosed the NPU of claim 6.
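Power, Diamant, and Ikeda failed to teach in which the write client and the read client comprise a client buffer used to store the NDMA data of the NDMA core.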
However, Norden combined with Power, Diamant, and Ikeda disclosed in which the write client and the read client comprise a client buffer used to store the NDMA data of the NDMA core (Norden: Figures 3-4 elements 314, 402, and 424, paragraphs 52-53, 61, and 67)(Power: Figure 5 element 520, paragraph 63)(Norden disclosed an input buffer circuit and an output within individual neural engines. The combination allows for implementing an input buffer circuit (i.e. read client) and an output (i.e. write client) within each DPE of Power.).
The advantage of input and output buffer storage elements within computing elements is that data can be stored closer to the array for faster access times and data reuse for complex operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the input and output buffers of Norden into the DPEs of Power for the above advantage.

	Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.

Thomas et al. (U.S. 2017/0124166) taught software modules performing pre-processing and post-processing operations for a GPU.
Yu et al. (U.S. 2018/0046913) taught a neural network processing unit with input/output buffers and DMA.
Henry et al. (U.S. 2018/0276035) taught neural network control/status registers.
Johnson et al. (U.S. 2018/0300165) taught a DMA engine with control and status registers.
Vantrease et al. (U.S. 2019/0294959) taught a neural network processor with a post-processor module.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183