DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/09/2019 and 04/26/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities:
Para [0023] states: “Persistent storage 228 may also store one or more machine learning models, such as regression models, random forest models, support vector machines (SVMs) such as kernel SVMs, and artificial neural networks (ANNs) such as convolutional network networks (CNNs), recurrent network networks (RNNs)” (emphasis added). These terms should read “convolutional neural networks” and “recurrent neural networks.” Appropriate correction is required.
Para [0074] states: “In the pooling mode, for example, data processed at by first filter” (emphasis added). The phrase “at by” appears to be a typographical error. Appropriate correction is required.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 7, 11-13, 17 and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Martin et al. EP 3480745 A1.
Regarding claim 1, Martin teaches a neural processor, comprising: a plurality of neural engine circuits, at least one of the neural engine circuits configured to perform a convolution operation of first input data with one or more kernels to generate a first output(Martin, paras. 0023-0027, see also fig. 3, “The hardware implementation 300 comprises a plurality of convolution engines 302[a plurality of neural engine circuits], a plurality of accumulators 304, an accumulation buffer 306, a coefficient buffer 308, and an input buffer 310. Each convolution engine 302 [at least one of the neural engine circuits] comprises hardware logic configured[configured to perform a convolution operation] to receive a set of weights {w1 ... w8}[with one or more kernels] that represent all or a portion of a filter, and a set of input data values {d1 .. d8}[ of first input data] that represent all or a portion of a window, and perform a multiply-accumulate calculation on the received weights and input data values… [e]ach accumulator 304 receives the output of one convolution engine 302[to generate a first output] and adds the output to the previous convolution engine output that relates to the same filter. Since the convolution engine may not generate or produce outputs that relate to the same filter in consecutive cycles the partial results of one or more filters may be stored in an accumulation buffer 306 and then the appropriate partial result may be provided to the accumulator each cycle by the accumulation buffer 306. In some examples, the accumulation buffer 306 may be able to store partial results related to 128 different filters.”); and a planar engine circuit coupled to the plurality of neural engine circuits(Martin, para. 0052-0065, fig. 
8, “The hardware implementation of a DNN 800 also comprise an element-wise operations module 806, an activation module 808, a normalisation module 810, a pooling module 812, an output interleave module 814 and an output module 815.” ), the planar engine circuit configured to generate a second output by: reducing a spatial size of a version of second input data received by the planar engine responsive to placing the planar engine circuit in a pooling mode, the second input data corresponding to the first output or a version of input data of the neural processor(Martin, para. 0052-0065, fig. 8, “The normalisation module 810 is configured to receive one of the following as input data:…the accumulation output (via the element-wise operations module 806) (e.g. when a convolution layer is processed in the current hardware pass and neither an element-wise layer nor an activation layer is processed in the current hardware pass)[the second input data corresponding to the first output or a version of input data of the neural processor]… [t]he pooling module 812 may receive the normalised data from the normalisation module 810…[t]he pooling module 812 is configured to perform a pooling function[of a version of second input data received by the planar engine responsive to placing the planar engine circuit in a pooling mode], such as, but not limited to, a max or mean function, on the received data to produce pooled data. The purpose of a pooling layer is to reduce the spatial size of the representation [reducing a spatial size] to reduce the number of parameters and computation in the network, and hence to also control overfitting. 
In some examples, the pooling operation is performed over a sliding window that is defined per pooling layer.”), and performing an elementwise operation on the second input data responsive to placing the planar engine circuit in an elementwise mode, the second input data corresponding to the first output or a version of input data of the neural processor(Martin, para. 0052-0065, fig. 8, “The element-wise operations module 806 is configured to receive…the input data…[from] the accumulated result from the accumulation buffer 306 (e.g. when a convolution layer is processed in the current hardware pass)[the second input data corresponding to the first output or a version of input data of the neural processor]…[w]hen the element-wise operations module 806 is configured to process the received input data the element-wise operations module 806 performs an element-wise operation on the received data (optionally with another data set (which may be obtained from external memory))[and performing an elementwise operation on the second input data responsive to placing the planar engine circuit in an elementwise mode]. The element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”).  
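For illustration only, the operations quoted from Martin above (a multiply-accumulate per convolution engine, accumulation of partial results per filter, and an element-wise operation on the accumulated result) can be sketched as follows. This is a minimal sketch and not Martin's hardware or the claimed circuit; the names `mac`, `accumulate`, `ELEMENTWISE_OPS`, and `elementwise` are hypothetical and are introduced here only for the example.

```python
from typing import Callable, Dict, List, Sequence


def mac(weights: Sequence[float], data: Sequence[float]) -> float:
    """Multiply-accumulate over one set of weights and one set of input values."""
    if len(weights) != len(data):
        raise ValueError("weights and data must have the same length")
    return sum(w * d for w, d in zip(weights, data))


def accumulate(partials: Dict[int, float], filter_id: int, value: float) -> float:
    """Add one engine output to the partial result stored for the same filter."""
    partials[filter_id] = partials.get(filter_id, 0.0) + value
    return partials[filter_id]


# Hypothetical table of the element-wise operations named in the quotation.
ELEMENTWISE_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "maximum": max,
    "minimum": min,
}


def elementwise(op: str, a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Apply the selected operation to corresponding elements of two data sets."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    fn = ELEMENTWISE_OPS[op]
    return [fn(x, y) for x, y in zip(a, b)]
```

In this sketch, a partial result for a filter is held in a dictionary keyed by a filter index, which stands in for the accumulation buffer only loosely.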
Regarding claim 2, Martin teaches the neural processor of claim 1, further comprising a data processor circuit coupled to the plurality of neural engine circuits and to the planar engine circuit, the data processing circuit configured to buffer the first output for sending the planar engine circuit or the second output for sending to the plurality of neural engines(Martin, para. 0027, see also fig. 3(306), “Since the convolution engine may not generate or produce outputs that relate to the same filter in consecutive cycles the partial results of one or more filters may be stored in an accumulation buffer 306[a data processor circuit coupled to the plurality of neural engine circuits and to the planar engine circuit] and then the appropriate partial result may be provided to the accumulator each cycle by the accumulation buffer 306. In some examples, the accumulation buffer 306 may be able to store partial results related to 128 different filters.”; see also Martin, para. 0053, “The accumulation buffer 306 also outputs the accumulated result to the element-wise operations module 806[the data processing circuit configured to buffer the first output for sending the planar engine circuit]….”).1  
Regarding claim 7, Martin teaches the neural processor of claim 1, wherein the convolution operation is one of a plurality of operations for implementing a machine learning model(Martin, para. 0020, see also fig. 2, “A DNN may comprise one or more convolution layers each of which is associated with a plurality of filters each of which comprises a plurality of weights. Each filter has a dimension m x n x P (i.e. each filter comprises a set of m x n x P weights w) and is applied to the input data according to a convolution operation across several steps in direction s and t, (which are referred to as windows)[ wherein the convolution operation is one of a plurality of operations] as illustrated in FIG 2. Each filter produces one output plane. The number of filters and the number of weights per filter may vary between convolution layers. A convolutional neural network (CNN), which is a specific type of DNN that is effective for image recognition and classification, generally comprises a plurality of convolution layers[for implementing a machine learning model].”).  
Regarding claim 11, Martin teaches the neural processor of claim 1, wherein the elementwise operation one or more of tensor addition, elementwise maximum, elementwise minimum, or elementwise multiplication(Martin, para. 0054, “The element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”).2  
Regarding claim 12, Martin teaches the neural processor of claim 1, wherein circuitry of the planar engine circuit is reconfigured when switched from the pooling mode to the elementwise mode(Martin, para. 0054, “The element-wise operations module 806 is configured to receive either the input data for the current hardware pass (e.g. when a convolution layer is not processed in the current hardware pass)[ circuitry of the planar engine circuit is reconfigured when switched from the pooling mode to the elementwise mode]…[t]he element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”).
Referring to independent claim 13, it is rejected on the same basis as independent claim 1 since they are analogous claims.
Regarding claim 19, Martin teaches an electronic device, comprising: a memory storing a machine learning model(Martin, para. 0021, “A hardware implementation of a convolution layer may comprise a hardware module or block (which may be referred to herein as a convolution engine) that is configured to calculate the sum of the products between the weights
forming all or portion of a filter and input data values forming all or portion of a window (which may be referred to as a filter-window calculation)[a machine learning model]…[p]reparing each convolution engine to perform a filter-window calculation involves reading the appropriate input data and weights for each filter-window calculation from memory[a memory storing] and providing it to one of the convolution engines.”); and a neural processor, comprising: a plurality of neural engine circuits, at least one of the neural engine circuits configured to perform a convolution operation of first input data with one or more kernels to generate a first output(Martin, paras. 0023-0027, see also fig. 3,“ The hardware implementation 300 comprises a plurality of convolution engines 302[a plurality of neural engine circuits], a plurality of accumulators 304, an accumulation buffer 306, a coefficient buffer 308, and an input buffer 310. Each convolution engine 302 [at least one of the neural engine circuits] comprises hardware logic configured[configured to perform a convolution operation] to receive a set of weights {w1 ... w8}[with one or more kernels] that represent all or a portion of a filter, and a set of input data values {d1 .. d8}[ of first input data] that represent all or a portion of a window, and perform a multiply-accumulate calculation on the received weights and input data values… [e]ach accumulator 304 receives the output of one convolution engine 302[to generate a first output] and adds the output to the previous convolution engine output that relates to the same filter. Since the convolution engine may not generate or produce outputs that relate to the same filter in consecutive cycles the partial results of one or more filters may be stored in an accumulation buffer 306 and then the appropriate partial result may be provided to the accumulator each cycle by the accumulation buffer 306. 
In some examples, the accumulation buffer 306 may be able to store partial results related to 128 different filters.”); and a planar engine circuit coupled to the plurality of neural engine circuits(Martin, para. 0052-0065, fig. 8, “The hardware implementation of a DNN 800 also comprise an element-wise operations module 806, an activation module 808, a normalisation module 810, a pooling module 812, an output interleave module 814 and an output module 815.”), the planar engine circuit configured to generate a second output by: reducing a spatial size of a version of second input data received by the planar engine responsive to placing the planar engine circuit in a pooling mode, the second input data corresponding to the first output or a version of input data of the neural processor(Martin, para. 0052-0065, fig. 8, “The normalisation module 810 is configured to receive one of the following as input data:…the accumulation output (via the element-wise operations module 806) (e.g. when a convolution layer is processed in the current hardware pass and neither an element-wise layer nor an activation layer is processed in the current hardware pass)[the second input data corresponding to the first output or a version of input data of the neural processor]… [t]he pooling module 812 may receive the normalised data from the normalisation module 810…[t]he pooling module 812 is configured to perform a pooling function[of a version of second input data received by the planar engine responsive to placing the planar engine circuit in a pooling mode], such as, but not limited to, a max or mean function, on the received data to produce pooled data. The purpose of a pooling layer is to reduce the spatial size of the representation [reducing a spatial size] to reduce the number of parameters and computation in the network, and hence to also control overfitting. 
In some examples, the pooling operation is performed over a sliding window that is defined per pooling layer.”), and performing an elementwise operation on the second input data responsive to placing the planar engine circuit in an elementwise mode, the second input data corresponding to the first output or a version of input data of the neural processor(Martin, para. 0052-0065, fig. 8, “The element-wise operations module 806 is configured to receive…the input data…[from] the accumulated result from the accumulation buffer 306 (e.g. when a convolution layer is processed in the current hardware pass)[the second input data corresponding to the first output or a version of input data of the neural processor]…[w]hen the element-wise operations module 806 is configured to process the received input data the element-wise operations module 806 performs an element-wise operation on the received data (optionally with another data set (which may be obtained from external memory))[and performing an elementwise operation on the second input data responsive to placing the planar engine circuit in an elementwise mode]. The element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”).  
Regarding claim 20, Martin teaches the electronic device of claim 19, wherein the convolution operation is one of a plurality of operations for implementing a machine learning model(Martin, para. 0020, see also fig. 2, “A DNN may comprise one or more convolution layers each of which is associated with a plurality of filters each of which comprises a plurality of weights. Each filter has a dimension m x n x P (i.e. each filter comprises a set of m x n x P weights w) and is applied to the input data according to a convolution operation across several steps in direction s and t, (which are referred to as windows)[ wherein the convolution operation is one of a plurality of operations] as illustrated in FIG 2. Each filter produces one output plane. The number of filters and the number of weights per filter may vary between convolution layers. A convolutional neural network (CNN), which is a specific type of DNN that is effective for image recognition and classification, generally comprises a plurality of convolution layers[for implementing a machine learning model].”). 
Referring to dependent claim 17, it is rejected on the same basis as dependent claim 7 since they are analogous claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3-6 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Martin et al. EP 3480745 A1 in view of Zhang et al. "Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network." Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. (2017) (“Zhang”).
Regarding claim 3, Martin teaches the neural processor of claim 1, but does not teach: wherein the planar engine circuit comprises: a first filter circuit configured to, in the pooling mode, reduce a first size of a first dimension of the version of the second input data to generate an intermediate data, and a second filter circuit configured to, in the pooling mode, reduce a second size of a second dimension of the intermediate data to generate a version of the second output.  
However, Zhang teaches: a first filter circuit configured to, in the pooling mode, reduce a first size of a first dimension of the version of the second input data to generate an intermediate data, and a second filter circuit configured to, in the pooling mode, reduce a second size of a second dimension of the intermediate data to generate a version of the second output(Zhang, pg. 31, see also fig. 8, “The pooling layer outputs the average or the maximum value of a local area of the input feature map. Pooling layers can be expressed as Equation (12), $out(f_o, x, y) = \max_{0 < (k_x, k_y) < k} in(f_i, x + k_x, y + k_y)$ where k  We implement a similar line buffer as in Section 5.2.1, which uses the connections between different register stages to accomplish the window selection. In our design, we use a 4-input comparator to get the maximum value of a 2x2 window.” Zhang teaches: $x + k_x$ (a first filter circuit configured to, in the pooling mode, reduce a first size of a first dimension of the version of the second input data) line buffer and fig. 8 (to generate an intermediate data) $y + k_y$ (and a second filter circuit configured to, in the pooling mode, reduce a second size of a second dimension) line buffer and fig. 8 (of the intermediate data) $out(f_o, x, y)$ (to generate a version of the second output)).
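For illustration only, the max pooling of Equation (12) as quoted from Zhang can be sketched in two ways: a direct maximum over each k x k window, and a two-stage form that first reduces along x to produce an intermediate result and then reduces along y. This is a minimal sketch under stated assumptions: the window offsets are assumed to run over 0 to k-1 and the stride is assumed to equal k, neither of which is stated in the quoted passage. The two-stage form is not Zhang's implementation (Zhang describes a 4-input comparator over a 2x2 window); the function names `max_pool_direct` and `max_pool_two_stage` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_direct(fmap: Matrix, k: int) -> Matrix:
    """Maximum over each k x k window of one feature map (assumed stride k)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[y + ky][x + kx] for ky in range(k) for kx in range(k))
            for x in range(0, cols - k + 1, k)
        ]
        for y in range(0, rows - k + 1, k)
    ]


def max_pool_two_stage(fmap: Matrix, k: int) -> Matrix:
    """Same result computed as a reduction along x followed by one along y."""
    rows, cols = len(fmap), len(fmap[0])
    # Stage 1: reduce the first dimension (x), giving the intermediate data.
    intermediate = [
        [max(row[x:x + k]) for x in range(0, cols - k + 1, k)]
        for row in fmap
    ]
    width = len(intermediate[0])
    # Stage 2: reduce the second dimension (y) of the intermediate data.
    return [
        [max(intermediate[y + ky][x] for ky in range(k)) for x in range(width)]
        for y in range(0, rows - k + 1, k)
    ]
```

Because the maximum over a rectangular window equals the maximum of the per-row maxima, both functions return the same pooled map for the same input.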
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Martin with the above teachings of Zhang. The motivation to do so would be to construct an FPGA accelerator for CNN classifiers (Zhang, pg. 25, “Convolutional Neural Networks (CNNs) are widely used in computer vision, speech recognition, natural language processing and text classification. Over the past decade, the accuracy and the performance of CNN has improved significantly, mainly due to the enhanced neural network structures enabled by massive datasets and increased computational resources benefits from the CMOS scaling to train the models in reasonable time. In recent years, FPGA has become an attractive solution to accelerate CNN classification…[i]n this work, to achieve a high performance CNN accelerator, we first propose an analytic model to guide our kernel design to achieve a better mapping from OpenCL kernels to FPGA hardware.”).
Regarding claim 4, Martin in view of Zhang teaches the neural processor of claim 3, wherein the planar engine circuit further comprises a line buffer circuit between first filter circuit and the second filter circuit, the line buffer circuit configured to store the intermediate data for sending to the second filter circuit(Zhang, pg. 31, see also figs. 7 and 8, “As shown in Figure 8, we implement a line buffer...between local memory and external memory to flatten and rearrange data. The goal is to minimize the random data access penalty from external memory and to improve on-chip data reuse. The line buffer streams data from external memory which has a continuous address and converts it into the data order for 2D convolution… we fill the line buffer[a line buffer circuit between] using a ping-pong mechanism to pipeline the data access[first filter circuit] and computation [and the second filter circuit]. More specifically, we choose to fill the 256 memory locations at one time, as 256 is not only half of the Altera M20K memory depth but also the maximum data burst size of the DDR4 interface[the line buffer circuit configured to store the intermediate data for sending to the second filter circuit].”).  
Regarding claim 5, Martin in view of Zhang teaches the neural processor of claim 3, wherein at least one of the first filter circuit or the second filter circuit is configured to perform, in the elementwise mode, the elementwise operation on a version of the second input data(Martin, para. 0052-0065, fig. 8, “The element-wise operations module 806 is configured to receive either the input data for the current hardware pass (e.g. when a convolution layer is not processed in the current hardware pass) or the accumulated result from the accumulation buffer 306 (e.g. when a convolution layer is processed in the current hardware pass). The element-wise operations module 806 may either process the received input data or pass the received input data to another module…[w]hen the element-wise operations module 806 is configured to process the received input data the element-wise operations module 806 performs an element-wise operation on the received data (optionally with another data set (which may be obtained from external memory)). The element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”).3  
Regarding claim 6, Martin in view of Zhang teaches the neural processor of claim 3, wherein the planar engine circuit further comprises a format converter coupled to the first filter circuit, the format converter configured to perform one or more format conversions on the second input data to generate the version of the second input data(Zhang, pg. 31, see also fig. 7(a) and 8, “As shown in Figure 8, we implement a line buffer…between local memory and external memory to flatten and rearrange data. The goal is to minimize the random data access penalty from external memory and to improve on-chip data reuse. The line buffer streams data from external memory which has a continuous address and converts it into the data order for 2D convolution [the format converter configured to perform one or more format conversions on the second input data to generate the version of the second input data]…we fill the line buffer using a ping-pong mechanism to pipeline the data access….[a format converter coupled to the first filter circuit].”).  
Referring to dependent claims 14-16, they are rejected on the same basis as dependent claims 3-5 since they are analogous claims.

Claims 8-10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Martin et al. EP 3480745 A1 in view of Bai et al. "A CNN accelerator on FPGA using depthwise separable convolution." IEEE Transactions on Circuits and Systems II: Express Briefs 65.10 (2018) (“Bai”).
Regarding claim 8, Martin teaches the neural processor of claim 1, but does not teach wherein the planar engine circuit is further configured to, in a reduction mode, reduce the rank of a tensor.
However, Bai teaches:  wherein the planar engine circuit is further configured to, in a reduction mode, reduce the rank of a tensor(Bai, pgs. 1417-1418, see also figs. 3,5, 9 and 10, “Pointwise convolution is actually standard convolution with kernel size 1 × 1 (Fig. 9). To fully take advantage of all the multipliers in MME, the input feature map is divided into several M × M × 32 sub-matrices, and these sub-matrices are shifted into line buffers one after another. This idea comes from divide and conquer algorithm in large matrix multiplication illustrated in Fig. 10, which consists in dividing large matrix into several small matrices [in a reduction mode, reduce the rank of a tensor].”). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Martin with the above teachings of Bai the motivation to do so would be to implement an accelerator that speeds up matrix multiplications and decreases memory latency(Bai, pg. 1415, “The key contributions of this brief are: (1) A high performance CNN hardware accelerator framework is proposed where all layers are processed in a computing unit named matrix multiplication engine. (2) The utilization of hierarchical memory structure and ping-pong on-chip buffer reduces the bandwidth limitation of off-chip memory.(3) A methodology for scalable design is proposed, so that this framework can be implemented in various FPGAs, through balancing the on-chip resources and performance.”).  
Regarding claim 9, Martin in view of Bai teaches the neural processor of claim 8, wherein the planar engine circuit comprises a filter circuit configured to: reduce the spatial size of the second data received in the pooling mode (Martin, para. 0052-0065, see also fig. 8, “The pooling module 812 may receive the normalised data from the normalisation module 810…[t]he pooling module 812 is configured to perform a pooling function, such as, but not limited to, a max or mean function, on the received data to produce pooled data. The purpose of a pooling layer is to reduce the spatial size of the representation to reduce the number of parameters and computation in the network, and hence to also control overfitting. In some examples, the pooling operation is performed over a sliding window that is defined per pooling layer.”), perform the elementwise operation of versions of one or more tensors in the elementwise mode (Martin, para. 0052-0065, see also fig. 8, “The element-wise operations module 806 may either process the received input data or pass the received input data to another module…[w]hen the element-wise operations module 806 is configured to process the received input data the element-wise operations module 806 performs an element-wise operation on the received data (optionally with another data set (which may be obtained from external memory)). The element-wise operations module 806 may be configured to perform any suitable element-wise operation such as, but not limited to add, multiply, maximum, and minimum.”), and generate the scalar value in the reduction mode (Bai, pgs. 1417-1418, see also figs. 3, 5, 9 and 10, “For one MME, it is able to do M² × 32 and 32 × 9 multiplication at once. The adder tree sums up the 32 products in each cell as revealed by Fig. 9. Thus the output channel number is 9 [and generate the scalar value in the reduction mode].”).
Regarding claim 10, Martin teaches the neural processor of claim 1, but does not teach: wherein the first input data represent data across a plurality of channels and the second input data represents data in one of the channels.
However, Bai teaches: wherein the first input data represent data across a plurality of channels (Bai, pg. 1417, “To avoid losing too much information, standard convolution is adopted to do the first layer convolution. Therefore, this accelerator is adapted to be able to do the standard convolution with input feature map channel is 3 [the first input data represent data across a plurality of channels].”) and the second input data represents data in one of the channels (Bai, pg. 1418, “Average pooling and max pooling are treated differently. As pixels of a feature map channel are output one by one [and the second input data represents data in one of the channels], average pooling could be easily calculated by adding one more multiply-accumulate stage by a factor of 1/S, where S is average pooling size. On the other hand, max pooling needs one more comparison stage.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Martin with the above teachings of Bai. The motivation to do so would be to implement an accelerator that speeds up matrix multiplications and decreases memory latency (Bai, pg. 1415, “The key contributions of this brief are: (1) A high performance CNN hardware accelerator framework is proposed where all layers are processed in a computing unit named matrix multiplication engine. (2) The utilization of hierarchical memory structure and ping-pong on-chip buffer reduces the bandwidth limitation of off-chip memory. (3) A methodology for scalable design is proposed, so that this framework can be implemented in various FPGAs, through balancing the on-chip resources and performance.”).
Referring to dependent claim 18, it is rejected on the same basis as dependent claim 8 since they are analogous claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 9836691 B1 (details an instruction set that specifies data values for performing a tensor computation for a CNN)
US 10037490 B2 (details dividing a tensor by a number of elements in the window to do average pooling for a CNN)
US 20190266485 A1 (details a reconfigurable stream switch formed in the integrated circuit, and an arithmetic unit coupled to the reconfigurable stream switch for a CNN accelerator)
US 20180285715 A1 (determines a loading space unit in an input based on a height or a width for an input feature map of the input and an extent of a dimension of a kernel feature map)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



Adam Clark Standke
Assistant Examiner
Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
2 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.