Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claim 1 requires, among other things: “A circuit arrangement, …MAC circuits are configured to: receive a first set of data elements of an input feature map (IFM) at a first rate; … perform first MAC operations on the first set of the data elements and a first one of the kernels (H) associated with a first output feature map (OFM) depth index (d2) during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate; and perform second MAC operations on the first set of the data elements and a second one of the kernels (H) associated with a second OFM depth index (d2) during a second MAC cycle that consecutively follows the first MAC cycle.” The closest prior art includes Aydonat (patent application publication No. 2017/0103299) and Wang et al. (IEEE paper entitled “PipeCNN: An OpenCL-Based Open-Source FPGA Accelerator for Convolution Neural Networks”).
Aydonat taught a convolutional network including an input feature map (310) for regions (330) (e.g., see fig. 3); processing elements including a cache and a processing unit (e.g., see fig. 10); and accepting an input image (input feature map) and sequencing, addressing, and delivering data to each PE array, to the kernels in each PE array, and to the components in each of the kernels (e.g., see paragraphs 0059-0061).
Wang taught that a convolutional kernel is essentially a 3-dimensional multiply-accumulate (MAC) operation; that feature map and weight data are transferred from/to the global memory (i.e., external DDR memory), feeding other kernels; that nested loops can be avoided with kernel code; and a convolutional pipeline structure consisting of a multiplier tree with a delayed buffer. When the appropriate depth is selected, the proposed structure can be pipelined by OpenCL with an initial interval of only one cycle. The compute pipeline constitutes a compute unit (CU), and the kernel consists of multiple CUs to perform parallel convolutions. A batch of classification tasks can be processed by a single kernel by mapping all input feature maps as a single 3-D data set (e.g., see section II, Architecture Design and Optimization, on pages 279-280).
However, Aydonat and Wang did not disclose “A circuit arrangement, … the MAC circuits are configured to: receive a first set of data elements of an input feature map (IFM) at a first rate; perform first MAC operations on the first set of the data elements and a first one of the kernels (H) associated with a first output feature map (OFM) depth index (d2) during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate; and perform second MAC operations on the first set of the data elements and a second one of the kernels (H) associated with a second OFM depth index (d2) during a second MAC cycle that consecutively follows the first MAC cycle.”
The other independent claim recites similar limitations.
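For illustration only, the timing relationship recited in the quoted limitation can be sketched in software. The sketch below is not taken from the application, the claims, or the cited references; the function name time_multiplexed_mac and the variables ifm_sets, kernels, and schedule are hypothetical. It assumes each set of IFM data elements is held while one MAC cycle per kernel (one per OFM depth index d2) is performed on it in consecutive MAC cycles, so that MAC cycles occur at a faster rate than IFM sets arrive.

```python
def time_multiplexed_mac(ifm_sets, kernels):
    """Reuse each IFM data set across consecutive MAC cycles, one per kernel.

    ifm_sets: list of lists of numbers, one inner list per IFM arrival.
    kernels:  list of lists of numbers, one inner list per OFM depth index d2.
    Returns (ofm, schedule), where ofm[d2][i] is the dot product of IFM set i
    with kernel d2, and schedule lists (mac_cycle, ifm_index, d2) tuples.
    """
    ofm = [[0] * len(ifm_sets) for _ in kernels]
    schedule = []
    mac_cycle = 0
    for i, data in enumerate(ifm_sets):        # one IFM set arrives per slow tick
        for d2, kernel in enumerate(kernels):  # consecutive MAC cycles, same data
            acc = 0
            for x, w in zip(data, kernel):
                acc += x * w                   # multiply-accumulate
            ofm[d2][i] = acc
            schedule.append((mac_cycle, i, d2))
            mac_cycle += 1
    return ofm, schedule


# Hypothetical example values: two IFM sets, two kernels (d2 = 0 and d2 = 1).
ifm_sets = [[1, 2, 3], [4, 5, 6]]
kernels = [[1, 0, 1], [2, 2, 2]]
ofm, schedule = time_multiplexed_mac(ifm_sets, kernels)
```

In this sketch, two MAC cycles elapse for every IFM set received, which is the sense in which the rate of MAC cycles exceeds the first rate; a hardware circuit arrangement would realize this with clocking rather than nested loops.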

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Redfern (patent No. 11,086,967) disclosed implementing fundamental computational primitives using a matrix multiplication accelerator (e.g., see abstract).
Merrill (patent application publication No. 2016/0210550) disclosed cloud-based neural networks (e.g., see abstract).
Ma, Y. et al., Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA, 2016, IEEE.
Palero, Ricardo Colom et al., A Novel FPGA Architecture of a 2-D Wavelet Transform, 2006, Springer Science and Business Media, Inc., pp. 273-284.
Gray, S. et al., GPU Kernels for Block Sparse Weights, 2017, openai.com, 12 pages.
Moons, B. et al., An Energy-Efficient Precision-Scalable ConvNet Processor in 40-nm CMOS, 2017, IEEE, IEEE Journal of Solid State Circuits, Vol. 52, No. 4, 12 pages.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



/ERIC COLEMAN/Primary Examiner, Art Unit 2183