DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Notes
Regarding the 35 USC § 112(a) rejection, the rejection made in the previous action has been withdrawn.
Allowable Subject Matter
Claims 1, 3-11, and 13-20 are allowed.

Reasons for Allowance
The following is an examiner's statement of reasons for allowance:
Claims 1, 3-11, and 13-20 are considered allowable because, when the claims are read in light of the specification, as per MPEP 2111.01, none of the references of record, alone or in combination, discloses or suggests the limitations found within independent claims 1 and 11 as a whole, with regard to the technical features recited by the claim limitations as highlighted in the exemplary claim 1 limitations, directed to: “wherein the input data include a plurality of three-dimensional samples in the neural network, each three-dimensional sample including first and second dimensions along which an operation window slides for performing the pooling operation and a third dimension defining a depth of the operation window, wherein data in the third dimension are not stored adjacently and consecutively in a memory device of the processing device; rearranging, by the processing device, an order of dimensions of the input data such that data in the third dimension are stored adjacently and consecutively in the memory device; and performing, by the processing device, the pooling operation on the rearranged input data” (exemplary claim 1).
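
For illustration only, the highlighted limitation can be sketched in Python as follows. The sketch assumes a channel-major (C, H, W) starting layout, a 2x2 max-pooling window with stride 2, and per-depth-element maxima; the helper names chw_to_hwc and max_pool_hwc are hypothetical. None of these specifics is recited in the claims, and the sketch is not a construction of the claim language.

```python
def chw_to_hwc(flat, C, H, W):
    """Reorder a flat channel-major (C, H, W) buffer into (H, W, C) order.

    Assumed starting layout: values along the depth (C) dimension for one
    (h, w) position sit H * W elements apart, i.e. not adjacently.
    """
    out = [0] * (C * H * W)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                out[(h * W + w) * C + c] = flat[(c * H + h) * W + w]
    return out


def max_pool_hwc(flat, C, H, W, k=2, stride=2):
    """Max-pool a flat (H, W, C) buffer; the window slides over H and W."""
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = []
    for oh in range(out_h):
        for ow in range(out_w):
            # Start from the depth vector at the window's top-left position.
            base = ((oh * stride) * W + ow * stride) * C
            best = flat[base:base + C]
            for dh in range(k):
                for dw in range(k):
                    start = ((oh * stride + dh) * W + (ow * stride + dw)) * C
                    # After the reorder, one contiguous slice holds all C
                    # depth values for this window position.
                    vec = flat[start:start + C]
                    best = [max(a, b) for a, b in zip(best, vec)]
            out.extend(best)
    return out


# Hypothetical 2-channel, 2x2 sample stored channel-major (C, H, W).
C, H, W = 2, 2, 2
chw = [1, 2, 3, 4,      # channel 0
       10, 20, 30, 40]  # channel 1
hwc = chw_to_hwc(chw, C, H, W)        # [1, 10, 2, 20, 3, 30, 4, 40]
pooled = max_pool_hwc(hwc, C, H, W)   # [4, 40]
```

In this sketch the reorder lets each window position be read as a single contiguous slice, which is the storage property the quoted limitation describes.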

The closest prior art references, listed below, disclose:
Liu et al. (NPL: “Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks”): teaches hardware architectures and software parallel operations for processing pooling and other operations in a convolutional neural network using a three-dimensional arrangement of information.

Albericio et al. (NPL: “Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing”): teaches the use of three-dimensional dataflow where the third dimension defines an offset in which the neurons are arranged, with the neuron configurations processed as 2-D slices in cycles. For each processing cycle, one neuron per slice is fetched, resulting in a group of 16 neurons, one per lane. For example, let e(x,y,z) be the (neuron, offset) pair stored at location (x,y,z) of an input array. In cycle 0, the encoded neurons at positions e(0,0,0), e(0,0,16), ..., e(0,0,240) will be fetched, broadcast to all units, and processed by neuron lanes 0 through 15, respectively. Mainly, Albericio teaches the model and data parallelization used for processing the corresponding data volumes using parallel computing architectures and software operations.
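
As a minimal sketch of the cycle-0 example above, the lane-to-position mapping can be written out as follows. The function name lane_positions and the parameters lanes and slice_depth are hypothetical; the only case taken from the summary above is cycle 0 at location (0,0), where lane i reads depth index 16 * i.

```python
def lane_positions(x, y, lanes=16, slice_depth=16):
    """Return the (x, y, z) position fetched by each lane in cycle 0.

    Assumption: lane i reads the first element of its own depth slice,
    so its depth index is i * slice_depth.
    """
    return [(x, y, lane * slice_depth) for lane in range(lanes)]


# Positions e(0,0,0), e(0,0,16), ..., e(0,0,240) for lanes 0 through 15.
positions = lane_positions(0, 0)
z_values = [z for (_, _, z) in positions]
```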

Shen et al. (NPL: “Maximizing CNN Accelerator Efficiency Through Resource Partitioning”): teaches the use of dataflow algorithms for distributing data volumes as a two-dimensional arrangement that can vary in the third dimension for assigning each layer's volume of data, where the layer size varies and affects the number of processes used for computing each layer of operations.

Sharma et al. (NPL: “From High-Level Deep Neural Models to FPGAs”): teaches the use of dataflow algorithms that partition data using the dimensions of the slice in a particular layer, and an algorithm that takes as input the DNN macro dataflow graph (D) and the constraints of the FPGA platform. The algorithm also computes the number N of basic processing elements per processing unit to estimate the number of cycles needed to execute a particular DNN with an organization that complies with N and the determined slice size for a particular layer.

Martinez-Canales et al. (US Pub. No. 2020/0117993): teaches the use of 3-D tensor data arrangements for computing neural network operations where each processing engine has a layer-specific number of kernels. The 2-dimensional convolutional layer on the images can be processed with kernels that have different heights and weights for each kernel or within a tensor implementation.

Vorbach et al. (US Pub. No. 2016/0048394): teaches processing parallel instructions using multiple execution units such that the result data are continuously written to the load/store unit for storing in memory.

Vasudevan et al. (NPL: “Parallel multi channel convolution using general matrix multiplication”): teaches rearranging the dataflow pattern such that the memory allocation is consecutively rearranged.


In summary, the references made of record fail to disclose the required claimed technical features as recited by the claim 1 limitations noted above. Therefore, independent claims 1 and 11 are considered allowable.

The references made of record fail to disclose the required claimed technical features as recited by the independent claim limitations as a whole; see remarks filed 08/26/2022.
Furthermore, the references of record, alone or in combination, do not disclose or suggest the combination of limitations found within the independent claims as a whole without hindsight reasoning.
The dependent claims, being further limiting to the independent claim(s), definite, and enabled by the Specification, are also allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."


Conclusion
The prior art made of record and not relied upon that is considered pertinent to applicant's disclosure is listed below:
Cadambi et al. (US Pub. No. 2011/0119467): teaches the use of software processes that perform product operations over distributed processing circuits as chained parallel processes for implementing dataflow matrix operations for processing the volumetric data blocks associated with the convolution operations being processed in parallel.
Tarditi et al. (NPL: “Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses”): teaches use of the accelerator to select data from arbitrary integer-valued data-parallel arrays.

Nowatzyk et al. (US Pub. No. 2019/0065937): teaches the use of a three-dimensionally stacked neural network accelerator that includes more than two dies stacked together. Each die includes a plurality of tiles, and the tiles on the dies process data to perform neural network computations according to the dataflow configuration of data propagation through the tiles. Also teaches dataflow configurations for processing with the 3-D neural network configuration.

Ankit et al. (NPL: “A reconfigurable and energy-efficient architecture with memristive crossbars for deep spiking neural networks”): teaches arranging processing cells using two-dimensionally arranged processing engines having reconfigurable hierarchies in the third dimension, because the mPE’s reconfigurability enables optimized MCA utilization for sparse connectivity. The basic processing NeuroCell (NC) uses dataflow algorithms that enable mapping of a neural network (NN) architecture, where each NC in the NC-array is associated with a “tag (x, y)” that facilitates input broadcast from the SRAM to a variable number of NCs (that map to a given layer) within a single cycle. The dataflow process spans hierarchies for NN computation: within an NC, parallel data transfer occurs between layers of the NN through the switch network, while data transfer occurs serially through the shared bus between layers mapped across multiple NCs to compute the final output.

Brothers et al. (US Pub. No. 2016/0350645): teaches distributing weights among a set of processing compute nodes, where the same weights can be processed for a given feature map and different weights are stored when processing a current layer of the neural network model, and loading weights to the processing units according to different distribution patterns based on memory size and computation layer.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/O.O.A./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129