DETAILED ACTION
In view of the Appeal Brief filed on 12/07/2021, PROSECUTION IS HEREBY REOPENED. A Notice of Allowance is set forth below.
To avoid abandonment of the application, appellant must exercise one of the following two options:
(1) file a reply under 37 CFR 1.111 (if this Office action is non-final) or a reply under 37 CFR 1.113 (if this Office action is final); or,
(2) initiate a new appeal by filing a notice of appeal under 37 CFR 41.31 followed by an appeal brief under 37 CFR 41.37. The previously paid notice of appeal fee and appeal brief fee can be applied to the new appeal. If, however, the appeal fees set forth in 37 CFR 41.20 have been increased since they were previously paid, then appellant must pay the difference between the increased fees and the amount previously paid.
A Supervisory Patent Examiner (SPE) has approved of reopening prosecution by signing below:
/MICHAEL J HUNTLEY/               Supervisory Patent Examiner, Art Unit 2129                                                                                                                                                                                         

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Notes
The 35 U.S.C. § 103 rejection made in the previous action has been withdrawn.
Allowable Subject Matter
Claims 1 and 5-19 are allowable.
EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given by Gang Ye (Reg. No. 69,585) on February 2, 2022.
Replace claims 1-20 with the following:
1.	(Currently Amended)  A processing device for performing operations in a neural network, the processing device comprising:
a hardware main processing integrated circuit;
a plurality of hardware basic processing integrated circuits that are separate from the main processing integrated circuit; and
a plurality of hardware branch processing circuits, wherein each of the plurality of hardware branch processing circuits connects the main processing integrated circuit to a distinct subset of the plurality of hardware basic processing integrated circuits, each distinct subset comprising multiple hardware basic processing integrated circuits directly connected to the corresponding hardware branch processing circuit;
wherein: 
the hardware main processing integrated circuit is configured to:

split a first data set into a plurality of distinct basic data blocks;
distribute the plurality of distinct basic data blocks to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of distinct basic data blocks is distributed to one of the plurality of hardware basic processing integrated circuits and at least two hardware basic processing integrated circuits receive different basic data blocks; 
identify a broadcast data block from a second data set; and
broadcast the broadcast data block to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of hardware basic processing integrated circuits receive the same broadcast data block;
each of the plurality of hardware branch processing circuits is configured to forward the broadcast data block or at least one of the plurality of distinct basic data blocks received from 
each of the plurality of hardware basic processing integrated circuits is configured to:
receive a corresponding basic data block distributed by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing unit, wherein different hardware basic processing integrated circuits receive different basic data blocks; 
receive the broadcast data block broadcasted by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing circuit, wherein each hardware basic processing integrated circuit receives the same broadcast data block;
perform an inner-product operation between the corresponding basic data block and the broadcast data block received by that hardware basic processing integrated circuit; and

the plurality of hardware basic processing integrated circuits perform the respective inner-product operations in parallel;
each of the plurality of hardware branch processing circuits is configured to forward the operation results returned from the distinct subset of the plurality of hardware basic processing integrated circuit connected thereto to the hardware main processing integrated circuit; and
the hardware main processing integrated circuit is configured to perform a set of arithmetic operations in series on the operation results forwarded from the plurality of hardware branch processing circuits. 

2.	(Cancelled)

3.	(Cancelled)

4.	(Cancelled)

5.	(Currently Amended)  The processing device of claim [[2]] 1, wherein each of the plurality of hardware basic processing integrated circuits is configured to:
obtain an inner-product operation result by performing [[an]] the inner-product operation between the corresponding basic data block and the broadcast data block; and
obtain the operation result by performing an accumulation operation of the inner-product operation result.

6.	(Currently Amended)  The processing device of claim [[3]] 1, wherein the hardware main processing integrated circuit is configured to:
obtain an accumulated result by performing an accumulation operation of the 
obtain an instruction result corresponding to [[the]] an operation instruction by arranging the accumulated results.

7.	(Currently Amended)  The processing device of claim [[2]] 1, wherein the hardware main processing integrated circuit is configured to:

broadcast the plurality of broadcast data sub-blocks to the plurality of hardware basic processing integrated circuits through multiple broadcasts, wherein each broadcast transmits a same broadcast data sub-block to each of the plurality of hardware basic processing integrated circuits.

8.	(Previously Presented)  The processing device of claim 7, wherein each of the plurality of hardware basic processing integrated circuits is configured to:
obtain an inner-product operation result by performing an inner-product operation between each broadcast data sub-block and the respective basic data block; 
obtain an operation sub-result by performing an accumulation operation of the inner-product operation result; and
return the operation sub-result to the hardware main processing integrated circuit.

9.	(Currently Amended)  The processing device of claim 8, wherein each of the plurality of hardware basic processing integrated circuits is configured to:
obtain n processing sub-results by multiplexing each of the broadcast data sub-blocks n times and perform inner-product operations between the broadcast data sub-blocks and n basic data blocks;
obtain [[an]] n operation sub-results by performing accumulation operations of the n processing sub-results, respectively; and
return the n operation sub-results to the hardware main processing integrated circuit, wherein n is an integer greater than or equal to 2.


the hardware main processing integrated circuit comprises at least one of a main register or a main on-chip cache circuit; and 
each of the plurality of hardware basic processing integrated circuits comprises at least one of a basic register or a basic on-chip cache circuit.

11.	(Previously Presented)  The processing device of claim 10, wherein the hardware main processing integrated circuit comprises at least one of a vector arithmetic unit circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transpose circuit, a direct memory access circuit, or a data rearrangement circuit. 

12.	(Previously Presented)  The processing device of claim 10, wherein each of the hardware basic processing integrated circuits further comprises at least one of an inner-product arithmetic unit circuit or an accumulator circuit.

13.	(Previously Presented)  The processing device of claim 1, wherein the hardware main processing integrated circuit is connected with each of the plurality of hardware branch processing circuits, and the plurality of hardware branch processing circuits are not connected to one another.

14.	(Previously Presented)  The processing device of claim 1, wherein the plurality of hardware branch processing circuits are connected in series and at least one of the hardware branch processing circuits is connected to the hardware main processing integrated circuit.

15.	(Currently Amended)  The processing device of claim 13, wherein the plurality of hardware branch 

16.	(Currently Amended)  The processing device of claim 14, wherein at least one of the plurality of hardware branch processing circuits is configured to forward [[the]] data transmitted by the hardware main processing integrated circuit to another one of the plurality of hardware branch processing circuits connected thereto.

17.	(Currently Amended)  The processing device of claim 1, wherein the hardware main processing integrated circuit is configured to transmit to at least one of the plurality of hardware branch processing circuits at least one of a vector, a matrix, a three-dimensional data block, a four-dimensional data block, or an n-dimensional data block.

18.	(Currently Amended)  The processing device of claim [[2]] 1, wherein:
the broadcast data block is used as a multiplier data block and the plurality of distinct basic data blocks are collectively used as a multiplicand data block, when the operations include a multiplication operation; and
the broadcast data block is used as an input data block and the plurality of distinct basic data blocks are collectively used as a convolution kernel, when the operations include a convolution operation. 

19.	(Currently Amended)  A method, implemented by a processing device, for performing operations in a neural network, the processing device comprising a hardware main processing integrated circuit, a plurality of hardware branch processing circuits, and a plurality of hardware basic processing integrated 
splitting, by the hardware main processing integrated circuit, a first data set into a plurality of distinct basic data blocks;
distributing, by the hardware main processing integrated circuit, the plurality of distinct basic data blocks to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of distinct basic data blocks is distributed to one of the plurality of hardware basic processing integrated circuits and at least two hardware basic processing integrated circuits receive different basic data blocks;
identifying, by the hardware main processing integrated circuit, a broadcast data block from a second data set;
broadcasting, by the hardware main processing integrated circuit, the broadcast data block to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of hardware basic processing integrated circuits receive the same broadcast data block;

forwarding, by each of the plurality of hardware branch processing circuits, the broadcast data block or at least one of the plurality of distinct basic data blocks received from 
receiving, by each of the plurality of hardware basic processing integrated circuits, a corresponding basic data block distributed by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing unit, wherein different hardware basic processing integrated circuits receive different basic data blocks; 
receiving, by each of the plurality of hardware basic processing integrated circuits, the broadcast data block broadcasted by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing circuit, wherein each hardware basic processing integrated circuit receives the same broadcast data block;
performing, by each of the plurality of hardware basic processing integrated circuits, an inner-product operation between the corresponding basic data block and the broadcast data block inner-product operations in parallel;
returning, by each of the plurality of hardware basic processing integrated circuits, an operation result to the connected hardware branch 
forwarding, by each of the plurality of hardware branch processing circuits, the operation results returned from the distinct subset of the plurality of hardware basic processing integrated circuit connected thereto to the hardware main processing integrated circuit; and
performing, by the hardware main processing integrated circuit, a set of arithmetic operations in the neural network in series on the operation results forwarded from the plurality of hardware branch processing circuits.

Reasons for Allowance
The following is an examiner's statement of reasons for allowance:
Claims 1 and 5-19 are considered allowable because, when the claims are read in light of the specification as per MPEP 2111.01, none of the references of record, alone or in combination, discloses or suggests the limitations of independent claims 1 and 19 as a whole, including the technical features recited in the following limitations of exemplary claim 1:
“split a first data set into a plurality of distinct basic data blocks; distribute the plurality of distinct basic data blocks to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of distinct basic data blocks is distributed to one of the plurality of hardware basic processing integrated circuits and at least two hardware basic processing integrated circuits receive different basic data blocks; identify a broadcast data block from a second data set; and broadcast the broadcast data block to the plurality of hardware basic processing integrated circuits through the plurality of hardware branch processing circuits, wherein each of the plurality of hardware basic processing integrated circuits receive the same broadcast data block; each of the plurality of hardware branch processing circuits is configured to forward the broadcast data block or at least one of the plurality of distinct basic data blocks received from the hardware main processing integrated circuit to the distinct subset of the plurality of hardware basic processing integrated circuits connected thereto; each of the plurality of hardware basic processing integrated circuits is configured to: receive a corresponding basic data block distributed by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing unit, wherein different hardware basic processing integrated circuits receive different basic data blocks; receive the broadcast data block broadcasted by the hardware main processing integrated circuit and forwarded from the connected hardware branch processing circuit, wherein each hardware basic processing integrated circuit receives the same broadcast data block; perform an inner-product operation between the corresponding basic data block and the broadcast data block received by that hardware basic processing integrated circuit; and return an operation result to the connected hardware branch processing circuit; the plurality of hardware basic processing integrated circuits perform the respective inner-product operations in parallel; each of the plurality of hardware branch processing circuits is configured to forward the operation results returned from the distinct subset of the plurality of hardware basic processing integrated circuit connected thereto to the hardware main processing integrated circuit; and the hardware main processing integrated circuit is configured to perform a set of arithmetic operations in series on the operation results forwarded from the plurality of hardware branch processing circuits.” (in exemplary claim 1), as recited in the independent claims.
 
The closest prior art references, listed below, disclose:
Valentine et al. (US Pub. No. 2019/0339972): teaches the use of distributed hardware for scheduling task-related computing operations associated with an automated task.
Cai (US Pub. No. 2018/0121240): teaches the use of distributed hardware for scheduling task-related computing operations associated with an automated task.
Azarkhish et al. (NPL: “Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes”): teaches the use of scalable reconfigurable hardware for processing 
Ngo (NPL: "FPGA Hardware Acceleration of Inception Style Parameter Reduced Convolution Neural Networks"): teaches the use of integrated hardware and software systems in convolutional neural networks for processing data flow and learning models as volumetric processes and transformations, denoted as kernel dimensions and distributed as mappings of the algorithm, to calculate the inner product of the convolution operations and achieve higher computation power.
Moini et al. (NPL: "A resource-limited hardware accelerator for convolutional neural networks in embedded vision applications"): teaches using a configurable architecture for accelerating convolution stages in convolutional neural networks (CNNs) to optimize resource usage and power consumption and to exploit the inherent parallelism in CNNs, using a volumetric distributed data flow process that decomposes information into data blocks for implementing product and series operations at each convolution stage of the CNN.
Lu et al. (US Pat. No. 10,073,816): teaches the use of software processes that perform product operations over distributed processing circuits as outer products, based on dataflow information patterns, for processing the volumetric data blocks associated with convolution operations processed in parallel.
Genov et al. (US Pub. No. 2005/0125477): teaches the use of distributed hardware for processing inner products using a dynamic memory architecture and a bit representation.
 
In summary, the references made of record, fail to disclose the required claimed technical features as recited for performing the inner-product output result in parallel using the distribution of 

Furthermore, the references of record alone or in combination fail to disclose or suggest the combination of limitations found within the independent claims as a whole without hindsight reasoning.
The dependent claims, being further limiting to the independent claims, definite, and enabled by the Specification, are also allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."
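For illustration only, the data flow recited in exemplary claim 1 can be sketched in software. The sketch below is not part of the claims or of the record. It assumes the first data set is a matrix whose rows are dealt out as basic data blocks and the second data set is a vector used as the broadcast data block, which is one possible reading of the multiplication case of claim 18. All function names are hypothetical, and the hardware parallelism of the basic circuits is simulated with sequential loops.

```python
# Illustrative sketch only; every name here is hypothetical and not taken from the record.

def split_rows(matrix, num_basic):
    """Main circuit: split the first data set into distinct basic data blocks.

    Rows are dealt round-robin; each entry keeps its row index so the
    results can be arranged later.
    """
    blocks = [[] for _ in range(num_basic)]
    for index, row in enumerate(matrix):
        blocks[index % num_basic].append((index, row))
    return blocks


def inner_product(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def basic_circuit(block, broadcast):
    """Basic circuit: inner product of each row in its own block with the broadcast block."""
    return [(index, inner_product(row, broadcast)) for index, row in block]


def branch_circuit(blocks_subset, broadcast):
    """Branch circuit: forward data to its connected basic circuits and forward results back."""
    # In hardware the basic circuits would run in parallel; here they run one after another.
    return [basic_circuit(block, broadcast) for block in blocks_subset]


def matrix_times_vector(matrix, vector, num_branches=2, basics_per_branch=2):
    """Main circuit: split, distribute, broadcast, then arrange the returned results in series."""
    blocks = split_rows(matrix, num_branches * basics_per_branch)
    returned = []
    for b in range(num_branches):
        subset = blocks[b * basics_per_branch:(b + 1) * basics_per_branch]
        returned.extend(branch_circuit(subset, vector))
    result = [0] * len(matrix)
    for per_basic in returned:
        for index, value in per_basic:
            result[index] = value
    return result
```

In this sketch every basic circuit receives the same vector but a different group of rows, which mirrors the claimed distinction between the broadcast data block and the distinct basic data blocks.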


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cadambi et al. (US Pub. No. 2011/0119467): teaches the use of software processes that perform product operations over distributed processing circuits as chained parallel processes for implementing dataflow matrix operation for processing the volumetric data blocks associated with the convolution operations being processed in parallel.
Greer (US 20080152217): teaches that a standard operation of a neural network is to perform an inner-product operation of a node with a weight value.
Chang et al. (US 20180173571): teaches that a dataflow is an arrangement of data in a processing system and that an inner product is a mathematical operation, both of which are used as part of performing operations associated with a convolutional neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/O.O.A./Examiner, Art Unit 2129                                                                                                                                                                                                        
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129