DETAILED ACTION
Continued Examination Under 37 CFR 1.114
The request for a continued prosecution application (CPA) under 37 CFR 1.53(d) filed on 11/11/2021 is acknowledged. A CPA may only be filed in a design application filed under 35 U.S.C. chapter 16.  See 37 CFR 1.53(d)(1). Since a CPA of this application is not permitted under 37 CFR 1.53(d)(1), the improper request for a CPA is being treated as a request for continued examination of this application under 37 CFR 1.114.

Status of the Claims
This action is in response to the applicant's amendment filed on 11/11/2021 for application 15/981,679, filed on 5/16/2018. Claims 1, 3 – 5, 9, 12, 14 and 17 are amended. Claim 2 is canceled. Claims 1 and 3 – 20 are pending and have been examined.

The rejection of Claims 1 and 3 – 20 based on 35 U.S.C. 112(a) has been withdrawn in light of the amendments to the claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 6/21/2021, 7/23/2021 and 11/11/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4 – 5, 7 – 8 and 13 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014.

Regarding Claim 1, Larson teaches:

a first arrangement of continuously connected processing units including at least first, second, third, and fourth processing units, wherein the first and second processing units are connected to systolically pulse data to one another, and wherein the third and fourth processing units are connected to systolically pulse data to one another (See at least Larson, Col 14, Ln. 15 – 25, where nodes are grouped; Fig. 2, where nodes 0 – 31 are continuously connected to form a group [first arrangement of continuously connected processing units], 30 [1st processing unit] and 31 [2nd processing unit] are connected, and 0 [3rd processing unit] and 1 [4th processing unit] are connected; Col 17, Ln. 1, where, in the linear shift process, data are pulsed to the adjacent node);
a second arrangement of continuously connected processing units including at least fifth, sixth, seventh, and eighth processing units, wherein the fifth and sixth processing units are connected to systolically pulse data to one another, and wherein the seventh and eighth processing units are connected to systolically pulse data to one another (See at least Larson, Fig. 2, where nodes 32 – 63 are continuously connected and form a group [second arrangement of continuously connected processing units], 62 [5th processing unit] and 63 [6th processing unit] are connected, and 32 [7th processing unit] and 33 [8th processing unit] are connected; Col 17, Ln. 1, where, in the linear shift process, data are pulsed to the adjacent node);
a first interconnect between the second and seventh processing units, wherein the second processing unit is configured to systolically pulse data to the seventh processing unit along the first interconnect (See at least Larson Fig. 3A, showing the connection [first interconnect] between 31 [2nd processing unit] and 32 [7th processing unit]);
a second interconnect between the third and sixth processing units, wherein the third processing unit is configured to systolically pulse data to the sixth processing unit along the second interconnect (See at least Larson Fig. 3A, showing the connection [second interconnect] between 0 [3rd processing unit] and 63 [6th processing unit]); and
wherein the first and second interconnects form a first pair of interconnects (See at least Larson Fig. 2, where the connection [first interconnect] between 31 and 32 and the connection [second interconnect] between 0 and 63 are a pair of connections between the two node groups).
Larson does not explicitly disclose:
wherein multiple pairs of interconnects connect the first arrangement of continuously connected processing units to the second arrangement of continuously connected processing units 
Ge explicitly discloses: 
wherein multiple pairs of interconnects connect the first arrangement of continuously connected processing units to the second arrangement of continuously connected processing units (See at least Ge, Fig. 3A, where PE5 – PE12 form the first arrangement and PE13 – PE16 and PE1 – PE4 form the second arrangement; with the dual ring, the dual connections between PE4 [sixth processing unit] and PE5 [third processing unit] and between PE12 [second processing unit] and PE13 [seventh processing unit] provide two pairs of interconnects [multiple pairs of interconnects] between the first arrangement and the second arrangement).
Larson and Ge both teach a systolic ring bus among multiple processing units and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson’s teaching of a ring bus architecture with Ge’s teaching of a bi-directional dual ring bus to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to reduce processing latency (See at least Ge, Col. 9, ln. 19 – 20).
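
The topology relied on in the Claim 1 mapping can be pictured with a short sketch. The following Python is illustrative only and is not taken from Larson or Ge: it assumes a 64-node ring numbered as in the mapping above (nodes 0 – 31 as the first arrangement, nodes 32 – 63 as the second), models each ring direction as a set of directed links, and lists the links that cross between the two arrangements. The names `ring_links` and `cross_links` are hypothetical.

```python
# Illustrative model only: node numbering follows the mapping used above
# (nodes 0-31 as the first arrangement, nodes 32-63 as the second).
RING_SIZE = 64
FIRST = set(range(0, 32))


def ring_links(size, bidirectional):
    """Return the directed links of a ring; a dual ring adds the reverse direction."""
    links = [(n, (n + 1) % size) for n in range(size)]  # clockwise links
    if bidirectional:
        links += [(n, (n - 1) % size) for n in range(size)]  # counter-clockwise links
    return links


def cross_links(links):
    """Keep only the links whose two endpoints sit in different arrangements."""
    return sorted(
        (src, dst) for src, dst in links
        if (src in FIRST) != (dst in FIRST)
    )


single = cross_links(ring_links(RING_SIZE, bidirectional=False))
dual = cross_links(ring_links(RING_SIZE, bidirectional=True))
print(single)  # links crossing the two boundaries on a one-direction ring
print(dual)    # the same boundaries with a link in each direction
```

Under these assumptions, the one-direction ring crosses between the arrangements only at the 31/32 and 63/0 boundaries, while the two-direction ring carries a link each way at both boundaries, including the 31-to-32 and 0-to-63 links identified above as the first and second interconnects.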

Regarding Claim 4, depending on Claim 1, Larson further teaches:
further comprising a second pair of interconnects, the second pair of interconnects including a third interconnect between an uppermost processing unit in the first arrangement and an uppermost processing unit in the second arrangement and a fourth interconnect between a lowermost processing unit in the first arrangement and a lowermost processing unit in the second arrangement (See at least Larson Fig. 3A, where, using the example numbering, the connection [third interconnect] between 0 [uppermost processing unit in first arrangement] and 63 [uppermost processing unit in second arrangement] and the connection [fourth interconnect] between 31 [lowermost processing unit in first arrangement] and 32 [lowermost processing unit in second arrangement] form the second pair of connections between group 1 [first arrangement] and group 2 [second arrangement]).

Regarding Claim 5, depending on Claim 1, Larson in view of Ge teach the device of Claim 1. Larson in view of Ge further teach:
wherein, during a single systolic pulse, the sixth processing unit is configured to receive a first piece of data from the third processing unit and a second piece of data from the fifth processing unit (See at least Ge, Fig. 3A, where, in a systolic pulse, PE4 [sixth processing unit] receives data [a first piece of data] from PE5 [third processing unit] on the counter-clockwise ring and data [second piece of data] from PE3 [fifth processing unit] on the clockwise ring. In Fig. 3A, PE5 – PE12 form the first arrangement, and PE13 – PE16 and PE1 – PE4 form the second arrangement. The connection between PE5 [third processing unit] and PE4 [sixth processing unit] is the second interconnect).

Regarding Claim 7, depending on Claim 1, Larson in view of Ge further teach: 
wherein the second processing unit includes an output systolic element configured to tag an activation output generated by the second processing unit with an identifier, wherein the identifier indicates an address for the second processing unit (See at least Ge, Fig. 3C & Col. 7, ln. 32, where dock 362 [output systolic element] is responsible for initiating a token with the PE address [identifier indicates an address for the unit]).

Regarding Claim 8, depending on Claim 7, Larson in view of Ge teach the device of Claim 7. Larson in view of Ge further teach: 
wherein the activation output including the tag is systolically pulsed to an input systolic element of the seventh processing unit (See at least Ge, Fig. 3C, where the output token [activation output including tag] is pulsed on the clockwise ring to the dock 362 [input systolic element] of the next clockwise processing unit [7th processing unit]).

Regarding Claim 13, depending on Claim 1, Larson in view of Ge further teach: 
wherein the first processing unit includes an input systolic element configured to receive data (See at least Ge, Fig. 3C, where Dock 362 [input systolic element]), a first processing circuit configured to perform processing of the received data to generate a first activation output (See at least Ge, Fig. 3C, where PE 355 [first processing unit] receives, processes, and outputs data [first activation output]), a first output systolic element, and a data tagger configured to tag the first activation output with an address of the first processing unit (See at least Ge, Fig. 3C & Col. 7, ln. 32, where dock 362 [output systolic element and data tagger] is responsible for initiating a token with the PE address [identifier indicates an address for the unit]).

Regarding Claim 14, Larson teaches: 
A method for systolically processing data (See at least Larson, Col. 1, Ln. 30 – 34, where systolic data transfers) according to a neural network comprising at least a first layer and a second layer, the method comprising: 
during a first systolic clock cycle, performing a first set of systolic pulses of data through at least first, second, third, and fourth processing units arranged along a first arrangement of continuously connected processing units (See at least Larson Fig. 2 & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shifts: 30 [1st processing unit] to 31 [2nd processing unit] and 0 [3rd processing unit] to 1 [4th processing unit] in the group [first arrangement]) and at least fifth, sixth, seventh, and eighth processing units arranged along a second arrangement of continuously connected processing units (See at least Larson Fig. 2 & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shifts: 62 [5th processing unit] to 63 [6th processing unit] and 32 [7th processing unit] to 33 [8th processing unit] in the group [second arrangement]), the first set of systolic pulses including:
systolically pulsing data from the first processing unit of the first arrangement to the second processing unit of the first arrangement (See at least Larson Fig. 3A & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shift: 30 [1st processing unit] to 31 [2nd processing unit]);
systolically pulsing data from the third processing unit of the first arrangement to the fourth processing unit of the first arrangement (See at least Larson Fig. 3A & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shift: 0 [3rd processing unit] to 1 [4th processing unit] in the group [first arrangement]);
systolically pulsing data from the fifth processing unit of the second arrangement to the sixth processing unit of the second arrangement (See at least Larson Fig. 3A & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shift: 62 [5th processing unit] to 63 [6th processing unit]);
systolically pulsing data from the seventh processing unit of the second arrangement to an eighth processing unit of the second arrangement (See at least Larson Fig. 3A & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shift: 32 [7th processing unit] to 33 [8th processing unit] in the group [second arrangement]);
and systolically pulsing data from the second processing unit of the first arrangement to the seventh processing unit of the second arrangement (See at least Larson Fig. 3A & Col. 17, Ln. 1 – 10, where in a major phase [clock cycle] the clockwise bus performs the following shift: 31 [2nd processing unit] to 32 [7th processing unit]); and
wherein the second processing unit is configured to systolically pulse data to the seventh processing unit along a first interconnect between the first and second arrangements, and wherein the third processing unit is configured to systolically pulse data to the sixth processing unit along a second interconnect between the first and second arrangements (See at least Larson Fig. 3A, where, in the systolic clock cycle, data between the 2nd processing unit and the 7th processing unit is pulsed through the connection in between [first interconnect], and data between the 3rd processing unit and the 6th processing unit is pulsed through the connection in between [second interconnect]);
wherein the first and second interconnects form a first pair of interconnects (See at least Larson Fig. 2, where the connection [first interconnect] between 31 and 32 and the connection [second interconnect] between 0 and 63 are a pair of connections between the two node groups).
Larson does not explicitly disclose:
wherein multiple pairs of interconnects connect the first arrangement of continuously connected processing units to the second arrangement of continuously connected processing units
Ge explicitly discloses: 
wherein multiple pairs of interconnects connect the first arrangement of continuously connected processing units to the second arrangement of continuously connected processing units (See at least Ge, Fig. 3A, where PE5 – PE12 form the first arrangement and PE13 – PE16 and PE1 – PE4 form the second arrangement; with the dual ring, the dual connections between PE4 [sixth processing unit] and PE5 [third processing unit] and between PE12 [second processing unit] and PE13 [seventh processing unit] provide two pairs of interconnects [multiple pairs of interconnects] between the first arrangement and the second arrangement).
Larson and Ge both teach a systolic ring bus among multiple processing units and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson’s teaching of a ring bus architecture with Ge’s teaching of a bi-directional dual ring bus to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to reduce processing latency (See at least Ge, Col. 9, ln. 19 – 20).

Regarding Claim 15, depending on Claim 14, Larson in view of Ge teach the method of Claim 14. Larson in view of Ge further teach:
during the first systolic clock cycle, performing a second set of systolic pulses including:
systolically pulsing data from the second processing unit of the first arrangement to the first processing unit of the first arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 31 [2nd processing unit] to 30 [1st processing unit]);
systolically pulsing data from the third processing unit of the first arrangement to the sixth processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 0 [3rd processing unit] to 63 [6th processing unit]);
systolically pulsing data from the fourth processing unit of the first arrangement to the third processing unit of the first arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 1 [4th processing unit] to 0 [3rd processing unit]);
systolically pulsing data from the sixth processing unit of the second arrangement to the fifth processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 63 [6th processing unit] to 62 [5th processing unit]);
and systolically pulsing data from the eighth processing unit of the second arrangement to the seventh processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 33 [8th processing unit] to 32 [7th processing unit]).

Regarding Claim 16, depending on Claim 15, Larson in view of Ge teach the method of Claim 15. Larson in view of Ge further teach:
wherein the first set of systolic pulses travel in a first direction through the first and second arrangements, and wherein the second set of systolic pulses travel in a second direction through the first and second arrangements, wherein the first direction is opposite to the second direction (See at least Ge Fig. 3A, where the first set of pulses travels through the clockwise ring and the second set of pulses travels through the counter-clockwise ring, each passing through the first and second arrangements).

Regarding Claim 17, depending on Claim 14, Larson in view of Ge teach the method of Claim 14. Larson in view of Ge further teach:
during a second systolic clock cycle, performing a second set of systolic pulses including:
systolically pulsing, from the second processing unit of the first arrangement to the seventh processing unit of the second arrangement, the data received from the first processing unit during the first systolic clock cycle (See at least Larson Fig. 3A, where, in a shift operation on the clockwise ring as taught by Ge, data from 30 [1st processing unit] is passed to 31 [2nd processing unit] in one clock cycle and subsequently passed to 32 [7th processing unit] in the next clock cycle);
and systolically pulsing, from the third processing unit of the first arrangement to the sixth processing unit of the second arrangement, data received from the fourth processing unit during the first systolic clock cycle (See at least Larson Fig. 3A, where, in a shift operation on the counter-clockwise ring as taught by Ge, data from 1 [4th processing unit] is passed to 0 [3rd processing unit] in one clock cycle and subsequently passed to 63 [6th processing unit] in the next clock cycle).
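
The two-cycle data movement described in the Claim 15 – 17 mappings can be traced with a small simulation. The sketch below is illustrative only and is not code from Larson or Ge: it assumes a 64-node dual ring with the node numbering used above, in which every item on the clockwise ring moves from node n to node n+1 per systolic clock cycle and every item on the counter-clockwise ring moves from node n to node n-1. The function name `pulse` and the payloads "a" and "b" are hypothetical.

```python
RING_SIZE = 64


def pulse(cw, ccw):
    """Advance both rings by one systolic clock cycle.

    cw[n] and ccw[n] hold the item currently at node n on the clockwise and
    counter-clockwise ring. Clockwise moves n -> n+1; counter-clockwise n -> n-1.
    """
    size = len(cw)
    next_cw = [cw[(n - 1) % size] for n in range(size)]
    next_ccw = [ccw[(n + 1) % size] for n in range(size)]
    return next_cw, next_ccw


cw = [None] * RING_SIZE
ccw = [None] * RING_SIZE
cw[30] = "a"   # starts at node 30 (mapped above to the first processing unit)
ccw[1] = "b"   # starts at node 1 (mapped above to the fourth processing unit)

cw, ccw = pulse(cw, ccw)   # first cycle: 30 -> 31 and 1 -> 0
after_first = (cw.index("a"), ccw.index("b"))

cw, ccw = pulse(cw, ccw)   # second cycle: 31 -> 32 and 0 -> 63
after_second = (cw.index("a"), ccw.index("b"))

print(after_first, after_second)
```

Under these assumptions, the item that starts at node 30 crosses the 31/32 boundary on the second cycle, and the item that starts at node 1 crosses the 0/63 boundary on the second cycle, which is the sequence the mapping above attributes to the combined references.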

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014, further in view of Ross et al. US9,710,748 Neural Network Processor, 2017.

Regarding Claim 3, depending on Claim 1, Larson in view of Ge teach the device of Claim 1. Larson in view of Ge further teach:
equal to a number of pairs of interconnects in the multiple pairs of interconnects (See at least Larson, Col. 2, Ln. 3, where multiple rings can be cascaded to provide a still greater amount of parallelism. In the example embodiment, each of the rings includes one pair of interconnects. Larson does not limit the number of interconnects in the disclosure; instead, an optimum workable number can be determined by one of ordinary skill in the art).
Larson in view of Ge does not explicitly teach:
wherein each of the first and second processing units includes a number of convolution engines
Ross explicitly teaches:
wherein each of the first and second processing units includes a number of convolution engines (See at least Ross, Col. 9, Ln. 1, referring to a convolutional neural network).
Larson (in view of Ge) and Ross both teach systolic multi-processor devices and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson (in view of Ge)’s teaching of multiple processors connected by a dual ring bus with Ross’s teaching of a multi-processor convolutional neural network system to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Claims 6, 12, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014, further in view of Baji et al. US5,091,864 Systolic Processor Elements for a Neural Network, 1992.

Regarding Claim 6, depending on Claim 1, Larson in view of Ge teach the device of Claim 1. Larson in view of Ge does not explicitly teach:
wherein the device further includes a systolic processor chip, and wherein the first and second arrangements of first and second processing units comprise circuitry embedded in the systolic processor chip
Baji explicitly teaches:
wherein the device further includes a systolic processor chip, and wherein the first and second arrangements of first and second processing units comprise circuitry embedded in the systolic processor chip (See at least Baji, Col. 3, Ln. 66 – 67, where some hundreds of processors could be integrated on a chip).
Larson (in view of Ge) and Baji both teach systolic multi-processor devices and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson (in view of Ge)’s teaching of multiple processors connected by a dual ring bus with Baji’s teaching of a multi-processor neural network system to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Regarding Claim 12, depending on Claim 1, Larson in view of Ge teach the device of Claim 1. Larson in view of Ge does not explicitly disclose:
wherein at least a subset of the first arrangement of sequentially connected processing units are assigned to perform computations of a first layer of the neural network, and wherein at least a subset of the second arrangement of sequentially connected processing units are assigned to perform computations of a second layer of the neural network
Baji explicitly discloses: 
wherein at least a subset of the first arrangement of sequentially connected processing units are assigned to perform computations of a first layer of the neural network, and wherein at least a subset of the second arrangement of sequentially connected processing units are assigned to perform computations of a second layer of the neural network (See at least Baji, Fig. 13, where first and second layer neural net with its processing units).
The rationale for combining the teachings of Larson, Ge, and Baji is the same as set forth for Claim 6.

Regarding Claim 18, depending on Claim 17, Larson in view of Ge teach the method of Claim 17. Larson in view of Ge further disclose:
via the seventh processing unit during the second systolic clock cycle, processing the data received from the second processing unit during the first systolic clock cycle (See at least Ge, Fig. 3D and 3A, where data in sequence are processed through nodes in a systolic manner. Each node performs the corresponding calculation on the data passed to it in each systolic cycle).
Larson in view of Ge does not explicitly disclose:
the processing performed according to computations of a node of the second layer of the neural network.
Baji explicitly discloses:
the processing performed according to computations of a node of the second layer of the neural network (See at least Baji, Fig. 11, where the processing unit performs computation in accordance with the corresponding layer of the neural network).
The rationale for combining the teachings of Larson, Ge, and Baji is the same as set forth for Claim 6.

Regarding Claim 19, depending on Claim 18, Larson in view of Ge and Baji teach the method of Claim 18. Larson in view of Ge and Baji further teach:
via the seventh processing unit during a third systolic clock cycle, processing the data received from the second processing unit during the second systolic clock cycle (See at least Ge, Fig. 3D and 3A, where data in sequence are processed through nodes in a systolic manner. Each node performs the corresponding calculation on the data passed to it in each systolic cycle), the processing performed according to computations of the node of the second layer of the neural network (See at least Baji, Fig. 11, where the processing unit performs computation in accordance with the corresponding layer of the neural network).
The rationale for combining the teachings of Larson, Ge, and Baji is the same as set forth for Claim 6.

Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014, and further in view of Sengupta US20180189648 Event Driven and Time Hopping Neural Network.

Regarding Claim 9, depending on Claim 8, Larson in view of Ge teach the device of Claim 8. Larson in view of Ge further teach:
wherein the seventh processing unit is configured to: receive the activation output and perform processing to generate an additional activation output (See at least Ge, Fig. 3C, where PE 355 [7th processing unit] receives output on clockwise dock 362 from the previous processor on the bus [2nd processing unit], processes the data, and generates output);
Larson in view of Ge does not explicitly teach:
use the identifier to identify a weight to use for processing the activation output.
Sengupta explicitly teaches:
use the identifier to identify a weight to use for processing the activation output (See at least Sengupta, para. 0089, where the weight array is indexed by the address of the neural unit; para. 0092, ln. 1 – 4, where a weight is output from the array according to the neural unit identifier).
Larson (in view of Ge) and Sengupta both teach multi-processor neural network devices and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson (in view of Ge)’s teaching of a multi-processor neural network with Sengupta’s approach to weight storage and retrieval to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Regarding Claim 11, depending on Claim 9, Larson in view of Ge and Sengupta teach the device of Claim 9. Larson in view of Ge and Sengupta further teach: 
wherein the weight is retrieved from a memory external to the seventh processing unit (See at least Sengupta, para. 0051, ln. 10 – 14, where weight array can be stored in DRAM or flash memory which is external to the processing unit).
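
The tag-based weight selection discussed for Claims 9 and 11 can be pictured as a table lookup keyed by the sender's address. The sketch below is illustrative only and does not reproduce Sengupta's or Ge's implementation: it assumes each activation arrives tagged with the address of the unit that produced it, and that the receiving unit has access to a weight table indexed by that address. The names `TaggedActivation`, `WEIGHTS`, and `process`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedActivation:
    source_address: int  # address of the unit that produced the value (the tag)
    value: float


# Hypothetical weight table available to the receiving unit, indexed by sender
# address. The addresses and weights are made-up illustration values.
WEIGHTS = {31: 0.5, 30: -1.0}


def process(incoming, weights):
    """Weight each incoming activation by the entry its tag selects, then sum."""
    return sum(weights[item.source_address] * item.value for item in incoming)


result = process(
    [TaggedActivation(31, 2.0), TaggedActivation(30, 3.0)],
    WEIGHTS,
)
print(result)
```

In this sketch the table is an ordinary dictionary, so the same lookup applies whether the table is held inside the receiving unit or in a memory external to it; only where `WEIGHTS` resides would differ.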

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014, Sengupta US20180189648 Event Driven and Time Hopping Neural Network, and further in view of Baji et al. US5,091,864 Systolic Processor Elements for a Neural Network, 1992.

Regarding Claim 10, depending on Claim 9, Larson in view of Ge and Sengupta teach the device of Claim 9. Larson in view of Ge and Sengupta do not explicitly disclose:
wherein the weight is stored locally at the seventh processing unit
Baji explicitly discloses:
wherein the weight is stored locally at the seventh processing unit (See at least Baji, Fig. 2, & Col. 8, ln. 5 – 10, where weight mij is stored locally in element 1).
Larson (in view of Ge and Sengupta) and Baji both teach systolic multi-processor devices and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson (in view of Ge and Sengupta)’s teaching of multiple processors connected by a dual ring bus with Baji’s teaching of a multi-processor neural network system to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Larson US5,659,781 Bidirectional Systolic Ring Network, 1997, in view of Ge et al., US8824603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014, Baji et al. US5,091,864 Systolic Processor Elements for a Neural Network, 1992 and further in view of Sengupta US20180189648 Event Driven and Time Hopping Neural Network.  

Regarding Claim 20, depending on Claim 18, Larson in view of Ge and Baji teach the method of Claim 18. Larson in view of Ge and Baji further teach:
for processing the data received from the second processing unit (See at least Ge, Fig. 3C, where PE 355 [7th processing unit] receives output on clockwise dock 362 from the previous processor on the bus [2nd processing unit], processes the data, and generates output), the tag identifying that the data originated at the second processing unit (See at least Ge, Col. 7, ln. 32, where the PE address [identifier indicates an address for the unit]).
Larson in view of Ge and Baji do not explicitly disclose:
using a tag of the data received from the second processing unit to identify a weight to use for processing the data
Sengupta explicitly discloses:
using a tag of the data received from the second processing unit to identify a weight to use for processing the data (See at least Sengupta, para. 0089, where the weight array is indexed by the address of the neural unit; para. 0092, ln. 1 – 4, where a weight is output from the array according to the neural unit identifier).
Larson (in view of Ge and Baji) and Sengupta both teach multi-processor neural network devices and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Larson (in view of Ge and Baji)’s teaching of a multi-processor neural network with Sengupta’s approach to weight storage and retrieval to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Response to Amendment
Applicant's remarks filed on 11/11/2021 have been fully considered but are not persuasive.
Applicant’s arguments with respect to art rejection have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571) 272-9354. The examiner can normally be reached Monday – Friday, 9 am – 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAKALI CHAKI, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.C./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122