DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/27/2018, 4/25/2019, 4/14/2020, 4/15/2020, 8/6/2020, 9/5/2020, and 2/21/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference signs mentioned in the description: Block 610 and core 1800. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1, 3, 5, and 12 recite the limitation "the first processing units". There is insufficient antecedent basis for this limitation in the claims or in the claim from which they depend. "First processing units" is introduced twice in Claim 1, so it is unclear which recitation this limitation refers to.

Claims 3, 5, and 12 recite the limitation "the second processing units". There is insufficient antecedent basis for this limitation in the claims or in the claim from which they depend. "Second processing units" is introduced twice in Claim 1, so it is unclear which recitation this limitation refers to.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-2, 4, and 14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Larson (US 5,659,781, Bidirectional Systolic Ring Network, 1997).


Regarding Claim 1, Larson teaches:
A device for systolically processing data (See Larson, Col. 10, Ln. 57, where a systolic ring network is described) according to a neural network, the device comprising:
a first arrangement of first processing units including at least first, second, third, and fourth processing units, wherein the first and second processing units are connected to systolically pulse data to one another, and wherein the third and fourth processing units are connected to systolically pulse data to one another (See at least Larson, Col. 14, Ln. 15-25, where nodes are grouped; Fig. 3A, where nodes 0-31 form a group [first arrangement], nodes 30 [1st processing unit] and 31 [2nd processing unit] are connected, and nodes 0 [3rd processing unit] and 1 [4th processing unit] are connected; Col. 17, Ln. 1, where, in the linear shift process, data are pulsed to the adjacent node);
a second arrangement of second processing units including at least fifth, sixth, seventh, and eighth processing units, wherein the fifth and sixth processing units are connected to systolically pulse data to one another, and wherein the seventh and eighth processing units are connected to systolically pulse data to one another (See at least Larson, Fig. 3A, where nodes 32-63 form a group [second arrangement], nodes 62 [5th processing unit] and 63 [6th processing unit] are connected, and nodes 32 [7th processing unit] and 33 [8th processing unit] are connected; Col. 17, Ln. 1, where, in the linear shift process, data are pulsed to the adjacent node);
a first interconnect between the second and seventh processing units, wherein the second processing unit is configured to systolically pulse data to the seventh processing unit along the first interconnect (See at least Larson, Fig. 3A, where the connection [first interconnect] between 31 [2nd processing unit] and 32 [7th processing unit] is shown);
and a second interconnect between the third and sixth processing units, wherein the third processing unit is configured to systolically pulse data to the sixth processing unit along the second interconnect (See at least Larson, Fig. 3A, where the connection [second interconnect] between 0 [3rd processing unit] and 63 [6th processing unit] is shown).

Regarding Claim 2, depending on Claim 1, Larson further teaches:
wherein the first and second interconnects form a first pair of interconnects (See at least Larson, Fig. 3A, where the connection [first interconnect] between 31 and 32 and the connection [second interconnect] between 0 and 63 form a pair of connections between the two node groups), wherein a number of pairs of interconnects connects the first arrangement of first processing units to the second arrangement of second processing units (See at least Larson, Col. 2, Ln. 3, where multiple rings [multiple pairs of interconnects] can be cascaded to provide a still greater amount of parallelism).

Regarding Claim 4, depending on Claim 1, Larson further teaches:
further comprising a second pair of interconnects, the second pair of interconnects including a third interconnect between an uppermost processing unit in the first arrangement and an uppermost processing unit in the second arrangement and a fourth interconnect between a lowermost processing unit in the first arrangement and a lowermost processing unit in the second arrangement. (See at least Larson Fig. 3A, where the second pair of interconnects 

Regarding Claim 14, Larson teaches: 
A method for systolically processing data (See at least Larson, Col. 1, Ln. 30-34, where systolic data transfers are described) according to a neural network comprising at least a first layer and a second layer, the method comprising:
during a first systolic clock cycle, performing a first set of systolic pulses of data through at least first, second, third, and fourth processing units arranged along a first arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the following shifts: 30 [1st processing unit] to 31 [2nd processing unit] and 0 [3rd processing unit] to 1 [4th processing unit] in the group [first arrangement]) and at least fifth, sixth, seventh, and eighth processing units arranged along a second arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the following shifts: 62 [5th processing unit] to 63 [6th processing unit] and 32 [7th processing unit] to 33 [8th processing unit] in the group [second arrangement])
systolically pulsing data from the first processing unit of the first arrangement to the second processing unit of the first arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the shift from 30 [1st processing unit] to 31 [2nd processing unit]);
systolically pulsing data from the third processing unit of the first arrangement to the fourth processing unit of the first arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the shift from 0 [3rd processing unit] to 1 [4th processing unit] in the group [first arrangement]);
systolically pulsing data from the fifth processing unit of the second arrangement to the sixth processing unit of the second arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the shift from 62 [5th processing unit] to 63 [6th processing unit]);
systolically pulsing data from the seventh processing unit of the second arrangement to an eighth processing unit of the second arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the shift from 32 [7th processing unit] to 33 [8th processing unit] in the group [second arrangement]);
and systolically pulsing data from the second processing unit of the first arrangement to the seventh processing unit of the second arrangement (See at least Larson, Fig. 3A & Col. 17, Ln. 1-10, where, in a major phase [clock cycle], the clockwise bus performs the shift from 31 [2nd processing unit] to 32 [7th processing unit]);
wherein the second processing unit is configured to systolically pulse data to the seventh processing unit along a first interconnect between the first and second arrangements, and wherein the third processing unit is configured to systolically pulse data to the sixth processing unit along a second interconnect between the first and second arrangements (See at least Larson, Fig. 3A, where, in the systolic clock cycle, data between the 2nd and 7th processing units are pulsed through the connection between them [first interconnect], and data between the 3rd and 6th processing units are pulsed through the connection between them [second interconnect]).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Larson (US 5,659,781, Bidirectional Systolic Ring Network, 1997) in view of Ge et al. (US 8,824,603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014), and further in view of Ross et al. (US 9,710,748, Neural Network Processor, 2017).
Regarding Claim 3, depending on Claim 1, Larson in view of Ge teaches the device of Claim 1. Larson in view of Ge further teaches:
equal to the number of pairs of interconnects (See at least Larson, Col. 2, Ln. 3, where multiple rings can be cascaded to provide a still greater amount of parallelism. In the example embodiment, each ring includes one pair of interconnects. Larson does not limit the number of interconnects in the disclosure; instead, an optimum workable number can be determined by one of ordinary skill in the art.)
Larson in view of Ge does not explicitly teach:
wherein each of the first and second processing units includes a number of convolution engines
Ross explicitly teaches:
wherein each of the first and second processing units includes a number of convolution engines (See at least Ross, Col. 9, Ln. 1, where a convolutional neural network is described).
Larson (in view of Ge) and Ross both teach systolic multiprocessor devices and methods and are therefore analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Larson (in view of Ge) of multiple processors connected by a dual ring bus with Ross's teaching of a multiprocessor convolutional neural network system to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Claims 5, 7, 8, 13, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Larson (US 5,659,781, Bidirectional Systolic Ring Network, 1997) in view of Ge et al. (US 8,824,603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014).

Regarding Claim 5, depending on Claim 4, Larson teaches the device of Claim 4. Larson does not explicitly teach:
wherein, at each systolic pulse, each of the first and second processing units is configured to systolically pulse two pieces of data, each to a different one of the first and second processing units.
Ge et al. explicitly teach:
wherein, at each systolic pulse, each of the first and second processing units is configured to systolically pulse two pieces of data, each to a different one of the first and second processing units (See at least Ge, Fig. 3A, where a clockwise ring and a counter-clockwise ring are linked to each node; on each pulse, two pieces of data flow between the first processing units and the second processing units).
Larson and Ge both teach a systolic ring bus among multiple processing units and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the
 
Regarding Claim 7, depending on Claim 1, Larson in view of Ge teaches the device of Claim 1. Larson in view of Ge further teaches:
wherein the second processing unit includes an output systolic element configured to tag an activation output generated by the second processing unit with an identifier, wherein the identifier indicates an address for the second processing unit (See at least Ge, Fig. 3C & Col. 7, Ln. 32, where dock 362 [output systolic element] is responsible for initiating a token with the PE address [identifier indicates an address for the unit]).

Regarding Claim 8, depending on Claim 7, Larson in view of Ge teaches the device of Claim 7. Larson in view of Ge further teaches:
wherein the activation output including the tag is systolically pulsed to an input systolic element of the seventh processing unit (See at least Ge, Fig. 3C, where the output token [activation output including tag] is pulsed on the clockwise ring to dock 362 [input systolic element] of the next clockwise processing unit [7th processing unit]).

Regarding Claim 13, depending on Claim 1, Larson in view of Ge teaches the device of Claim 1. Larson in view of Ge further teaches:


Regarding Claim 15, depending on Claim 14, Larson in view of Ge teaches the method of Claim 14. Larson in view of Ge further teaches:
during the first systolic clock cycle, performing a second set of systolic pulses including:
systolically pulsing data from the second processing unit of the first arrangement to the first processing unit of the first arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 31 [2nd processing unit] to 30 [1st processing unit]);
systolically pulsing data from the third processing unit of the first arrangement to the sixth processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 0 [3rd processing unit] to 63 [6th processing unit]);
systolically pulsing data from the fourth processing unit of the first arrangement to the third processing unit of the first arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 1 [4th processing unit] to 0 [3rd processing unit]);
systolically pulsing data from the sixth processing unit of the second arrangement to the fifth processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 63 [6th processing unit] to 62 [5th processing unit]);
and systolically pulsing data from the eighth processing unit of the second arrangement to the seventh processing unit of the second arrangement (See at least Larson, Fig. 3A, where, within a counter-clockwise ring as taught by Ge, the system systolically pulses data from 33 [8th processing unit] to 32 [7th processing unit]).

Regarding Claim 16, depending on Claim 15, Larson in view of Ge teaches the method of Claim 15. Larson in view of Ge further teaches:
wherein the first set of systolic pulses travel in a first direction through the first and second arrangements, and wherein the second set of systolic pulses travel in a second direction through the first and second arrangements, wherein the first direction is opposite to the second direction (See at least Larson, Fig. 3A & Ge, Fig. 3A, where the first set of pulses travels through the clockwise ring and the second set of pulses travels through the counter-clockwise ring, through the first and second arrangements).


Regarding Claim 17, Larson in view of Ge teaches:
during a second systolic clock cycle, performing a second set of systolic pulses including:
systolically pulsing, from the second processing unit of the first arrangement to the seventh processing unit of the second arrangement, the data received from the first processing unit during the first systolic clock cycle (See at least Larson, Fig. 3A, where, in a shift operation in the clockwise ring, data from 30 [1st processing unit] are passed to 31 [2nd processing unit] in one clock cycle and subsequently passed to 32 [7th processing unit] in the next clock cycle);
and systolically pulsing, from the third processing unit of the first arrangement to the sixth processing unit of the second arrangement, the data received from the fourth processing unit during the first systolic clock cycle (See at least Larson, Fig. 3A, where, in a shift operation in the counter-clockwise ring, data from 1 [4th processing unit] are passed to 0 [3rd processing unit] in one clock cycle and subsequently passed to 63 [6th processing unit] in the next clock cycle).

Claims 6, 12, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Larson (US 5,659,781, Bidirectional Systolic Ring Network, 1997) in view of Ge et al. (US 8,824,603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014), and further in view of Baji et al. (US 5,091,864, Systolic Processor Elements for a Neural Network, 1992).


Regarding Claim 6, Larson in view of Ge does not explicitly teach:
wherein the device further includes a systolic processor chip, and wherein the first and second arrangements of first and second processing units comprise circuitry embedded in the systolic processor chip
Baji explicitly teaches:
wherein the device further includes a systolic processor chip, and wherein the first and second arrangements of first and second processing units comprise circuitry embedded in the systolic processor chip (See at least Baji, Col. 3, Ln. 66-67, where several hundred processors can be integrated on a chip).
Larson (in view of Ge) and Baji both teach systolic multiprocessor devices and methods and are therefore analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Larson (in view of Ge) of multiple processors connected by a dual ring bus with Baji's teaching of a multiprocessor neural network system to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this combination because it yields predictable results.

Regarding Claim 12, depending on Claim 1, Larson in view of Ge and Baji teaches the device of Claim 1. Larson in view of Ge and Baji further teaches:
wherein at least a subset of the first processing units are assigned to perform computations of a first layer of the neural network, and wherein at least a subset of the second 

Regarding Claim 18, depending on Claim 17, Larson in view of Ge and Baji teaches the method of Claim 17. Larson in view of Ge and Baji further teaches:
via the seventh processing unit during the second systolic clock cycle, processing the data received from the second processing unit during the first systolic clock cycle (See at least Ge, Figs. 3D and 3A, where sequential data are processed through the nodes in a systolic manner, each node performing the corresponding calculation on the data passed in each systolic cycle), the processing performed according to computations of a node of the second layer of the neural network (See at least Baji, Fig. 11, where each processing unit performs computations in accordance with the corresponding layer of the neural network).

Regarding Claim 19, depending on Claim 18, Larson in view of Ge and Baji teaches the method of Claim 18. Larson in view of Ge and Baji further teaches:
via the seventh processing unit during a third systolic clock cycle, processing the data received from the second processing unit during the second systolic clock cycle (See at least Ge, Figs. 3D and 3A, where sequential data are processed through the nodes in a systolic manner, each node performing the corresponding calculation on the data passed in each systolic cycle), the processing performed according to computations of the node of the second layer of the neural network (See at least Baji, Fig. 11, where each processing unit performs computations in accordance with the corresponding layer of the neural network).

Claims 9-11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Larson (US 5,659,781, Bidirectional Systolic Ring Network, 1997) in view of Ge et al. (US 8,824,603, Bi-Directional Ring-Bus Architecture for Cordic Based Matrix Inversion, 2014) and Baji et al. (US 5,091,864, Systolic Processor Elements for a Neural Network, 1992), and further in view of Sengupta (US 2018/0189648, Event Driven and Time Hopping Neural Network).

Regarding Claim 9, depending on Claim 8, Larson in view of Ge teaches the device of Claim 8. Larson in view of Ge further teaches:
wherein the seventh processing unit is configured to: receive the activation output and perform processing to generate an additional activation output (See at least Ge, Fig. 3C, where PE 355 [7th processing unit] receives output on clockwise dock 362 from the previous processor on the bus [2nd processing unit], processes the data, and generates an output),
Larson in view of Ge does not explicitly teach:
use the identifier to identify a weight to use for processing the activation output.
Sengupta explicitly teaches:
use the identifier to identify a weight to use for processing the activation output (See at least Sengupta, para. 0089, where a weight array is indexed by the address of the neural unit; para. 0092, Ln. 1-4, where a weight is output from the array according to the neural unit identifier).
Larson (in view of Ge and Baji) and Sengupta both teach multiprocessor neural network devices and methods and are therefore analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine

Regarding Claim 10, depending on Claim 9, Larson in view of Ge, Baji, and Sengupta teaches the device of Claim 9. Larson in view of Ge, Baji, and Sengupta further teaches:
wherein the weight is stored locally at the seventh processing unit (See at least Baji, Fig. 2 & Col. 8, Ln. 5-10, where weight m_ij is stored locally in element 1).

Regarding Claim 11, depending on Claim 9, Larson in view of Ge, Baji, and Sengupta teaches the device of Claim 9. Larson in view of Ge, Baji, and Sengupta further teaches:
wherein the weight is retrieved from a memory external to the seventh processing unit (See at least Sengupta, para. 0051, Ln. 10-14, where the weight array can be stored in DRAM or flash memory, which is external to the processing unit).

Regarding Claim 20, depending on Claim 18, Larson in view of Ge, Baji, and Sengupta teaches the method of Claim 18. Larson in view of Ge, Baji, and Sengupta further teaches:
using a tag of the data received from the second processing unit to identify a weight (See at least Sengupta, para. 0089, where a weight array is indexed by the address of the neural unit; para. 0092, Ln. 1-4, where a weight is output from the array according to the neural unit identifier) to use for processing the data received from the second processing unit (See at least Ge, Fig. 3C, where PE 355 [7th processing unit] receives output on clockwise dock 362 from the previous processor on the bus [2nd processing unit], processes the data, and generates an output), the tag identifying that the data originated at the second processing unit (See at least Ge, Col. 7, Ln. 32, where the token carries the PE address [identifier indicates an address for the unit]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Jackson (US 2015/0112911, Coupling Parallel Event Driven Computation with Serial Computation) teaches a multiprocessor neural network with a weight-tagging mechanism. Ginosar (US 5,812,993, Digital Hardware Architecture for Realizing Neural Network) teaches a multilayer neural network pipeline. Chiueh (US 5,799,134, One Dimensional Systolic Array Architecture for Neural Network) teaches a one-dimensional systolic architecture used in neural network applications.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU, whose telephone number is (571) 272-9354. The examiner can normally be reached Monday-Friday, 9 am-5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, CHAKI KAKALI, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.C./Examiner, Art Unit 2122


/ERIC NILSSON/Primary Examiner, Art Unit 2122