DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application, filed on 04/19/2018, claims priority to Provisional Application No. 62/639,451 (filed on 03/06/2018).
This action is in response to preliminary amendments filed on 06/19/2019. In the current amendments, claims 1 and 6 are amended. Claims 1-10 are pending and have been examined.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 04/19/2018 and 06/19/2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Interpretation
Claims 1-10 recite “for fast weighted sum calculation in neural networks” in the preamble, which describes the purpose of the claimed invention. This recitation merely states “the purpose or intended use of the invention, rather than any distinct definition of any of the claimed invention’s limitations.” See MPEP 2111.02 (II). Therefore, the preamble is not construed as a limitation that has patentable weight. See MPEP 2111.02 (II) (“If the body of a claim fully and intrinsically sets forth all of the limitations of the claimed invention, and the preamble merely states, for example, the purpose or intended use of the invention, rather than any distinct definition of any of the claimed invention’s limitations, then the preamble is not considered a limitation and is of no significance to claim construction.”).

Claim Objections
Claims 1-5 are objected to because of the following informalities: 
Claim 1 recites “one weight_ to”, which should be amended to read “one weight to” (delete the extra underscore). Claims 2-5 are objected to based on the same rationale.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Each of claims 1 and 6 recites “N output, wherein M and N are integer greater than 1” in line 2. This recitation requires “N output” (singular) but then notes that N is greater than 1, therefore the recitation lacks clarity as to whether “N output” should be “N outputs.” For examination purposes, “N output” has been interpreted as “N outputs”.

Each dependent claim is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, and 5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the claims could be considered software per se.
Claim 1 is directed to “A computing device for fast weighted sum calculation in neural networks” (emphasis added). The Specification does not provide a definition for the term “device.” Therefore, absent a special definition that the “device” refers to hardware only, the broadest reasonable interpretation of “device” includes a software device. Moreover, the claim does not contain other structural recitations that can be construed as hardware only (for example, the recitation of “processing element” can be considered software, and “multipliers” and “adders” can be considered software mathematical operators performing mathematical operations). As such, the claimed invention in claim 1 could be considered software per se. Claims 2 and 5 do not recite any structural elements that can be construed as hardware only, therefore the claimed invention in claims 2 and 5 could be considered software per se.

Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a computing device for fast weighted sum calculation in neural networks, which can be directed to a machine or a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations: 
having M inputs and N output, wherein M and N are integer greater than 1
calculating a weighted sum for one target output, wherein...generate all N weighted sums in one multiplication...plus a plurality of addition...
M multipliers coupled to M inputs and M weights respectively, wherein the M weights are associated with the M inputs and said one target output, and wherein each of the M multiplier performs multiplication of one input with one weight_ to generate one weighted input, and the M multipliers generate M weighted inputs; and
a plurality of adders arranged to add the M weighted inputs to generate said one target output.
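as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”). The above limitations in the context of this claim encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation) to generate weighted input, and using adders (mathematical operators) to perform addition (mathematical calculation) to generate target output.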
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, and “each processing element comprises”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]”, as drafted, recite generic computer components because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a computing device for fast weighted sum calculation in neural networks, which can be directed to a machine or a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations: 
wherein M corresponds to a power-of-2 integer and the plurality of adders corresponds to (M-1) adders arranged in a binary-tree fashion to add the M weighted inputs to generate said one target output.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”). The above limitations in the context of this claim encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation), wherein M is a power-of-2 integer and the adders (mathematical operators) are arranged in a binary-tree fashion to perform addition (mathematical calculation) to generate target output.
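For illustration only, the arithmetic characterized above can be sketched in Python. The sketch is not taken from the application or the claims; the function name `adder_tree_sum` and the sample values are hypothetical. It multiplies M inputs by M weights and then reduces the M products pairwise, counting additions to show that a binary tree over a power-of-2 M uses M-1 additions arranged in log2(M) levels.

```python
def adder_tree_sum(inputs, weights):
    """Weighted sum via pairwise (binary-tree) reduction.

    Returns (total, additions, levels). Hypothetical helper for illustration.
    """
    m = len(inputs)
    if m != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if m < 2 or m & (m - 1) != 0:
        raise ValueError("M must be a power of 2 greater than 1")

    # Multiplication stage: M products, one per input/weight pair.
    level = [x * w for x, w in zip(inputs, weights)]

    additions = 0
    levels = 0
    # Addition stages: each level halves the number of partial sums.
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        additions += len(level)  # one addition per pair at this level
        levels += 1
    return level[0], additions, levels


total, additions, levels = adder_tree_sum([1, 2, 3, 4], [5, 6, 7, 8])
print(total, additions, levels)  # 70 3 2
```

With M = 4 the sketch performs 3 additions in 2 levels. It models only the arithmetic and makes no attempt to model clock cycles or hardware structure.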
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, and “each processing element comprises”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]”, as drafted, recite generic computer components because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a computing device for fast weighted sum calculation in neural networks, which can be directed to a machine or a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 1, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”, “each processing element further comprises”) and extra-solution activity language (“timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders”). The above limitations in claim 1 encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation) to generate weighted input, and using adders (mathematical operators) to perform addition (mathematical calculation) to generate target output.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP
2106.05(f). The additional element(s) of “the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”, “each processing element further comprises”, as drafted, is/are reciting generic computer component(s). The generic computer 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Finally, the insignificant extra-solution activity of “timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders” is well-understood, routine, and conventional. MPEP 2106.05(d) notes that “[a] factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).” Demirsoy et al. (US 8,458,243 B1), at Fig. 2 and Col. 4 lines 6-16 (“FIG. 2 shows typical circuitry 800 for implementing (in so-called "direct form") the FIR filter equation shown in FIG. 1. In this FIG., elements 820-1 through 820-(k-1) are a series of delay circuit elements (e.g., registers or flip-flops), each of which delays the input sample applied to it by one operating cycle of the circuitry. (As noted earlier, such an "operating cycle" is typically the time duration of each successive sample x[n] in the input sample stream. This "time duration" is also typically the period (length in time) of one cycle in a clock signal that controls the timing of various events throughout the circuitry” (emphasis added)), disclose that it is typical (conventional) to include delay circuit elements (corresponding to timing and control circuitry) that coordinate systolic operations of multipliers and adders (see the typical circuitry in Fig. 2). The claim is not patent eligible.
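As a rough illustration of the kind of direct-form FIR circuitry described in the quoted Demirsoy passage, the following Python sketch models a chain of delay elements feeding multipliers and an adder. It is an assumption-laden sketch, not a reproduction of Demirsoy's Fig. 1 or Fig. 2: it assumes the standard FIR form y[n] = c[0]x[n] + c[1]x[n-1] + ... + c[k-1]x[n-(k-1)], and the name `fir_direct_form` and the sample coefficients are hypothetical.

```python
from collections import deque


def fir_direct_form(samples, coeffs):
    """Direct-form FIR filter modeled with a software delay line.

    Hypothetical helper; assumes y[n] = sum(coeffs[i] * x[n - i]).
    """
    k = len(coeffs)
    # Delay line: the current sample plus k-1 delayed samples, initially zero.
    taps = deque([0] * k, maxlen=k)
    outputs = []
    for x in samples:
        # One "operating cycle": every stored sample shifts by one position
        # and the oldest sample falls off the end of the delay line.
        taps.appendleft(x)
        # Multiply each tap by its coefficient, then add the products.
        outputs.append(sum(c * t for c, t in zip(coeffs, taps)))
    return outputs


print(fir_direct_form([1, 0, 0, 0], [3, 2, 1]))  # [3, 2, 1, 0]
```

Feeding a unit impulse returns the coefficients in order, which shows each delay element holding a sample for one cycle before passing it to the next multiplier.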
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a computing device for fast weighted sum calculation in neural networks, which can be directed to a machine or a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 1, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”, “each processing element further comprises”) and extra-solution activity language (“a buffer to store the M weights”). The above limitations in claim 1 encompass the 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP
2106.05(f). The additional elements of “the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each processing element comprises”, “each processing element further comprises”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]”, as drafted, recite generic computer components because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Finally, the recitation of “a buffer to store the M weights” amounts to an insignificant extra-solution activity because, under the broadest reasonable interpretation, this limitation can be construed as referring to storing data in a buffer. Since the recitation of “a buffer to store the M weights” amounts to an insignificant extra-solution activity, it cannot integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, according to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...iv. Storing and retrieving information in memory”. Therefore, the insignificant extra-solution activity of “a buffer to store the M weights” is well‐understood, routine, and conventional and cannot amount to significantly more. The claim is not patent eligible.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a computing device for fast weighted sum calculation in neural networks, which can be directed to a machine or a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 1, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, “in one...clock cycle plus a plurality of...clock cycles”, “each 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP
2106.05(f). The additional element(s) of “the computing device comprising: N processing elements with each processing element designated for”, “the N processing elements”, and “each processing element comprises”, as drafted, is/are reciting generic computer component(s). The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]” as drafted, are reciting generic computer component(s) because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Finally, the recitation of “wherein the M weights are provided to each processing element externally” amounts to an insignificant extra-solution activity because under broadest reasonable interpretation, this limitation can be construed as referring to a processing element retrieving input data from memory. Since the recitation of “wherein the M weights are provided to each 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, according to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...iv. Storing and retrieving information in memory”. Therefore, the insignificant extra-solution activity of “wherein the M weights are provided to each processing element externally” is well‐understood, routine, and conventional and cannot amount to significantly more. The claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method for fast weighted sum calculation in neural networks, which can be directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations: 
having M inputs and N output, wherein M and N are integer greater than 1,
calculate weighted sums for the all N outputs in one multiplication...plus a plurality of addition...and
calculating a weighted sum for one target output... calculating a weighted sum for one target output...
multiplying M inputs and M weights respectively using M multipliers in said one processing element to generate M weighted inputs for said one target output, wherein the M weights are associated with the M inputs and said one target output;
adding the M weighted inputs to generate said one target output using a plurality of adders in said one processing element; and providing said one target output.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“utilizing N processing elements...wherein said utilizing the
N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, and “in one...clock cycle plus a plurality of...clock cycles”). The above limitations in the context of this claim encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation) to generate weighted input, and using adders (mathematical operators) to perform addition (mathematical calculation) to generate and provide target output.  
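For illustration only, the calculation characterized above for the method claim can be sketched in Python as N weighted sums over the same M inputs, one per target output. The sketch is not taken from the application; the name `weighted_sums` and the sample values are hypothetical, and the loop runs sequentially, so it models only the arithmetic and not the recited processing elements or clock cycles.

```python
def weighted_sums(inputs, weight_rows):
    """Compute one weighted sum per row of weights.

    inputs: M input values.
    weight_rows: N rows of M weights, one row per target output.
    Hypothetical helper for illustration.
    """
    m = len(inputs)
    outputs = []
    for weights in weight_rows:
        if len(weights) != m:
            raise ValueError("each weight row must have M entries")
        # Multiply each input by its weight to get M weighted inputs.
        weighted_inputs = [x * w for x, w in zip(inputs, weights)]
        # Add the M weighted inputs to get one target output.
        outputs.append(sum(weighted_inputs))
    return outputs


print(weighted_sums([1, 2, 3], [[1, 0, 0], [1, 1, 1]]))  # [1, 6]
```

Here M = 3 and N = 2, so the sketch yields two target outputs from three inputs.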
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP
2106.05(f). The additional element(s) of “utilizing N processing elements...wherein said utilizing the
N processing elements comprises” and “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]”, as drafted, recite generic computer components because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method for fast weighted sum calculation in neural networks, which can be directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The following limitation:
wherein M corresponds to a power-of-2 integer and the plurality of adders corresponds to (M-1) adders arranged in a binary-tree fashion to add the M weighted inputs to generate said one target output.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“utilizing N processing elements...wherein said utilizing the
N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, and “in one...clock cycle plus a plurality of...clock cycles”). The above limitations in the context of this claim encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation), wherein M is a power-of-2 integer and the adders (mathematical operators) are arranged in a binary-tree fashion to perform addition (mathematical calculation) to generate target output.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP
2106.05(f). The additional element(s) of “utilizing N processing elements...wherein said utilizing the
N processing elements comprises” and “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, as drafted, is/are reciting generic computer component(s). The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts no more than mere instructions to apply the exception using a generic computer component. Further, 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method for fast weighted sum calculation in neural networks, which can be directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 6, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“utilizing N processing elements...wherein said utilizing the N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, and “in one...clock cycle plus a plurality of...clock cycles”) and the extra-solution language (“timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders”). 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “utilizing N processing elements...wherein said utilizing the N processing elements comprises” and “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, as drafted, is/are reciting generic computer component(s). The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the additional elements of “clock cycle[s]”, as drafted, are reciting generic computer component(s) because generic computer processors implement timing/clock cycles for operations. Applying calculations onto a generic processor using generic clock cycles amounts to mere instructions to apply the calculation (abstract idea) on a computer. Finally, the recitation of “timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders” amounts to an insignificant extra-solution activity because, under the broadest reasonable interpretation, this limitation can be construed as referring to a pre-solution or post-solution coordination of operations of the multipliers and adders. Since the recitation of “timing and control circuitry to coordinate systolic operations for the 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Finally, the insignificant extra-solution activity of “timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders” is well‐understood, routine, and conventional. MPEP 2106.05(d) notes that “[a] factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).” Demirsoy et al. (US 8,458,243 B1) in Fig. 2 and Col. 4 lines 6-16: “FIG. 2 shows typical circuitry 800 for implementing (in so-called "direct form") the FIR filter equation shown in FIG. 1. In this FIG., elements 820-1 through 820-(k-1) are a series of delay circuit elements (e.g., registers or flip-flops), each of which delays the input sample applied to it by one operating cycle of the circuitry. (As noted earlier, such an "operating cycle" is typically the time duration of each successive sample x[n] in the input sample stream. This "time duration" is also typically the period (length in time) of one cycle in a clock signal that controls the timing of various events throughout the circuitry” (emphasis added) discloses that it is typical (conventional) to include delay circuit elements (corresponding to timing and control circuitry) that coordinate systolic operations of multipliers and adders (see the typical circuitry in Fig. 2). The claim is not patent eligible.
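For illustration only, the direct-form FIR arrangement quoted from Demirsoy et al. can be modeled in software as shown below. This is a sketch under stated assumptions: it uses the standard FIR equation y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[k-1]x[n-(k-1)], which is assumed to correspond to the equation of FIG. 1 (not reproduced in this action), and the names `fir_direct_form`, `samples`, and `coefficients` are hypothetical.

```python
from collections import deque
from typing import List, Sequence


def fir_direct_form(samples: Sequence[float], coefficients: Sequence[float]) -> List[float]:
    """Software model of a direct-form FIR filter (illustrative, not the cited circuitry)."""
    k = len(coefficients)
    # Position 0 holds the current sample; positions 1..k-1 model the k-1
    # delay elements, each holding the sample from one more cycle earlier.
    delay_line = deque([0.0] * k, maxlen=k)
    outputs: List[float] = []
    for x in samples:
        # One operating cycle: the new sample enters and every stored sample
        # shifts one position along the delay line (the oldest is dropped).
        delay_line.appendleft(x)
        # k multiplies followed by a sum of the k products.
        outputs.append(sum(h * d for h, d in zip(coefficients, delay_line)))
    return outputs
```

Feeding a unit impulse through the model returns the coefficients in order, one per operating cycle, which is the behavior the delay elements are described as providing.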
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method for fast weighted sum calculation in neural networks, which can be directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 6, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“utilizing N processing elements...wherein said utilizing the N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, and “in one...clock cycle plus a plurality of...clock cycles”) and the extra-solution language (“a buffer to store the M weights.”). The above limitations in claim 6 encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation) to generate weighted input, and using adders (mathematical operators) to perform addition (mathematical calculation) to generate and provide target output.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “utilizing N processing elements...wherein said utilizing the N processing elements comprises” and “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, as drafted, is/are reciting generic computer 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, according to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...iv. Storing and retrieving information in memory”. Therefore, the insignificant extra-solution activity of “a buffer to store the M weights” is well‐understood, routine, and conventional and cannot amount to significantly more. The claim is not patent eligible.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a method for fast weighted sum calculation in neural networks, which can be directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Each of the following limitations in claim 6, as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“utilizing N processing elements...wherein said utilizing the N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, and “in one...clock cycle plus a plurality of...clock cycles”) and extra-solution activity language (“wherein the M weights are provided to each processing element externally.”). The above limitations in claim 6 encompass the recited abstract idea of a mathematical concept of “calculating a weighted sum for one target output…[and] generate all N weighted sums in one multiplication [calculation/operation] plus a plurality of addition [calculations/operations].” In particular, the calculation of the weighted sum involves using M multipliers (mathematical operators) to perform multiplication (mathematical calculation) to generate weighted input, and using adders (mathematical operators) to perform addition (mathematical calculation) to generate target output.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “utilizing N processing elements...wherein said utilizing the N processing elements comprises”, “utilizing one processing element designated for...wherein said utilizing said one processing element designated for”, as drafted, is/are reciting generic computer 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, according to MPEP 2106.05(d)(II), “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...iv. Storing and retrieving information in memory”. Therefore, the 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-7, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“DLAU: A Scalable Deep Learning Accelerator Unit on FPGA”) in view of Bittner et al. (US 2018/0157465 A1).
Regarding Claim 1,
Wang et al. teaches A computing device for fast weighted sum calculation in neural networks having M inputs and N output, wherein M and N are integer greater than 1, the computing device comprising (pg. 513 third full paragraph: “we present a scalable deep learning accelerator unit named DLAU to speed up the kernel computational parts of deep learning algorithms” teaches a deep learning accelerator unit on a FPGA to speed up neural network computations, which Fig. 2 teaches includes 
...each processing element comprises (Fig. 2 teaches one TMMU, which is a tiled matrix multiplication unit):
M multipliers coupled to M inputs and M weights respectively, wherein the M weights are associated with the M inputs and said one target output (Fig. 2 and pg. 515 fourth full paragraph: “TMMU is in charge of multiplication and accumulation operations. TMMU is specially designed to exploit the data locality of the weights and is responsible for calculating the part sums... Fig. 2 illustrates the TMMU schematic diagram, in which we set tile size = 32 as an example” teach M multipliers, represented by “X”, coupled to M inputs and M weights (for example, 32) wherein the M weights are associated with the M inputs and one target output, which is the output being placed into the buffer on the right), 
and wherein each of the M multiplier performs multiplication of one input with one weight to generate one weighted input, and the M multipliers generate M weighted inputs (Fig. 2 teaches each of the M multipliers performs multiplication of an input (for example, Ni1) and weight (for example, W1j) to generate one weighted input in which M multipliers generate M weighted inputs); and 
a plurality of adders arranged to add the M weighted inputs to generate said one target output (Fig. 2 and pg. 515 fifth full paragraph: “we use pipelined binary adder tree structure to optimize the performance” teach a plurality of adders, as represented by “+”, arranged to add the M weighted inputs to generate the output being placed into the buffer on the right).
Wang et al. does not appear to explicitly teach N processing elements with each processing element designated for calculating a weighted sum for one target output, wherein the N processing elements generate all N weighted sums in one multiplication clock cycle plus a plurality of addition clock cycles and.
Bittner et al. teaches N processing elements with each processing element designated for calculating a weighted sum for one target output, wherein the N processing elements generate all N weighted sums in one multiplication clock cycle plus a plurality of addition clock cycles and (pg. 9 [0077]: “a column of 16 DSP units (MxV) which form the systolic array column” and pg. 9 [0078]: “In some examples of the disclosed technology, an MxV systolic array multiplier column 432 is implemented by configuring an Altera FPGA DSP primitive shown in FIG. 5. DSP units are logically arranged as a vertical column so that the weight data W can pass horizontally across the column as the input data is passed vertically down the column. Each DSP unit multiplies two pairs of 16-bit mantissas on each clock cycle and simultaneously adds the results of the previous clock cycle's pair of multiplies to a 64-bit accumulator 530” teach there are N=16 DSP units (Digital Signal Processors; corresponding to processing elements) with each DSP unit designated to calculate a weighted sum for one target output (see Fig. 4 and Fig. 5) wherein the 16 DSP units perform the multiplication to generate all N weighted sums in one multiplication clock cycle and perform additions in a plurality of addition clock cycles (the result from each clock cycle is added to the results of the previous clock cycle)).
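For illustration only, the arrangement addressed by this limitation (N processing elements, each multiplying M inputs by M weights and summing the M products for one target output) can be modeled in software as shown below. This is a sequential sketch and does not model clock cycles or the circuitry of Wang et al. or Bittner et al.; the names `processing_element` and `weighted_sums` and the sample values are hypothetical.

```python
from typing import List, Sequence


def processing_element(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Model one processing element: M multiplies, then a sum of the M products."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # M multipliers: one weighted input per (input, weight) pair.
    weighted_inputs = [x * w for x, w in zip(inputs, weights)]
    # Adders: combine the M weighted inputs into one target output.
    return sum(weighted_inputs)


def weighted_sums(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]]) -> List[float]:
    """Model N processing elements, one per target output (one weight row each)."""
    return [processing_element(inputs, row) for row in weight_rows]


# Hypothetical example with M = 4 inputs and N = 3 target outputs.
outputs = weighted_sums(
    [1.0, 2.0, 3.0, 4.0],
    [[1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5], [1, -1, 1, -1]],
)
```

In hardware the N elements would operate in parallel; the list comprehension here only stands in for that parallelism.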
Wang et al. and Bittner et al. are analogous art to the claimed invention because they are directed to implementation of neural network calculations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate N processing elements with each processing element designated for calculating a weighted sum for one target output, wherein the N processing elements generate all N weighted sums in one multiplication clock cycle plus a plurality of addition clock cycles and as taught by Bittner et al. into the disclosed invention of Wang et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[match] the native external DDR memory bandwidth of the target FPGA” by performing the following: “[e]ach DSP unit multiplies two pairs of 16-bit mantissas on each clock cycle and simultaneously adds 
Regarding Claim 2,
Wang et al. in view of Bittner et al. teaches the computing device of Claim 1.
Wang et al. further teaches wherein M corresponds to a power-of-2 integer and the plurality of adders corresponds to (M-1) adders arranged in a binary-tree fashion to add the M weighted inputs to generate said one target output (Fig. 2 and pg. 515 fifth full paragraph: “we use pipelined binary adder tree structure to optimize the performance” teach M = 32, which is a power-of-2 integer (because log2(32)=5), and the adders are arranged in a binary-tree fashion to add the M weighted inputs; arranging the adders in a binary-tree fashion means that there are M-1 adders because the results of each level of adders are then added again: for example, M=32 inputs mean that there are 31 adders and M=8 inputs mean that there are 7 adders).
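For illustration only, the adder count relied upon above can be checked with a short software model of a binary adder tree. Each level halves the number of values, so a tree over M values uses M/2 + M/4 + ... + 1 = M-1 adders across log2(M) levels. The function name `adder_tree` is hypothetical, and the sketch does not model pipelining.

```python
from typing import List, Sequence, Tuple


def adder_tree(values: Sequence[float]) -> Tuple[float, int, int]:
    """Sum values pairwise, level by level; return (sum, adders used, levels)."""
    m = len(values)
    # m & (m - 1) is zero only when m is a power of 2.
    if m < 2 or m & (m - 1):
        raise ValueError("M must be a power of 2 greater than 1")
    level: List[float] = list(values)
    adders = 0
    levels = 0
    while len(level) > 1:
        # One adder per adjacent pair at this level of the tree.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        adders += len(level)
        levels += 1
    return level[0], adders, levels
```

For M=32 the model reports 31 adders over 5 levels, and for M=8 it reports 7 adders over 3 levels, matching the examples given above.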
Regarding Claim 4,
Wang et al. in view of Bittner et al. teaches the computing device of Claim 1.
Wang et al. further teaches wherein each processing element further comprises a buffer to store the M weights (pg. 515 fourth full paragraph: “TMMU employs an input FIFO buffer which receives the data transferred from DMA and an output FIFO buffer to send part sums to PSAU. Fig. 2 illustrates the TMMU schematic diagram, in which we set tile size = 32 as an example. TMMU first reads the weight matrix data from input buffer into different BRAMs in 32 by the row number of the weight matrix” teaches that the tiled matrix multiplication unit (TMMU) comprises an input buffer that stores the M weights and places the weights into BRAMs).
Regarding Claim 5,
Wang et al. in view of Bittner et al. teaches the computing device of Claim 1.
Wang et al. further teaches wherein the M weights are provided to each processing element externally (Fig. 1 and pg. 515 third full paragraph: “TMMU is the primary computational unit, which reads the total weights and tiled nodes data through DMA, performs the calculations, and then transfers the intermediate part sum results to PSAU” teach the weights are provided to the TMMU (processing element) by the DMA (which is external to the TMMU), thus the weight data are provided to the TMMU externally).
Regarding Claim 6,
Wang et al. teaches A method for fast weighted sum calculation in neural networks having M inputs and N output, wherein M and N are integer greater than 1, the method comprising (pg. 513 third full paragraph: “we present a scalable deep learning accelerator unit named DLAU to speed up the kernel computational parts of deep learning algorithms” teaches a deep learning accelerator unit on a FPGA to speed up neural network computations, which Fig. 2 teaches includes weighted sum calculation method in which the network has M inputs (such as 32 in Fig. 2) and N outputs (such as 32 outputs in Fig. 2 after the multiplication operations)):
...utilizing one processing element designated for calculating a weighted sum for one target output, wherein said utilizing said one processing element designated for calculating a weighted sum for one target output comprises (Fig. 2 teaches a TMMU (tiled matrix multiplication unit, corresponds to one processing element), which calculates a weighted sum for one target output):
multiplying M inputs and M weights respectively using M multipliers in said one processing element to generate M weighted inputs for said one target output, wherein the M weights are associated with the M inputs and said one target output (Fig. 2 and pg. 515 fourth full paragraph: “TMMU is in charge of multiplication and accumulation operations. TMMU is specially designed to exploit the data locality of the weights and is responsible for calculating the part sums... Fig. 2 illustrates the TMMU schematic diagram, in which we set tile size = 32 as an example” teach that each of the M multipliers performs multiplication of an input (for example, Ni1) and weight (for example, W1j) to generate one weighted input in which M multipliers generate M weighted inputs); 
adding the M weighted inputs to generate said one target output using a plurality of adders in said one processing element; and providing said one target output (Fig. 2 and pg. 515 fifth full paragraph: “we use pipelined binary adder tree structure to optimize the performance” teach a plurality of adders, as represented by “+”, arranged to add the M weighted inputs to generate the output being placed into (provided to) the buffer on the right).
Wang et al. does not appear to explicitly teach utilizing N processing elements to calculate weighted sums for the all N outputs in one multiplication clock cycle plus a plurality of addition clock cycles and, wherein said utilizing the N processing elements comprises.
However, Bittner et al. teaches utilizing N processing elements to calculate weighted sums for the all N outputs in one multiplication clock cycle plus a plurality of addition clock cycles and, wherein said utilizing the N processing elements comprises (pg. 9 [0077]: “a column of 16 DSP units (MxV) which form the systolic array column” and pg. 9 [0078]: “In some examples of the disclosed technology, an MxV systolic array multiplier column 432 is implemented by configuring an Altera FPGA DSP primitive shown in FIG. 5. DSP units are logically arranged as a vertical column so that the weight data W can pass horizontally across the column as the input data is passed vertically down the column. Each DSP unit multiplies two pairs of 16-bit mantissas on each clock cycle and simultaneously adds the results of the previous clock cycle's pair of multiplies to a 64-bit accumulator 530” teach there are N=16 DSP units (Digital Signal Processors; corresponding to processing elements) with each DSP unit designated to calculate a weighted sum (see Fig. 4 and Fig. 5) wherein the 16 DSP units perform the multiplication to 
Wang et al. and Bittner et al. are analogous art to the claimed invention because they are directed to implementation of neural network calculations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate utilizing N processing elements to calculate weighted sums for the all N outputs in one multiplication clock cycle plus a plurality of addition clock cycles and, wherein said utilizing the N processing elements comprises as taught by Bittner et al. into the disclosed invention of Wang et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[match] the native external DDR memory bandwidth of the target FPGA” by performing the following: “[e]ach DSP unit multiplies two pairs of 16-bit mantissas on each clock cycle and simultaneously adds the results of the previous clock cycle's pair of multiplies to a 64-bit accumulator” (Bittner et al. pg. 9 [0078]).
Regarding Claim 7,
Wang et al. in view of Bittner et al. teaches the method of Claim 6.
Wang et al. further teaches wherein M corresponds to a power-of-2 integer and the plurality of adders corresponds to (M-1) adders arranged in a binary-tree fashion to add the M weighted inputs to generate said one target output (Fig. 2 and pg. 515 fifth full paragraph: “we use pipelined binary adder tree structure to optimize the performance” teach M = 32, which is a power-of-2 integer (because log2(32)=5), and the adders are arranged in a binary-tree fashion to add the M weighted inputs; arranging the adders in a binary-tree fashion means that there are M-1 adders because the results of each level of adders are then added again: for example, M=32 inputs mean that there are 31 adders and M=8 inputs mean that there are 7 adders).
Regarding Claim 9,
Wang et al. in view of Bittner et al. teaches the method of Claim 6.
Wang et al. further teaches wherein each processing element further comprises a buffer to store the M weights (pg. 515 fourth full paragraph: “TMMU employs an input FIFO buffer which receives the data transferred from DMA and an output FIFO buffer to send part sums to PSAU. Fig. 2 illustrates the TMMU schematic diagram, in which we set tile size = 32 as an example. TMMU first reads the weight matrix data from input buffer into different BRAMs in 32 by the row number of the weight matrix” teaches that the tiled matrix multiplication unit (TMMU) comprises an input buffer that stores the M weights and places the weights into BRAMs).
Regarding Claim 10,
Wang et al. in view of Bittner et al. teaches the method of Claim 6.
Wang et al. further teaches wherein the M weights are provided to each processing element externally (Fig. 1 and pg. 515 third full paragraph: “TMMU is the primary computational unit, which reads the total weights and tiled nodes data through DMA, performs the calculations, and then transfers the intermediate part sum results to PSAU” teach the weights are provided to the TMMU (processing element) by the DMA (which is external to the TMMU), thus the weight data are provided to the TMMU externally).

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“DLAU: A Scalable Deep Learning Accelerator Unit on FPGA”) in view of Bittner et al. (US 2018/0157465 A1) and further in view of Fraser et al. (US 10,839,286 B2).
Regarding Claim 3,
Wang et al. in view of Bittner et al. teaches the computing device of Claim 1.
Wang et al. in view of Bittner et al. does not appear to explicitly teach wherein each processing element further comprises timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders.
However, Fraser et al. teaches wherein each processing element further comprises timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders (Col. 9 lines 52-64:“Referring to FIG. 2, illustrated is an exemplary neural network system 200. The neural network system 200 includes a preprocessing unit 202, N layers including layers 204-0 through 204-(N−1), and a loss computation unit 210. The layers may be implemented using programmable logic devices, DSP blocks, etc. In an example, the neural network system 200 including the N layers may be implemented in a single IC. In another example, multiple ICs may be used to implement the neural network system 200, where each IC may implement one or more layers of the neural network system 200. Each layer may include one or more neurons, each neuron having its corresponding forward path processing element (PE) and backward path PE” teaches that each layer has multiple neurons and each neuron has its corresponding processing element with its functions implemented by integrated circuits (ICs); Col. 14 lines 56-65: “Referring to FIG. 7, illustrated is a portion of a layer 204-i implementing delayed model adaptation. In the example of FIG. 7, at a particular time to, a weight gradient function unit 702 receives gradients 208-(i+1) Gradi+1(b−d) for the (b−d)th batch from the next layer 204(i+1). The weight gradient function unit 702 also receives delayed activation 704 from a delay unit 706. The delay unit 704 may generate the delayed activation 704 by applying d periods of delay (e.g., associated with d batches) to activations 206-(i−1) Acti−1(b) for the bth batch” teaches that the processing element includes a delay unit which controls timing of activations, thus rendering the delay unit to correspond to a timing and control circuitry that coordinates systolic operations (see Col. 14 lines 1-7); Col. 
15 lines 22-24: “The gradients 722 and weights 714 are sent to a neuron gradient function unit 724 (e.g., a multiply and accumulate unit 506 of FIG. 5)” teaches that the multiply and accumulate unit implements neuron gradient function; Fig. 6 teaches there are multiple gradient calculations corresponding to multiple layers, thus rendering there are multiple multiply and accumulate operations).
Wang et al., Bittner et al., and Fraser et al. are analogous art to the claimed invention because they are directed to implementation of neural network calculations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each processing element further comprises timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders as taught by Fraser et al. into the disclosed invention of Wang et al. in view of Bittner et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “a delayed model adaptation scheme to remove the dependencies between the layers, thereby enabling the efficient usage of multiple accelerators (e.g., multiple GPUs, multiple FPGAs, a single FPGA including multiple systolic arrays) for implementing the neural network system” (Fraser et al. Col. 14 lines 1-7).
Regarding Claim 8,
Wang et al. in view of Bittner et al. teaches the method of Claim 6.
Wang et al. in view of Bittner et al. does not appear to explicitly teach wherein each processing element further comprises timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders.
However, Fraser et al. teaches wherein each processing element further comprises timing and control circuitry to coordinate systolic operations for the M multipliers and the plurality of adders (Col. 9 lines 52-64:“Referring to FIG. 2, illustrated is an exemplary neural network system 200. The neural network system 200 includes a preprocessing unit 202, N layers including layers 204-0 through 204-(N−1), and a loss computation unit 210. The layers may be implemented using programmable logic devices, DSP blocks, etc. In an example, the neural network system 200 including the N layers may be implemented in a single IC. In another example, multiple ICs may be used to implement the neural network system 200, where each IC may implement one or more layers of the neural network system 200. Each layer may include one or more neurons, each neuron having its corresponding forward path processing element (PE) and backward path PE” teaches that each layer has multiple neurons and each neuron has its corresponding processing element with its functions implemented by integrated circuits (ICs); Col. 14 lines 56-65: “Referring to FIG. 7, illustrated is a portion of a layer 204-i implementing delayed model adaptation. In the example of FIG. 7, at a particular time to, a weight gradient function unit 702 receives gradients 208-(i+1) Gradi+1(b−d) for the (b−d)th batch from the next layer 204(i+1). The weight gradient function unit 702 also receives delayed activation 704 from a delay unit 706. The delay unit 704 may generate the delayed activation 704 by applying d periods of delay (e.g., associated with d batches) to activations 206-(i−1) Acti−1(b) for the bth batch” teaches that the processing element includes a delay unit which controls timing of activations, thus rendering the delay unit to correspond to a timing and control circuitry that coordinates systolic operations (see Col. 14 lines 1-7); Col. 
15 lines 22-24: “The gradients 722 and weights 714 are sent to a neuron gradient function unit 724 (e.g., a multiply and accumulate unit 506 of FIG. 5)” teaches that the multiply and accumulate unit implements neuron gradient function; Fig. 6 teaches there are multiple gradient calculations corresponding to multiple layers, thus rendering there are multiple multiply and accumulate operations).
Wang et al., Bittner et al., and Fraser et al. are analogous art to the claimed invention because they are directed to implementation of neural network calculations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each processing element further comprises timing and 
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “a delayed model adaptation scheme to remove the dependencies between the layers, thereby enabling the efficient usage of multiple accelerators (e.g., multiple GPUs, multiple FPGAs, a single FPGA including multiple systolic arrays) for implementing the neural network system” (Fraser et al. Col. 14 lines 1-7).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Huang et al. (US 10,585,621 B2) teaches a systolic array implemented in circuitry of an integrated circuit with statically-schedulable feed and drain structure.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/YING YU CHEN/               Examiner, Art Unit 2125