DETAILED ACTION
This Final Office Action is responsive to Applicant’s Amendment filed on 01/27/2022 in which claims 1-7 and 9-23 were amended.
Claims 1-23 are currently pending and under examination, of which claims 1 and 13 are independent claims. No claims are currently in condition for allowance.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant’s submission of the Information Disclosure Statement dated 01/27/2022 is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant Office action.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 04/15/2016. It is noted, however, that applicant has not filed a certified copy of the foreign application as required by 37 CFR 1.55.
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action.  37 CFR 41.154(b) and 41.202(e). Failure to provide a certified translation may result in no benefit being accorded for the non-English application.

Claim Objections
Claims 1 and 13 are objected to because of the following informalities: the amended independent claims recite “wherein the mast computation circuit”, which should read “wherein the master computation circuit”. Appropriate correction is required.

Response to Arguments
The examiner initially notes that the supplemental response appears to include a redundant claim set. The claim sets of 01/27/2022 and 01/13/2022, both filed after the non-final rejection, recite identical subject matter.
The statutory double patenting rejection (identical claim sets) over parent application 16/093,958 is withdrawn in view of the abandonment of application 16/093,958.
Objections to the specification are withdrawn as necessitated by the amendments.
Claim interpretation under 35 U.S.C. 112(f) is withdrawn in view of applicant’s amendment, which recites the modules as circuits. As applicant notes, the courts have indicated that “circuit” is a structural term that has been found not to invoke 35 U.S.C. 112(f). See Mass. Inst. of Tech., 462 F.3d at 1355-56, 80 USPQ2d at 1332.
Antecedent basis rejection under 35 U.S.C. 112(b) for claims 2 and 5-6 is withdrawn as necessitated by amendment.
Improper dependence rejection under 35 U.S.C. 112(d) for claim 21 is withdrawn as necessitated by amendment.
The rejection of claims 1-12 under 35 U.S.C. 101 as being directed to software per se is withdrawn as necessitated by amendment.
The rejection of claims 1-23 under 35 U.S.C. 101 as being directed to an abstract idea without significantly more is herein maintained. Applicant’s remarks 01/27/2022 have been fully considered and are not persuasive.
Pitney Bowes, Inc. v. Hewlett-Packard Co., 182 F.3d 1298, 1305, 51 USPQ2d 1161, 1165 (Fed. Cir. 1999). See MPEP § 2111.02.
Applicant further argues under prong two that the claimed invention improves the functioning of neural computers through efficiency so as to amount to a practical application. Examiner respectfully disagrees. The efficiency (improvement) noted by applicant is that bitwise operations are more efficient than multiplication or dot-product calculations. However, bitwise operations are not recited anywhere in the claims, and an asserted improvement must have a direct nexus with the claim language. Efficiency is not found to be mentioned in the specification, and no benchmarking results are identified as unexpected results. In view of the foregoing, the arguments regarding subject matter eligibility are not persuasive, and the rejection is maintained.
Applicant’s amendments to independent claims 1 and 13 with remarks dated 01/27/2022 regarding the prior art rejections under 35 U.S.C. 102 and 35 U.S.C. 103 have been considered. An updated search has been conducted in light of the present claim status, and additional art has been identified.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the examiner applies the guidance set forth under MPEP 2106. The response to remarks above is incorporated herein.
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes – all claims fall within one of the four statutory categories: claims 1-12 recite an apparatus/article of manufacture, and claims 13-23 recite a method/process.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes - the claims recite an abstract idea, namely mathematical calculations. In particular, the limitations recite: 
Claim 1) 
“receive one or more groups of MNN data, wherein the one or more groups of MNN data include input data… presented as discrete values” (observation)
“calculate an input gradient vector based on a first output gradient vector… to calculate the input gradient vector” (mathematical calculation)
“parallelly calculate portions of a second output vector based on the input gradient vector calculated…” (mathematical calculation)
“decode an instruction that initiates backpropagation” (mathematical calculation)
Claim 13) 
“receiving… one or more groups of MNN data, wherein the one or more groups of MNN data include input data and one or more weight values, and wherein at least a portion of the input data and the weight values are presented as discrete values” (observation)
“calculating… an input gradient vector based on a first output gradient vector… calculate the input gradient vector” (mathematical calculation)
“parallelly calculating… portions of a second output vector based on the input gradient vector calculated…” (mathematical calculation)
The steps of “calculating” and “parallelly calculating” are the mathematical calculations based on gradient vector analysis of received data comprising discrete values. For example, see Equations of specification [0048], [0074], [0078], [0082]. Mathematical calculations are specifically enumerated as one of the groupings of abstract ideas.
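For illustration only, the calculations identified above can be written in a generic textbook form. The symbols are assumptions introduced for this sketch (f is the activation function, z the pre-activation, W the layer weight matrix, x the layer input, η a learning rate) and are not reproduced from the specification's equations; loosely, δ^out corresponds to the first output gradient vector, δ^in to the input gradient vector, and Wᵀδ^in to the second output gradient vector.

```latex
\begin{aligned}
\delta^{\mathrm{in}} &= \delta^{\mathrm{out}} \odot f'(z) \\
\delta^{\mathrm{prev}} &= W^{\mathsf{T}} \, \delta^{\mathrm{in}} \\
\nabla_{W} &= \delta^{\mathrm{in}} \, x^{\mathsf{T}} \\
W &\leftarrow W - \eta \, \nabla_{W}
\end{aligned}
```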
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No - the judicial exception is not integrated into a practical application. Although the claim recites that the functionality is performed by circuits, these elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception, see MPEP 2106.05(f). The specification merely notes per [0033] “Any of the above-mentioned components or devices may be implemented by a hardware circuit”, and the limitations do not positively recite a neural accelerator that integrates the calculations into a framework beyond the performance of calculations among a plurality of disparate circuits. Accordingly, the hardware is not considered sufficient to be special purpose. Further, the step of receiving data (discrete or otherwise) 
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No - as noted above, the only limitation on the performance of the described method is that it must be performed by circuits. The claim thus recites computing components only at a high level of generality such that they amount to no more than mere instructions to apply the exception. The calculation of gradients/vectors based on received data may be implemented conventionally, such as by software or by common graphing calculators of the 1990s, e.g., the TI-85 or TI-89. Additionally, the recitation of master/slave circuits for parallel calculation does not provide a meaningful limitation, as such processing has long been known, see for example Aliaga et al., “SoC-Based Implementation of the Backpropagation Algorithm for MLP” [P.746] Fig 1, circa 2008. Taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. See FlexSim Software Products, Inc. v. Simio, LLC (Fed. Cir. 2020).
For the reasons above, claim 13 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to dependent claims 14-23. 
Dependent claims 14-16, 19, and 23 recite additional calculation steps, which are considered part of the abstract idea. For example, the combining of claim 14 is summing (sigma, Σ), and the multiplying of claim 16 is a product (commonly denoted pi, Π, as used in a product of sums). The derivative of an activation function is likewise mathematical, as are the arithmetic logical operations of claim 19 and the distance calculation, comparison (operands < or >), and clipping (truncation by magnitude) of claim 23. Claim 17 recites a nodal data structure, which is considered a field of use per MPEP 2106.05(h); examples of such a field of use include decision trees such as gradient-boosted decision trees (GBDT) or random forests. Claim 18 recites caching and execution control upon an instruction conflict, which is considered insignificant extra-solution (post-solution) activity per MPEP 2106.05(g). There is nothing to indicate that “conflict exists between the instruction and the other instruction” is anything more than an operating system crash or stack overflow. This is substantially repeated in claim 19 with respect to operation conflicts of instructions. Claims 20-22 recite hybrid, discrete, or continuous processors for operating on data types. Designating a processor as hybrid, discrete, or continuous is merely descriptive: the processors could equally be named Alice, Bob, or Charlie, as determined by some module, without resolving whether the computation involves floating-point or fixed-point values. 
Taken alone, their additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over: 
Soudry et al., “Memristor-Based Multilayer Neural Networks with Online Gradient Descent Training”, hereinafter Soudry, in view of 
Tanomoto et al., “A CGRA-based Approach for Accelerating Convolutional Neural Networks”, hereinafter Tanomoto, in view of 
El-Yaniv et al., US Patent 10,831,444B2 (co-invented by Soudry) as evidenced by US Provisional 62/317,665 which includes arXiv:1602.02505v2 (see PTO-892, attached as NPL).
With respect to claim 1, Soudry teaches: 
An apparatus for backpropagation of a multilayer neural network (MNN) {Soudry [P.2412] Fig 4, [P.2414 ¶1] “the proposed circuit can be used to implement backpropagation on a general MNN” per Title, multilayer neural networks}, comprising: 
a computation circuit configured to receive one or more groups of MNN data {Soudry [P.2409 ¶6] “hardware MNN circuit” illustrated Figs 4, 1-2 which [P.2416 ¶2] “receives at each trial two vector inputs x and y” and/or [P.2410 ¶4] “training set… test set”}, 
wherein the one or more groups of MNN data include input data and one or more weight values {Soudry [P.2413 Sect.C] “weights are incremented according to the update rule” per Eq.28 and weights are based on inputs x and y per [P.2410 RtCol] Wnm(k)}, 
wherein at least a portion of the input data and the weight values are presented as discrete values, and {Soudry [P.2410 Sect.B ¶1] “system operates on K discrete presentations of inputs (trials), indexed by k = 1, 2, ..., K”, again at [P.2411 Sect.A ¶2], [P.2416 Sect.A ¶1]}
wherein the computation circuit includes: 
a master computation circuit configured to calculate an input gradient vector based on a first output gradient vector from an adjacent layer and based on a data type of each of the one or more groups of MNN data, and {Soudry [P.2410 RtCol] Eq.8 SGD is a gradient (∇) calculation for the circuit of Fig 4; hence [P.2408 Sect.1 ¶2] “backpropagation stems from the chain rule used to calculate the gradients”. As the reverse pass moves back through adjacent layers (Fig 4), the data types are subject to activations, e.g., sigmoid σ or its derivative σ’}
one or more slave computation circuits configured to parallelly calculate portions of a second output vector based on the input gradient vector calculated by the master computation circuit and based on the data type of each of the one or more groups of MNN data; and {Soudry [P.2410 RtCol] “obtain the outer product” per Eqs. 9/10 is a parallel calculation, which generalizes per the conclusion. Eq.10 is further detailed as a local operation, which indicates a slave process. See also [Abstract] “massive parallelism… performing these update operations simultaneously (incremental outer products)”} 
a controller circuit configured to decode an instruction that initiates a backpropagation process and transmit the decoded instruction to the computation circuit. {Tanomoto [P.77-78] Figs 10/11 “pseudocode of backward propagation” illustrates a loop instruction that initiates backpropagation, with the error as decoding. Further, [P.73 Sect.1 ¶4] “CGRA is based on the dataflow computing paradigm, so that it has a reconfigurable pipeline of massive processing elements (PEs)”, illustrated in Figs 6-7 with a distributed/parallel memory communication bus for transmitting}
Tanomoto is directed to neural network backpropagation, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to utilize the backpropagation instruction of Tanomoto in combination with the gated architecture for neural network training disclosed by Soudry “in order to obtain higher accuracy by using the previous recognition result” (Tanomoto [P.75 ¶1]) and/or for the stated “advantage of our system is the higher memory bandwidth utilization. Reuse of data is realized by utilizing on-chip scratchpad memory blocks on the CGRA, so that memory bandwidth pressure to the external memory is significantly reduced” (Tanomoto [P.73 Last¶], [P.78 ¶2]). Further, a CGRA offers a benefit over GPU, FPGA, and ASIC platforms because it is more customizable: “state-of-the-art deep learning algorithms are rapidly evolving. Therefore the programmability of the computing platform is desirable and important” (Tanomoto [P.73 Sect.1 ¶3], [P.76 Sect.C ¶1]).
	However, the combination of Soudry and Tanomoto does not disclose the following limitation added by amendment.
	El-Yaniv teaches: 
wherein the master computation circuit is further configured to select one or more operations corresponding to each of the discrete values and perform the one or more selected operations to calculate the input gradient vector {El-Yaniv [Col3 Lines5-18] “bitwise operations are XNOR” operation is bitwise selection via XNOR. Discrete values are quantized values. The gradient partial derivative calculation which is backpropagated through layers is per Equation at [Col12 Line25]. See also xnor/xnot [Col13 Lines8-15] as well as left and right bit-shift with gradient clipping [Col11]}.
	El-Yaniv is directed to multi-layer neural network backpropagation with gradient calculation, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to further detail Soudry according to El-Yaniv (co-authored by the same Soudry), being an extension of the original work, for the motivation that “The power efficiency is improved by more than one order of magnitude” (El-Yaniv [Col14 Line12]). See Fig 4, where efficiency is specifically resolved in terms of picojoules, and which saves MAC operations (El-Yaniv [Col6 Lines1-2]).
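To illustrate the bitwise-selection point attributed to El-Yaniv, a minimal Python sketch follows. It assumes values binarized to +1/-1 and encoded as bits (+1 as 1, -1 as 0); the encoding and function names are hypothetical and are not taken from El-Yaniv. Under that encoding, a multiply-accumulate over two vectors reduces to an XNOR followed by a bit count.

```python
def to_bits(values):
    """Encode a list of +1/-1 values as an integer bit mask (+1 -> 1, -1 -> 0)."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("only +1/-1 values are supported")
        if v == 1:
            mask |= 1 << i
    return mask


def xnor_dot(a, b):
    """Dot product of two equal-length +1/-1 vectors using XNOR and popcount."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    n = len(a)
    all_ones = (1 << n) - 1
    # XNOR sets a bit wherever the two signs agree (element product is +1).
    agree = ~(to_bits(a) ^ to_bits(b)) & all_ones
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


print(xnor_dot([1, -1, 1, 1], [1, 1, -1, 1]))  # same as sum of products: 0
```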

With respect to claim 2, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1, further comprising 
	an interconnection circuit configured to combine the portions of the second output gradient vector to generate the second output gradient vector. {Soudry [P.2414] Eqs. 33-35 Sigma ∑ is combining with variable y which computes gradient ∇ per Eq. 36. Further, [P.2410] Eq.8 details SGD as being iterative, i.e., second, third, nth and so on}
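As a minimal sketch of the reading that slave circuits produce partial results which an interconnection then combines by summation (Σ), the following Python assumes the layer's weight rows are split across slaves and that each slave returns a full-length partial vector Wᵢᵀδᵢ. The function names are hypothetical, and a sequential loop stands in for parallel hardware.

```python
def slave_partial(weight_rows, grad_slice):
    """One slave: compute W_i^T @ delta_i for its block of weight rows."""
    n_in = len(weight_rows[0])
    partial = [0.0] * n_in
    for row, g in zip(weight_rows, grad_slice):
        for j, w in enumerate(row):
            partial[j] += w * g
    return partial


def combine(partials):
    """Interconnection: element-wise sum of the slaves' partial vectors."""
    return [sum(col) for col in zip(*partials)]


def backprop_output_gradient(weights, input_grad, n_slaves):
    """Split rows of W (one row per output neuron) across slaves, then combine."""
    if n_slaves < 1:
        raise ValueError("need at least one slave")
    n_out = len(weights)
    bounds = [n_out * k // n_slaves for k in range(n_slaves + 1)]
    partials = [
        slave_partial(weights[lo:hi], input_grad[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
        if hi > lo  # a slave with no rows contributes nothing
    ]
    return combine(partials)
```

Whatever the number of slaves, the combined result equals the full product Wᵀδ, which is why the partition can be chosen freely.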

With respect to claim 3, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1, wherein the slave computation circuits are further configured to: 
parallelly calculate gradients of weight values based on the input gradient vector; and update the weight values based on the respectively calculated gradients. {Soudry [P.2414-15 PgBrk] Eq.28}
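A minimal sketch of the outer-product weight update relied upon for claim 3, assuming the common convention W ← W − η δ xᵀ. The sign convention and the function names are assumptions, not a reproduction of Soudry's Eq.28 or its memristor circuit.

```python
def outer(delta, x):
    """Outer product delta x^T: entry (i, j) is delta[i] * x[j]."""
    return [[d * xj for xj in x] for d in delta]


def update_weights(weights, delta, x, lr):
    """SGD-style update W <- W - lr * delta x^T, one row per output neuron.

    Each row depends only on its own delta[i], so rows could be split across
    slave units and updated independently (this loop runs sequentially).
    """
    grad = outer(delta, x)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grad)
    ]
```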

Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto and El-Yaniv in view of
Lin et al., “Neural Networks with Few Multiplications”, hereinafter Lin.
With respect to claim 4, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1. Lin teaches wherein 
	the master computation circuit is further configured to respectively multiply each element of the input gradient vector with a derivative of an activation function of the adjacent layer. {Lin [P.3] Eqs. 4-6 where “The operator ʘ stands for element-wise multiply” whereby [P.2 Sect.4 ¶3] “h’(Wx+b) contains down-flowing gradients” and [P.2 Sect3.1 ¶2] “h is the activation function” for which prime is derivative}
	Lin is directed to neural network backpropagation, thus being analogous. A person having ordinary skill in the art would have considered it obvious to implement Lin’s element-wise multiplication as a substituted variant of Soudry’s component-wise multiplication, as applying known techniques to a known method to yield predictable results, and/or because “multiplications left is negligible in quantized back propagation… dramatically decreasing multiplications does not necessarily entail a loss in performance” (Lin [P.4 ¶1]). 
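For illustration, the element-wise multiplication of an incoming gradient by the derivative of the activation function can be sketched as follows. The choice of a sigmoid activation, for which σ'(z) = σ(z)(1 − σ(z)), is an assumption; Lin's h may be any activation, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(output_grad, pre_activations):
    """Element-wise product of the incoming gradient with f'(z), f = sigmoid."""
    if len(output_grad) != len(pre_activations):
        raise ValueError("length mismatch")
    result = []
    for g, z in zip(output_grad, pre_activations):
        s = sigmoid(z)
        result.append(g * s * (1.0 - s))  # sigmoid'(z) = s * (1 - s)
    return result


print(input_gradient([2.0, -4.0], [0.0, 0.0]))  # sigmoid'(0) = 0.25 -> [0.5, -1.0]
```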

With respect to claim 12, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1. Lin teaches further comprising
	a data converter configured to convert continuous data to discrete data {Lin [P.1 Sect.1 ¶2] “quantized back propagation that converts multiplications into bit-shifts” quantizing is discretization, [P.4 Alg.1]}, wherein the data converter includes: 
a preprocessing circuit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data; {Lin [P.4 Alg.1 Line10] “W <- clip (W - ΔW)”, [P.6 Sect5.3] details the effect of bit clipping as a range for weight update}
a distance calculator circuit configured to calculate multiple distance values between the preprocessed data and multiple discrete values; and a comparer circuit configured to compare the multiple distance values to output one or more of the multiple discrete values. {Lin [P.3 ¶1] “distance from w’ij, i.e., if w’ij > 0” calculated among sub-intervals [-1,0] and [0,1], Eqs. 2-3 with comparator operands of > or <=0}
	A person having ordinary skill in the art would have considered it obvious to implement Lin’s quantization, clipping, and distance calculations as applying known techniques to a known method to yield predictable results and/or for the motivation that “A more efficient way is to use different values for the maximum left shift and the maximum right shift” (Lin [P.6 ¶5]).
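A minimal sketch of the clip, distance, and compare pipeline recited in claim 12, assuming a deterministic nearest-level rule. Lin's Eqs. 2-3 instead sample stochastically, so this is a substituted simplification; the function name and the tie-breaking rule are hypothetical.

```python
def quantize(value, levels, lo, hi):
    """Map a continuous value to the nearest of a set of discrete levels.

    1. Preprocess: clip the value into [lo, hi].
    2. Distance: |clipped - level| for every candidate level.
    3. Compare: return the level with the smallest distance
       (ties go to the earlier entry in `levels`).
    """
    if not levels:
        raise ValueError("levels must be non-empty")
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    clipped = min(max(value, lo), hi)
    distances = [abs(clipped - level) for level in levels]
    best = min(range(len(levels)), key=distances.__getitem__)
    return levels[best]


print(quantize(0.3, [0.125, 0.25, 0.5, 1.0], 0.0, 1.0))  # 0.25
```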

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto and El-Yaniv in view of
Andrychowicz et Kurach, “Learning Efficient Algorithms with Hierarchical Attentive Memory”, hereinafter Andrychowicz.
With respect to claim 5, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 2. Andrychowicz teaches  
	wherein the interconnection circuit is structured as a binary tree including one or more levels, each of which includes one or more nodes, wherein each of the nodes at one level is connected to two nodes at a lower level, and wherein each of the nodes transmits same data to the two nodes at the lower level and combines data received from the two nodes at the lower level. {Andrychowicz discloses “binary tree” as hierarchical attentive memory with clearly illustrated nodal tree structure having join/combine operation per Figs 2-3 and 5-6, [P.2-4 Sect.3]}
	Andrychowicz is directed to neural network architectures thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement the binary tree structure of Andrychowicz in combination with Soudry for the advantage of efficient memory access both in terms of reduction in operations required and speed (Andrychowicz [P.8-9 PgBrk]).
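As a generic illustration of the claim 5 tree topology (not Andrychowicz's attentive memory), the sketch below broadcasts identical data from the root to the leaves and combines leaf results pairwise on the way back up. The names are hypothetical, and summation is assumed as the combine operation.

```python
def tree_broadcast(data, n_leaves):
    """Send identical data from the root down to n_leaves leaf nodes."""
    level = [list(data)]
    while len(level) < n_leaves:
        # every node forwards an identical copy to each of its two children
        level = [list(node) for node in level for _ in range(2)]
    return level[:n_leaves]


def tree_combine(leaf_vectors):
    """Combine leaf vectors pairwise, level by level, as in a binary tree.

    Each parent sums the vectors received from its two children; an odd
    node left over at a level is passed up unchanged.
    """
    if not leaf_vectors:
        raise ValueError("need at least one leaf")
    level = [list(v) for v in leaf_vectors]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            nxt.append([a + b for a, b in zip(left, right)])
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

With n leaves, the combine finishes in about log₂(n) levels, which is the usual motivation for a tree rather than a chain.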

Claims 6 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto and El-Yaniv in view of
Kim et al., “Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications”, hereinafter Kim. 
With respect to claim 6, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1. Kim teaches wherein the master computation module includes:  
a master neuron caching circuit configured to cache data; a master computation subcircuit configured to receive the first output gradient vector from the interconnection circuit; and a master data dependency relationship determination circuit configured to temporarily prevent the instruction from being executed based on a determination that a conflict exists between the instruction and other instructions. {Kim discloses “reducing cache conflict” in layerwise evaluation of compressed neural networks performing backpropagation (i.e., gradient vector) over parallel threads (i.e., instructions), see [P.7-8 PgBrk] Fig 2. The circuits are described with regard to a GPU-Titan processor and a mobile smartphone. See also the “Cache Coherent Interconnect” among distributed cores, memories, and slaves}
	Kim is directed to neural network devices and backpropagation, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to resolve cache conflicts as disclosed by Kim for the stated benefit that it “improves cache efficiency” (Kim [P.9 ¶1]).
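One common way a dependency determination of the kind recited in claims 6-7 can be realized is a read/write address-range hazard check. The following is a hypothetical sketch, not drawn from Kim or Lashgar: an instruction is held back if its address ranges overlap those of any still-pending instruction in a read-after-write, write-after-read, or write-after-write pattern. The `Instr` type and its half-open `(start, end)` ranges are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: tuple   # half-open (start, end) address range read (hypothetical)
    writes: tuple  # half-open (start, end) address range written (hypothetical)


def overlaps(a, b):
    """True if half-open ranges [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def has_conflict(new, pending):
    """True if `new` has a RAW, WAR or WAW hazard with any pending instruction."""
    for old in pending:
        if (overlaps(new.reads, old.writes)        # read-after-write
                or overlaps(new.writes, old.reads)   # write-after-read
                or overlaps(new.writes, old.writes)):  # write-after-write
            return True
    return False
```

A controller using such a check would stall the new instruction until the conflicting pending instructions retire.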

With respect to claim 9, the combination of Soudry, Tanomoto, El-Yaniv, and Kim teaches the apparatus of claim 6, wherein the master computation subcircuit includes: 
an operation determiner configured to determine an operation to be performed based on the data type of the input data; and a hybrid data processor configured to perform the determined operation {Kim discloses layer types (e.g., convolutional, fully connected) corresponding to operations performed based on data type. The hybrid processor reads on the system-on-chip (SoC) of Fig 7(b) or the hardware illustrated in Fig 6 for parallel operations}.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto and El-Yaniv in view of
Lashgar et Baniasadi, “Employing Software-Managed Caches in OpenACC: Opportunities and Benefits”, hereinafter Lashgar.
With respect to claim 7, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1. Lashgar teaches wherein each of the slave computation modules includes: 
a slave computation subcircuit configured to receive the one or more groups of micro-instructions and to perform arithmetic logical operations; and a slave data dependency relationship determination circuit configured to perform data access operations to a slave neuron caching circuit, a weight value caching circuit, and a weight gradient caching circuit based on a determination that no conflict exists between the data access operations. {Lashgar discloses OpenACC software-managed cache with compiler directives. “Off-load” from CPU to accelerator or GPU is master/slave relation [P.5 Sect3.2 ¶2], [P.3 Sect2.2]. Memory/data access is replete, describing fcw instructions (fetch, communication, writeback). Backpropagation is noted as application per [P.5 Sec3.2]. The effect is per “software-managed cache removes cache conflict” [P.7 ¶1], [P.27 ¶4]}
	Lashgar is directed to neural network devices with backpropagation thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement cache control disclosed by Lashgar in order to give the user ease of interface per [Abstract] “simplify accelerator programming” and/or to “mitigate redundant memory fetches” [P.9 ¶2].

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto and El-Yaniv in view of
Suda et al., “Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks”, hereinafter Suda.
With respect to claim 8, the combination of Soudry, Tanomoto and El-Yaniv teaches the apparatus of claim 1. Suda teaches wherein the instruction is selected from the group consisting of 
a CONFIG instruction for configuring constants required by computation of the current layer prior to starting computation of each layer, a COMPUTE instruction for completing arithmetical logic computation of the multilayer neural network of each layer, and an IO instruction for reading in the input data required by computation from an external address space and storing processed data back into the external space after completion of computation. {Suda discloses an OpenCL framework with HLS (High Level Synthesis) tools for compiling code onto reconfigurable hardware integrated with external memory, with particular application to CNN layers and MAC operations, see [P.19 Sect.4].}
Suda is directed to neural network implementations, thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to code the instructions using the OpenCL compiler instructions disclosed by Suda for implementing the technique of Soudry, as obvious to try among a finite number of open-source programming platforms with a reasonable expectation of success. This further offers the user flexibility to “find the optimal design variables that yield maximum acceleration of any CNN model implementation using limited FPGA resources” (Suda [P.25 Conc], [P.16-17 PgBrk]).
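To illustrate the three claimed instruction types, a toy controller loop is sketched below. The tuple encoding, the dictionary standing in for external address space, and the scaling used as a placeholder for COMPUTE are all hypothetical; this is neither the OpenCL API nor Suda's code.

```python
def run(program, external_memory):
    """Toy controller: decode each (opcode, args) tuple and dispatch it.

    `external_memory` is a dict standing in for external address space; it is
    updated in place by IO "store" instructions and also returned.
    """
    constants = {}
    local = {}
    for opcode, args in program:
        if opcode == "CONFIG":
            constants.update(args)  # layer constants set before computation
        elif opcode == "IO":
            direction, key = args
            if direction == "load":
                local[key] = external_memory[key]
            elif direction == "store":
                external_memory[key] = local[key]
            else:
                raise ValueError(f"unknown IO direction: {direction}")
        elif opcode == "COMPUTE":
            src, dst = args
            scale = constants.get("scale", 1.0)
            local[dst] = [scale * v for v in local[src]]  # placeholder arithmetic
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return external_memory


program = [
    ("CONFIG", {"scale": 2}),
    ("IO", ("load", "x")),
    ("COMPUTE", ("x", "y")),
    ("IO", ("store", "y")),
]
print(run(program, {"x": [1, 2]}))  # {'x': [1, 2], 'y': [2, 4]}
```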

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto, El-Yaniv, Kim, and Lashgar.
With respect to claim 10, the combination of Soudry, Tanomoto, El-Yaniv, Kim, and Lashgar teaches the apparatus of claim 7. Kim teaches wherein the slave computation subcircuit includes: 
an operation determiner circuit configured to determine an operation to be performed based on the data type of the input data; and a hybrid data processor configured to perform the determined operation {Claim 10 recites language identical to claim 9, with respect to the slave rather than the master. Kim's cache coherent interconnect is distributed/parallel, see [P.8 Last¶] thread-level parallelism between GPU and Titan X}.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, Tanomoto, El-Yaniv, and Kim in view of
Soudry et al., “Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights”, hereinafter Soudry14, in view of
Lee, HyoukJoong, “High-Level Language Compilers for Heterogeneous Accelerators”, hereinafter Lee.
With respect to claim 11, the combination of Soudry, Tanomoto, El-Yaniv and Kim teaches the apparatus of claim 9. Soudry14 teaches wherein the master computation unit further includes: 
a data type determiner circuit configured to determine the data type of the input data; and at least one of a discrete data processor or a continuous data processor, {Soudry14 discloses “Expectation BackPropagation… values can be either continuous (i.e., real numbers) or discrete (e.g., ±1 binary)”; [Sect.4] is determining continuous or discrete. Further, [P.7, Abstract] details hardware devices and a GitHub implementation, suggesting a processor}. Soudry14 is earlier work of the same author, Soudry.
However, Soudry does not prima facie implement heterogeneous processors. Lee teaches:
wherein the discrete data processor is configured to process the input data based on a determination that the input data is stored as discrete values, and wherein the continuous data processor is configured to process the input data based on a determination that the input data is stored as continuous values. {Lee [P.35] Fig 3.10 illustrates tree “Execute on CPU… Execute on GPU” as routing or scheduling for example, [P.44 Last¶] “Computation A is scheduled on CPU while computation B, C, and D are scheduled on GPU”}.
	Lee is directed to heterogeneous systems for neural acceleration thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement scheduling for heterogeneous processors as disclosed by Lee in combination with Soudry as allocating resources for optimization with heterogeneous computing elements which leads to speedup and efficiencies noted throughout Lee, e.g., benchmarking [P.28 Fig 3.5], [P.40 Fig3.14].
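A minimal sketch of the type-based routing addressed in claims 9-11, assuming each input carries an explicit data-type tag. The tag values, the +1/-1 discrete encoding, and the function names are hypothetical. Lee's scheduling assigns computations to CPU or GPU devices, which is a different mechanism; the sketch only shows the determine-then-dispatch pattern.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    dtype: str    # hypothetical tag: "discrete" (+1/-1 codes) or "continuous"
    values: list


def discrete_scale(values, weight):
    # With +1/-1 data, multiplying by `weight` reduces to selecting +weight or -weight.
    return [weight if v == 1 else -weight for v in values]


def continuous_scale(values, weight):
    # Ordinary arithmetic multiplication for real-valued data.
    return [v * weight for v in values]


PROCESSORS = {"discrete": discrete_scale, "continuous": continuous_scale}


def process(t, weight):
    """Data type determiner: route the data to the matching processor."""
    try:
        processor = PROCESSORS[t.dtype]
    except KeyError:
        raise ValueError(f"unsupported data type: {t.dtype}") from None
    return processor(t.values, weight)
```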

Claims 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Soudry in view of El-Yaniv.
With respect to claim 13, Soudry teaches: 
A method for backpropagation of a multilayer neural network (MNN) {Soudry [P.2412] Fig 4 details [P.2414 ¶1] “implement backpropagation on a general MNN” multilayer neural network per Title. [P.2419 Conc ¶1] “novel method”}, comprising: 
receiving, by a computation circuit, one or more groups of MNN data {Soudry [P.2409 ¶6] “hardware MNN circuit” illustrated Figs 4, 1-2 which [P.2416 ¶2] “receives at each trial two vector inputs x and y” and/or [P.2410 ¶4] “training set… test set”}, 
wherein the one or more groups of MNN data include input data and one or more weight values {Soudry [P.2413 Sect.C] “weights are incremented according to the update rule” per Eq.28 and weights are based on inputs x and y per [P.2410 RtCol] Wnm(k)}, and 
wherein at least a portion of the input data and the weight values are presented as discrete values {Soudry [P.2410 Sect.B ¶1] “system operates on K discrete presentations of inputs (trials), indexed by k = 1, 2, K” again at [P.2411 Sect.A ¶2], [P.2416 Sect.A ¶1]}; 
calculating, by a master computation circuit of the computation circuit, an input gradient vector based on a first output gradient vector from an adjacent layer and based on a data type of each of the one or more groups of MNN data {Soudry Fig 4, [P.2408 Sect.1 ¶3] “backpropagation stems from the chain rule used to calculate the gradients”; grad = ∇ of SGD Eq.8 per [P.2410 RtCol], further noting “(·)T to be the transpose”. As the reverse pass moves back through adjacent layers (Fig 4), the data types are subject to activations, e.g., the sigmoid σ or its derivative σ’}; and 
parallelly calculating, by one or more slave computation circuits connected to the master computation circuit via an interconnection unit, portions of a second output vector based on the input gradient vector calculated by the master computation circuit and based on the data type of each of the one or more groups of MNN data {Soudry [P.2410 RtCol] “obtain the outer product” per Eqs. 9/10 is a parallel calculation, which Soudry generalizes at the conclusion. Eq.10 is further detailed as a local operation}.
However, Soudry does not disclose the following limitation added by amendment.
	El-Yaniv teaches: 
	wherein the calculating includes selecting, by the master computation circuit, one or more operations corresponding to each of the discrete values and performing the one or more selected operations to calculate the input gradient vector {El-Yaniv [Col3 Lines5-18] “bitwise operations are XNOR”; the XNOR operation is a bitwise selection. The discrete values are quantized values. The gradient partial derivative calculation, which is backpropagated through the layers, is per the Equation at [Col12 Line25]. See also xnor/xnot [Col13 Lines8-15] as well as left and right bit-shift with gradient clipping [Col11]}.
	El-Yaniv is directed to multi-layer neural network backpropagation with gradient calculation and is thus analogous art. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to further detail Soudry according to El-Yaniv (co-authored by the same Soudry), El-Yaniv being an extension of the original work, for the motivation that “The power efficiency is improved by more than one order of magnitude” (El-Yaniv [Col14 Line12]). See Fig 4, where efficiency is specifically expressed in picojoules, and note that the approach saves MAC operations (El-Yaniv [Col6 Lines1-2]). 

Claims 14-15 are rejected for the same rationale as claims 2-3, respectively.

Claims 16 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Soudry and El-Yaniv in view of Lin.
Claim 16 is rejected for the same rationale as claim 4.
Claim 23 is rejected for the same rationale as claim 12.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry and El-Yaniv in view of Andrychowicz.
Claim 17 is rejected for the same rationale as claim 5.

Claims 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Soudry and El-Yaniv in view of Kim.
Claim 18 is rejected for the same rationale as claim 6.
Claim 20 is rejected for the same rationale as claim 9. 

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry and El-Yaniv in view of Lashgar.
Claim 19 is rejected for the same rationale as claim 7.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry and El-Yaniv in view of Kim and Lashgar.
Claim 21 is rejected for the same rationale as claim 10. 

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Soudry, El-Yaniv and Kim in view of Soudry14 and Lee.
Claim 22 is rejected for the same rationale as claim 11. 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935. The examiner can normally be reached M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHASE P. HINCKLEY/Examiner, Art Unit 2124                                                                                                                                                                                                        
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126