DETAILED ACTION
Status of Claims
This action is in response to the applicant’s amendment filed on 2/9/2022 for the application filed on 10/29/2018. Claims 1 – 18 are pending and have been examined.

The claim rejections under 35 U.S.C. 112(a), 112(b) and 112(f) set forth in the prior action have been withdrawn in light of the amendment and applicant’s remarks.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 4/29/2016. 

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/29/2018, 10/31/2019 and 4/15/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Applicant’s Argument
Applicant's arguments filed on 2/9/2022 have been fully considered but they are not persuasive.
Regarding the claim rejection under 35 U.S.C. 101, applicant states that Claim 1 recites operations relating to backpropagation in a fully connected layer, that multiplying is merely one of the operations, and that it does not direct the claim to an abstract idea. Examiner respectfully disagrees. The multiplication recited in the claim is a mathematical calculation step. Using that step in neural network backpropagation merely generally links the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)).
Applicant further states that Claim 1 provides an improvement to the functioning of a neural network and thus integrates the abstract idea into a practical application. Examiner respectfully disagrees. Multiplication already exists in a typical neural network. The IC chip, controller circuit, and master/slave computation circuits are recited at a high level of generality and amount to no more than a recitation of the words “apply it” (or an equivalent), or no more than mere instructions to implement an abstract idea or other exception on a computer (MPEP 2106.05(f)).
Regarding the claim rejection under 35 U.S.C. 103, applicant’s remarks with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. In response to applicant’s argument that Hamalainen does not disclose multiplication of the first data gradient and the input data, examiner respectfully disagrees. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. Hamalainen discloses an architecture to perform multiplication in a neural network environment, while Touretzky discloses that, during the operation of a neural network, the data gradient and the input data are multiplied. However, Hamalainen does not disclose the amended element of an IC chip; thus the newly cited reference is used and the action is final.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 9 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding Claims 1 – 9, Claims 1 – 4 and 7 recite “the master computation circuit”. There is insufficient antecedent basis for this limitation in the claims. Applicant expressed the intention to amend the claims to recite “master computation circuit” to avoid claim interpretation under 35 U.S.C. 112(f); however, the referenced term “a master computation module” was not updated. For examination purposes, examiner interprets the referenced term as “a master computation circuit”. Dependent Claims 2 – 9 are rejected for the same reason.

Regarding Claim 16, Claim 16 recites “the master computation module”. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, examiner interprets the referenced term as “the master computation circuit”.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1 – 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

As to Claim 1, in the Subject Matter Eligibility Test Step 1, the claimed apparatus in Claim 1 as a whole falls within one or more statutory categories.
In the Subject Matter Eligibility Test Step 2A Prong One, the claimed apparatus in Claim 1 recites an abstract idea in the following limitation:
multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector.
The step of multiplying recites a mathematical calculation and thus falls under the mathematical concepts grouping of abstract ideas. The mere nominal recitation of first data gradients, input data and a default weight gradient vector does not take the claim limitation out of the mathematical calculation grouping; thus the claim falls within the abstract idea judicial exception and requires further analysis under Step 2A Prong Two.
In the Subject Matter Eligibility Test Step 2A Prong Two, Claim 1 recites the following additional elements along with the abstract idea:
integrated circuit (IC) chip
a controller circuit configured to receive an instruction;
and one or more computation circuits that include a master computation circuit and one or more slave computation circuits, 
receive input data and one or more first data gradients in response to the instruction and transmit the input data and the one or more first data gradients to one or more slave computation circuits
The recited additional elements of an integrated circuit (IC) chip and a controller circuit configured to receive an instruction are highly generic and amount to no more than an idea of a solution and mere instructions to apply an exception. The element of computation circuits that include a master and one or more slaves refers to the master/slave architecture, is highly generic, and generally links the use of the judicial exception to a particular technological environment or field of use. Further, the steps of the master circuit receiving data and transmitting data to the slave circuits add insignificant extra-solution activity to the judicial exception. Thus, the additional elements in Claim 1 do not integrate the abstract idea into a practical application, and the claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional elements of an integrated circuit (IC) chip and a controller circuit configured to receive an instruction are highly generic and amount to no more than an idea of a solution and mere instructions to apply an exception. The element of computation circuits that include a master and one or more slaves refers to the master/slave architecture, is highly generic, and generally links the use of the judicial exception to a particular technological environment or field of use. The data transfer between the master and slave circuits does not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (see at least Hamalainen, TUTNC: a general purpose parallel computer for neural network computations, Microprocessors and Microsystems Vol 19, 1995, fig. 1, fig. 2, & page. 451, col 2, para. 3, where during the execution, the host [master computation circuit] serves data transfers). Thus, the additional elements in Claim 1 do not contribute an inventive concept and Claim 1 is not eligible subject matter under 35 U.S.C. 101.

As to Claim 2, depending on Claim 1: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 2 recites an additional abstract idea in the following limitation:
update one or more weight values based on the default weight gradient vector
The step of updating weight values based on a gradient recites either a mathematical relationship or a mental process.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as set forth for Claim 1. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.
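For illustration only, the weight update recited in Claim 2 can be sketched in Python as a single arithmetic step. The claim language quoted above does not state an update rule; plain gradient descent, the function name update_weights, and the learning_rate parameter are assumptions made for this sketch and are not taken from the claims or the cited references.

```python
def update_weights(weights, weight_gradient, learning_rate=0.5):
    """Return new weight values after one assumed gradient-descent step.

    Assumption: w_new = w - learning_rate * gradient. The claim only says
    the weights are updated "based on" the default weight gradient vector.
    """
    return [w - learning_rate * g for w, g in zip(weights, weight_gradient)]


# Hypothetical values chosen only to make the arithmetic easy to follow.
updated = update_weights([1.0, 2.0], [0.5, -1.0])
print(updated)
```

Under these assumptions the update reduces to one multiplication and one subtraction per weight value.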

As to Claim 3, depending on Claim 1: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 3 recites additional abstract ideas in the following limitations:
calculate a scaled weight gradient vector based on the default weight gradient vector and a predetermined threshold value;
update one or more weight values based on the scaled weight gradient vector.
The step of calculating recites a mathematical calculation, and the step of “update weight based on scaled gradient” recites either a mathematical relationship or a mental process.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 1. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.
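For illustration only, one way the scaling recited in Claim 3 could be computed is sketched below in Python. The quoted claim language does not specify how the threshold is used; rescaling the vector so that its L2 norm does not exceed the threshold is an assumption made for this sketch, and the function name scale_gradient is hypothetical.

```python
import math


def scale_gradient(gradient, threshold):
    """Rescale a gradient vector so its L2 norm does not exceed threshold.

    Assumption: norm-based scaling. The claim only recites a scaled vector
    "based on" the default weight gradient vector and a threshold value.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)  # already within the threshold, left unchanged
    factor = threshold / norm
    return [g * factor for g in gradient]


# Hypothetical gradient with norm 5.0, scaled down to norm 2.5.
scaled = scale_gradient([3.0, 4.0], 2.5)
print(scaled)
```

Under this assumption the calculation consists of a norm, a comparison, and an element-wise multiplication.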

As to Claim 4, depending on Claim 1: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 4 recites an additional abstract idea in the following limitation:
apply a derivative of an activation function to the one or more first data gradients to generate one or more input gradients
The step of “apply derivative of activation function to gradient to generate gradient” recites a mathematical calculation and a mathematical relationship.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 1. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.
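For illustration only, the operation recited in Claim 4 can be sketched in Python as an element-wise product. The claim does not name an activation function; the sigmoid, whose derivative can be written as output * (1 - output), is assumed here, and the function names and values are hypothetical.

```python
def sigmoid_derivative(output):
    """Derivative of the sigmoid expressed through its own output value."""
    return output * (1.0 - output)


def input_gradients(data_gradients, outputs):
    """Multiply each first data gradient by the activation derivative.

    Assumption: a sigmoid activation; outputs holds the activation outputs
    recorded during the forward pass.
    """
    return [g * sigmoid_derivative(o) for g, o in zip(data_gradients, outputs)]


# Hypothetical values: a sigmoid output of 0.5 has derivative 0.25.
grads = input_gradients([2.0, -4.0], [0.5, 0.5])
print(grads)
```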

As to Claim 5, depending on Claim 4: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 5 recites an additional abstract idea in the following limitation:
multiply one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results
The step of “multiply input with weight to generate result” recites a mathematical calculation.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 4. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 6, depending on Claim 5: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 6 recites an additional abstract idea in the following limitation:
combine the one or more multiplication results calculated respectively by the one or more slave computation circuits into an output gradient vector
The step of “combine result into a vector” recites either a mathematical calculation or a mental step. The mere nominal recitation of “calculated respectively by the one or more slave computation circuits” does not take the claim limitation out of the abstract idea groupings.
In the Subject Matter Eligibility Test Step 2A Prong Two and Step 2B, Claim 6 further recites the following additional element:
interconnection circuit
The additional element of an interconnection circuit does not add a meaningful limitation beyond generally linking the use of the judicial exception to a particular technological environment or field of use at a high level of generality, and thus neither integrates the abstract idea into a practical application under Step 2A Prong Two nor contributes an inventive concept under Step 2B. Claim 6 is rejected under the same rationale as Claim 5.
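For illustration only, the multiplication of Claim 5 and the combining of Claim 6 can be sketched together in Python. The assignment of one weight vector to each slave, the use of a dot product as the multiplication, and all names and values below are assumptions made for this sketch; the quoted claim language does not fix any of them.

```python
def slave_multiply(input_gradient, weight_vector):
    """One hypothetical slave: dot product of the input gradient vector
    with the weight vector held by that slave."""
    return sum(g * w for g, w in zip(input_gradient, weight_vector))


def combine(results):
    """Hypothetical interconnection step: collect the per-slave results,
    in slave order, into one output gradient vector."""
    return list(results)


input_gradient = [1.0, 2.0]
# Assumed layout: one weight vector per slave, three slaves in total.
weight_vectors = [[0.5, 0.25], [-1.0, 1.0], [2.0, 0.0]]

output_gradient = combine(
    slave_multiply(input_gradient, w) for w in weight_vectors
)
print(output_gradient)
```

In this sketch the slaves are modeled as sequential function calls; no parallel hardware behavior is represented.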

As to Claim 7, depending on Claim 1, Claim 7 recites an additional element in the following limitation:
an interconnection circuit configured to channel data between the master computation circuit and the one or more slave computation circuits.
In the Subject Matter Eligibility Test Step 2A, the recited additional element does not add a meaningful limitation beyond adding insignificant extra-solution activity to the judicial exception. Thus, the additional element in Claim 7 does not integrate the abstract idea into a practical application, and the claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional element does not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (MPEP 2106.05(d)(II), The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner … i. Receiving or transmitting data over a network). Thus, the additional element in Claim 7 does not contribute an inventive concept and Claim 7 is not eligible subject matter under 35 U.S.C. 101, as pointed out in the Step 2A Prong Two analysis.

As to Claims 8 and 9, depending on Claims 1 and 5, Claims 8 and 9 recite additional elements in the following limitations:
each of the one or more slave computation circuits includes a slave neuron caching circuit configured to store the one or more first data gradients with the input data
each of the one or more slave computation circuits includes a weight value caching circuit configured to store the weight matrix that includes the one or more weight vectors
In the Subject Matter Eligibility Test Step 2A, the recited additional elements of caching circuits storing data are highly generic and add insignificant extra-solution activity to the judicial exception. Thus, the additional elements in Claims 8 and 9 do not integrate the abstract idea into a practical application, and each claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional elements do not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (MPEP 2106.05(d)(II), The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner … iv. Storing and retrieving information in memory). Thus, the additional elements in Claims 8 and 9 do not contribute an inventive concept and Claims 8 and 9 are not eligible subject matter under 35 U.S.C. 101, as pointed out in the Step 2A Prong Two analysis.

As to Claim 10, in the Subject Matter Eligibility Test Step 1, the claimed method in Claim 10 as a whole falls within one or more statutory categories.
In the Subject Matter Eligibility Test Step 2A Prong One, the claimed method in Claim 10 recites an abstract idea in the following limitation:
multiplying … one of the one or more first data gradients with the input data to generate a default weight gradient vector.
The step of multiplying recites a mathematical calculation and thus falls under the mathematical concepts grouping of abstract ideas. The mere nominal recitation of first data gradients, input data and a default weight gradient vector does not take the claim limitation out of the mathematical calculation grouping; thus the claim falls within the abstract idea judicial exception and requires further analysis under Step 2A Prong Two.
In the Subject Matter Eligibility Test Step 2A Prong Two, Claim 10 recites the following additional elements along with the abstract idea:
receiving, by a controller circuit, an instruction;
receiving, by a master computation circuit, input data and one or more first data gradients in response to the instruction;
transmitting, by the master computation circuit, the input data and the one or more first data gradients to one or more slave computation circuits
The recited additional elements of receiving an instruction by the controller circuit, receiving data by the master computation circuit, and transmitting data by the master computation circuit to the slave computation circuits add insignificant extra-solution activity to the judicial exception. Thus, the additional elements in Claim 10 do not integrate the abstract idea into a practical application, and the claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional elements of receiving data or an instruction (under the broadest reasonable interpretation (BRI), an instruction is one type of data) and transmitting data do not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (MPEP 2106.05(d)(II), The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity … i. Receiving or transmitting data). Thus, the additional elements in Claim 10 do not contribute an inventive concept and Claim 10 is not eligible subject matter under 35 U.S.C. 101.

As to Claim 11, depending on Claim 10: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 11 recites an additional abstract idea in the following limitation:
updating, by the master computation circuit, one or more weight values based on the default weight gradient vector
The step of updating weight values based on a gradient recites either a mathematical relationship or a mental process. The mere nominal recitation of the master computation circuit does not take the claim limitation out of the mathematical concepts or mental processes grouping.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 10. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 12, depending on Claim 10: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 12 recites additional abstract ideas in the following limitations:
calculating, by the master computation circuit, a scaled weight gradient vector based on the default weight gradient vector and a predetermined threshold value;
updating, by the master computation circuit, one or more weight values based on the scaled weight gradient vector.
The step of calculating recites a mathematical calculation, and the step of “update weight based on scaled gradient” recites either a mathematical relationship or a mental process. The mere nominal recitation of the master computation circuit does not take the claim limitation out of the mathematical concepts or mental processes grouping.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 10. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 13, depending on Claim 10: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 13 recites an additional abstract idea in the following limitation:
applying, by the master computation circuit, a derivative of an activation function to the one or more first data gradients to generate one or more input gradients
The step of “applying derivative of activation function to gradient to generate gradient” recites a mathematical calculation and a mathematical relationship. The mere nominal recitation of the master computation circuit does not take the claim limitation out of the mathematical concepts grouping.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 10. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 14, depending on Claim 13: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 14 recites an additional abstract idea in the following limitation:
multiplying, by the one or more slave computation circuits, one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results
The step of “multiplying input with weight to generate result” recites a mathematical calculation. The mere nominal recitation of the slave computation circuits does not take the claim limitation out of the mathematical concepts grouping.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 13. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 15, depending on Claim 14: in the Subject Matter Eligibility Test Step 2A Prong One, Claim 15 recites an additional abstract idea in the following limitation:
combining, by the interconnection circuit, the one or more multiplication results, calculated respectively by the one or more slave computation circuits, into an output gradient vector
The step of “combining result into a vector” recites either a mathematical calculation or a mental step. The mere nominal recitation of the interconnection circuit and the slave computation circuits does not take the claim limitation out of the mathematical concepts grouping.
The Subject Matter Eligibility Test Step 2A Prong Two and Step 2B analyses are the same as for Claim 14. This claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more. The claim is not eligible.

As to Claim 16, depending on Claim 10, Claim 16 recites an additional element in the following limitation:
channeling, by an interconnection circuit, data between the master computation circuit and the one or more slave computation circuits.
In the Subject Matter Eligibility Test Step 2A, the recited additional element does not add a meaningful limitation beyond adding insignificant extra-solution activity to the judicial exception. Thus, the additional element in Claim 16 does not integrate the abstract idea into a practical application, and the claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional element does not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (MPEP 2106.05(d)(II), The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner … i. Receiving or transmitting data over a network). Thus, the additional element in Claim 16 does not contribute an inventive concept and Claim 16 is not eligible subject matter under 35 U.S.C. 101.

As to Claims 17 and 18, depending on Claims 10 and 14, Claims 17 and 18 recite additional elements in the following limitations:
storing, by a slave neuron caching circuit of each of the one or more slave computation circuits, the one or more first data gradients with the input data
storing, by a weight value caching circuit of each of the one or more slave computation circuits, the weight matrix that includes the one or more weight vectors
In the Subject Matter Eligibility Test Step 2A, the recited additional elements of storing data are highly generic and add insignificant extra-solution activity to the judicial exception. Thus, the additional elements in Claims 17 and 18 do not integrate the abstract idea into a practical application, and each claim as a whole is directed to the judicial exception, which requires further analysis under Step 2B.
In the Subject Matter Eligibility Test Step 2B, the recited additional elements do not add a meaningful limitation beyond appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (MPEP 2106.05(d)(II), The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner … iv. Storing and retrieving information in memory). Thus, the additional elements in Claims 17 and 18 do not contribute an inventive concept and Claims 17 and 18 are not eligible subject matter under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 – 5 and 10 – 14 are rejected under 35 U.S.C. 103 as being unpatentable over Touretzky, Backpropagation Learning, Lecture 15-486/782: Artificial Neural Networks, Computer Science, Carnegie Mellon University, 2006, in view of Lee, Performance Analysis of Bit-Width Reduced FPU in FPGAs, Journal of Embedded Systems, Vol. 2009, and further in view of Li, Arithmetic formats for implementing artificial neural network, Can. J. Elect. Comput. Eng., Vol. 31, No. 1, Winter, 2006.

Regarding Claim 1, Touretzky discloses: 
backpropagation in a fully connected layer of a neural network (Touretzky, page. 12, which describes backpropagation of error in training a fully connected neural network), comprising:
receive input data and one or more first data gradients (Touretzky, page. 9, where during training, the neural network layers generate/receive data $x_i$ [input data] and the gradient of data y, $\frac{\partial E}{\partial y}$ [first data gradient])
multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector (Touretzky, page. 9, where gradient of weight [default weight gradient] $\frac{\partial E}{\partial w_i} = \frac{dE}{dy} \cdot \frac{\partial y}{\partial w_i} = (y - d) \cdot x_i$. $(y - d) = \frac{\partial E}{\partial y}$ is the gradient of the data y [first data gradient] and $x_i$ is the input data).
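For illustration only, the relation cited from Touretzky above, in which the weight gradient equals (y - d) multiplied by x_i, can be checked with a short Python sketch. The squared-error loss E = 0.5 * (y - d)^2 (chosen because it yields dE/dy = y - d), the linear unit, the function name, and the sample values are assumptions made for this sketch and are not quoted from the reference.

```python
def weight_gradient(x, y, d):
    """Return dE/dw_i = (y - d) * x_i for every input x_i.

    Assumption: E = 0.5 * (y - d) ** 2 and y = sum(w_i * x_i), so that
    dE/dy = (y - d) and dy/dw_i = x_i.
    """
    data_gradient = y - d  # first data gradient, dE/dy
    return [data_gradient * x_i for x_i in x]


# Hypothetical inputs, weights, and target.
x = [1.0, 2.0, -3.0]
w = [0.5, -1.0, 0.25]
y = sum(w_i * x_i for w_i, x_i in zip(w, x))  # linear unit output
d = 0.0                                       # desired output

grad = weight_gradient(x, y, d)
print(grad)
```

Each element of the result is a single product of the data gradient and one input value.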
Touretzky does not explicitly disclose:
An integrated circuit IC chip for … neural network
a controller circuit configured to receive an instruction;
and one or more computation circuits that include a master computation circuit and one or more slave computation circuits,
wherein the master computation circuit is configured to receive input data and one or more first data gradients in response to the instruction and transmit the input data and the one or more first data gradients to the one or more slave computation circuits
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector
Lee explicitly discloses:
An integrated circuit IC chip for … neural network (Lee, intro, para. 1, ln. 11 – 12, where the neural network is implemented on field programmable gate arrays (FPGAs) [integrated circuit IC chip])
a controller circuit configured to receive an instruction; and one or more computation circuits that include a master computation circuit and one or more slave computation circuits (Lee, fig. 3 & sec. 2.4, para. 1, where the FPGA neural network consists of control logic [controller circuit; master computation circuit] and an FPU [slave computation circuit]; the control logic performs neural network computation according to programmed logic [instruction]),
wherein the master computation circuit is configured to receive input data and one or more first data gradients in response to the instruction and transmit the input data and the one or more first data gradients to one or more slave computation circuits (Lee, fig. 3, where, in the FPGA neural network, the neural network top module [master computation circuit] controls and sends data to the FPU [slave computation circuit] based on programmed training logic [instruction])
Touretzky and Lee both disclose methods of implementing a neural network and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky’s teaching of training a neural network using gradient descent back propagation with Lee’s disclosure of implementing a neural network in an IC chip with MAC units that perform multiplication and accumulation to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to reduce the training time (Lee, intro, para. 1, ln. 11 – 12).
Touretzky in view of Lee does not explicitly disclose: 
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector 
Li explicitly discloses:
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector (Li, sec. III, ln. 3 – 4, where the basic arithmetic operations in the forward and backward passes consist of a multiplication and a summation; sec. IIA. 2, eq. 4 – 6, where the multiplication involves the gradient δ and the input data o of the layer)
Touretzky (in view of Lee) and Li both disclose methods of implementing a neural network and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky (in view of Lee)’s teaching of a neural network implementation using an IC with MAC units with Li’s disclosure of a neural network IC with MAC units that perform forward and backward multiplication and accumulation to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification for an area-efficient implementation (Li, intro, para. 1, ln. 4 – 6).
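For illustration only, the multiplication relied on above from Touretzky, page 9, i.e., $\frac{\partial E}{\partial w_i} = (y - d) \cdot x_i$, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `weight_gradient_rows` and the sample values are hypothetical and do not appear in Touretzky, Lee, or Li, and the sketch does not represent the hardware of any cited reference.

```python
def weight_gradient_rows(first_data_gradients, input_data):
    """Multiply each first data gradient with the whole input vector.

    Row k of the result holds first_data_gradients[k] * input_data[i] for
    every input index i, i.e. dE/dw_i = (y - d) * x_i evaluated per output.
    """
    return [[g * x for x in input_data] for g in first_data_gradients]


# Hypothetical sample values, chosen to be exact in binary floating point.
y = [0.5, 0.25]        # outputs
d = [1.0, 0.0]         # desired outputs
x = [1.0, 2.0, 4.0]    # input data

# (y - d) is the gradient of the data y, as stated in the cited passage.
first_data_gradients = [yk - dk for yk, dk in zip(y, d)]
rows = weight_gradient_rows(first_data_gradients, x)
print(rows)
```

Each row depends on only one gradient element and the shared input vector, so the rows can be computed independently of one another.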

Regarding Claim 2, depending on Claim 1. Touretzky in view of Lee and Li discloses the IC chip of Claim 1. Touretzky in view of Lee and Li further discloses:  
further configured to update one or more weight values based on the default weight gradient vector (Touretzky, page 9 & page. 13, where during the training phase, weights are updated by $\Delta w_i$, which is based on the weight gradients $\frac{\partial E}{\partial w_i}$ [default weight gradient vector]).
wherein the master computation circuit is further configured to update … (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes updating parameters during learning)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 1. 

Regarding Claim 3, depending on Claim 1. Touretzky in view of Lee and Li discloses the IC chip of Claim 1. Touretzky in view of Lee and Li further discloses:  
further configured to calculate a scaled weight gradient vector based on the default weight gradient vector and a predetermined threshold value; and update one or more weight values based on the scaled weight gradient vector (Touretzky, page. 9, where during training, weights are updated by $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, i.e., the default weight gradient scaled by the learning rate η; page. 26, where the learning rate depends on a cosine [threshold]).
wherein the master computation circuit is further configured to calculate … (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes calculating the gradient vector)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 1. 
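For illustration only, the weight update relied on above from Touretzky, page 9, i.e., $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, can be sketched in Python as follows. This is a minimal sketch: the function name `update_weights` and the sample values are hypothetical, and a fixed learning rate is assumed for simplicity.

```python
def update_weights(weights, weight_gradients, learning_rate):
    """Apply delta_w_i = -learning_rate * dE/dw_i to every weight."""
    return [w - learning_rate * g for w, g in zip(weights, weight_gradients)]


# Hypothetical sample values, chosen to be exact in binary floating point.
weights = [1.0, 2.0]
weight_gradients = [0.5, -1.0]
updated = update_weights(weights, weight_gradients, learning_rate=0.5)
print(updated)
```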

Regarding Claim 4, depending on Claim 1. Touretzky in view of Lee and Li discloses the IC chip of Claim 1. Touretzky in view of Lee and Li further discloses:  
further configured to apply a derivative of an activation function to the one or more first data gradients to generate one or more input gradients (Touretzky, page. 12, where during the back propagation, the gradient of the middle layer back propagation input $\frac{\partial E}{\partial \mathit{net}_k}$ [input gradients] is calculated by the derivative of the activation function $g'(\mathit{net}_k)$ and the first data gradients $(y_k - d_k)$, i.e., $\frac{\partial E}{\partial \mathit{net}_k} = (y_k - d_k) \cdot g'(\mathit{net}_k)$).
wherein the master computation circuit is further configured to apply … (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes calculating the gradient vector)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 1. 
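For illustration only, the relation relied on above from Touretzky, page 12, i.e., $\frac{\partial E}{\partial \mathit{net}_k} = (y_k - d_k) \cdot g'(\mathit{net}_k)$, can be sketched in Python as follows. This is a minimal sketch that assumes a logistic sigmoid as the activation function g; that choice, the function names, and the sample values are hypothetical and are not taken from the cited references.

```python
import math


def sigmoid(net):
    """Logistic activation g(net); assumed here for illustration only."""
    return 1.0 / (1.0 + math.exp(-net))


def sigmoid_derivative(net):
    """g'(net) = g(net) * (1 - g(net)) for the logistic sigmoid."""
    g = sigmoid(net)
    return g * (1.0 - g)


def input_gradients(first_data_gradients, nets):
    """Compute dE/dnet_k = (y_k - d_k) * g'(net_k) for every node k."""
    return [grad * sigmoid_derivative(net)
            for grad, net in zip(first_data_gradients, nets)]


# Hypothetical sample values: (y_k - d_k) = -0.5 and net_k = 0, so g'(0) = 0.25.
deltas = input_gradients([-0.5], [0.0])
print(deltas)
```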

Regarding Claim 5, depending on Claim 4. Touretzky in view of Lee and Li discloses the IC chip of Claim 4. Touretzky in view of Lee and Li further discloses:  
multiply one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results (Touretzky, page. 12, where during back propagation, the gradient of the middle layer back propagation output yj is calculated as $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot \frac{\partial \mathit{net}_k}{\partial y_i} \right)$; page. 10, $\mathit{net}_k = \sum_i w_{ik} \cdot y_i$; thus $\frac{\partial \mathit{net}_k}{\partial y_i} = w_{ik}$, $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot w_{ik} \right)$; i.e., multiply, for each node k of the output layer, the input gradient $\frac{\partial E}{\partial \mathit{net}_k}$ with the weight $w_{ik}$)
wherein the one or more slave computation circuits are respectively configured to multiply one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results (Li, abs. ln. 2 – 3, where ANN processing elements include multiplication and addition operations, therefore … multiplier/adder [slave computation circuit] were implemented; sec. 2.2, eq. 4 & 6, where back propagation during training involves multiplication and addition of the weight and gradient matrices; i.e., the multiplier/adder is used during training to reduce training time)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 4. 
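For illustration only, the sum of products relied on above from Touretzky, pages 10 and 12, i.e., each input gradient $\frac{\partial E}{\partial \mathit{net}_k}$ multiplied by the weight $w_{ik}$ and accumulated over k, can be sketched in Python as follows. This is a minimal sketch: the function name `backpropagate_to_outputs`, the layout `weight_matrix[i][k]` for $w_{ik}$, and the sample values are hypothetical.

```python
def backpropagate_to_outputs(input_gradients, weight_matrix):
    """Return sum_k (dE/dnet_k * w_ik) for every middle layer node i.

    weight_matrix[i][k] is assumed to hold w_ik, so each row is the weight
    vector that connects node i to all nodes k of the next layer.
    """
    return [sum(delta * w for delta, w in zip(input_gradients, row))
            for row in weight_matrix]


# Hypothetical sample values, chosen to be exact in binary floating point.
input_gradients = [0.5, -1.0]       # dE/dnet_k for k = 0, 1
weight_matrix = [[1.0, 2.0],        # w_00, w_01
                 [0.0, 4.0],        # w_10, w_11
                 [2.0, 1.0]]        # w_20, w_21
output_gradients = backpropagate_to_outputs(input_gradients, weight_matrix)
print(output_gradients)
```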

Regarding Claim 10, Touretzky discloses: 
A method for backpropagation in a fully connected layer of a neural network (Touretzky, page. 12, where backpropagation of error is used in training a fully connected neural network), comprising:
receiving … input data and one or more first data gradients (Touretzky, page. 9, where during training, the neural network layers generate/receive data $x_i$ [input data] and the gradient of data y $\frac{\partial E}{\partial y}$ [first data gradient])
respectively multiplying … one of the one or more first data gradients with the input data to generate a default weight gradient vector (Touretzky, page. 9, where gradient of weight [default weight gradient] $\frac{\partial E}{\partial w_i} = \frac{dE}{dy} \cdot \frac{\partial y}{\partial w_i} = (y - d) \cdot x_i$. $(y - d) = \frac{\partial E}{\partial y}$ is the gradient of the data y [first data gradient] and $x_i$ is the input data).
Touretzky does not explicitly disclose:
receiving, by a controller circuit, an instruction;
receiving, by a master computation circuit, input data and one or more first data gradients in response to the instruction;
transmitting, by the master computation circuit, the input data and the one or more first data gradients to one or more slave computation circuits;
respectively multiplying, by the one or more slave computation circuits, one of the one or more first data gradients with the input data to generate a default weight gradient vector
Lee explicitly discloses:
receiving, by a controller circuit, an instruction (Lee, fig. 3 & sec. 2.4, para. 1, where the FPGA neural network consists of control logic [controller circuit]; the control logic performs neural network computation according to programmed logic [instruction]);
receiving, by a master computation circuit, input data and one or more first data gradients in response to the instruction; transmitting, by the master computation circuit, the input data and the one or more first data gradients to one or more slave computation circuits (Lee, fig. 3, where, in the FPGA neural network, the neural network top module [master computation circuit] controls and sends data to the FPU [slave computation circuit] based on programmed logic [instruction])
Touretzky and Lee both disclose methods of implementing a neural network and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky’s teaching of training a neural network using gradient descent back propagation with Lee’s disclosure of implementing a neural network in an IC chip with MAC units that perform multiplication and accumulation to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to reduce the training time (Lee, intro, para. 1, ln. 11 – 12).
Touretzky in view of Lee does not explicitly disclose: 
respectively multiplying, by the one or more slave computation circuits, one of the one or more first data gradients with the input data to generate a default weight gradient vector
Li explicitly discloses:
respectively multiplying, by the one or more slave computation circuits, one of the one or more first data gradients with the input data to generate a default weight gradient vector (Li, sec. III, ln. 3 – 4, where the basic arithmetic operations in the forward and backward passes consist of a multiplication and a summation; sec. IIA. 2, eq. 4 – 6, where the multiplication involves the gradient δ and the input data o of the layer)
Touretzky (in view of Lee) and Li both disclose methods of implementing a neural network and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky (in view of Lee)’s teaching of a neural network implementation using an IC with MAC units with Li’s disclosure of a neural network IC with MAC units that perform forward and backward multiplication and accumulation to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification for an area-efficient implementation (Li, intro, para. 1, ln. 4 – 6).

Regarding Claim 11, depending on Claim 10. Touretzky in view of Lee and Li discloses the method of Claim 10. Touretzky in view of Lee and Li further discloses:  
updating … one or more weight values based on the default weight gradient vector (Touretzky, page 9 & page. 13, where during the training phase, weights are updated by $\Delta w_i$, which is based on the weight gradients $\frac{\partial E}{\partial w_i}$ [default weight gradient vector]).
updating, by the master computation circuit, (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes updating parameters during learning)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 10. 

Regarding Claim 12, depending on Claim 10. Touretzky in view of Lee and Li discloses the method of Claim 10. Touretzky in view of Lee and Li further discloses:  
calculating … a scaled weight gradient vector based on the default weight gradient vector and a predetermined threshold value; and updating one or more weight values based on the scaled weight gradient vector (Touretzky, page. 9, where during training, weights are updated by $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, i.e., the default weight gradient scaled by the learning rate η; page. 26, where the learning rate depends on a cosine [threshold]).
calculating, by the master computation circuit, (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes calculating the gradient vector)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 10. 

Regarding Claim 13, depending on Claim 10. Touretzky in view of Lee and Li discloses the method of Claim 10. Touretzky in view of Lee and Li further discloses:  
applying … a derivative of an activation function to the one or more first data gradients o generate one or more input gradients (Touretzky, page. 12, where during the back propagation, the gradient of the middle layer back propagation input                         
                            
                                
                                    ∂
                                    E
                                
                                
                                    ∂
                                    
                                        
                                            n
                                            e
                                            t
                                        
                                        
                                            k
                                        
                                    
                                
                            
                        
 [input gradients] is calculated by the derivative of the activation function $g'(\mathit{net}_k)$ and the first data gradients $(y_k - d_k)$, i.e., $\frac{\partial E}{\partial \mathit{net}_k} = (y_k - d_k) \cdot g'(\mathit{net}_k)$).
applying, by the master computation circuit, (Lee, fig. 3, where the neural network top module [master computation circuit] controls the neural network operation, which includes calculating the gradient vector)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 10. 
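For illustration only, the relation cited above from Touretzky for Claim 13 can be sketched numerically. The sigmoid activation, the sample values, and the function names below are assumptions made for this sketch; they are not taken from the cited references or from the claims.

```python
import math

def sigmoid(x):
    """Assumed activation function g; the cited relation holds for any differentiable g."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative g'(x) of the assumed sigmoid activation."""
    s = sigmoid(x)
    return s * (1.0 - s)

def input_gradients(net, y, d):
    """Compute dE/dnet_k = (y_k - d_k) * g'(net_k) for each output node k."""
    return [(yk - dk) * sigmoid_prime(nk) for nk, yk, dk in zip(net, y, d)]

# Hypothetical values: two output nodes with net inputs 0.0 and 1.0.
net = [0.0, 1.0]
y = [sigmoid(n) for n in net]   # actual outputs y_k
d = [1.0, 0.0]                  # desired outputs d_k
grads = input_gradients(net, y, d)
```

Here the first data gradients are the differences (y_k - d_k), and each is scaled by the activation derivative at the corresponding net input.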

Regarding Claim 14, depending on Claim 13. Touretzky in view of Lee and Li discloses the method of Claim 13. Touretzky in view of Lee and Li further discloses:  
multiplying … one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results (Touretzky, page 12, where during back propagation, the gradient of the middle layer back propagation output $y_j$ is calculated as $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot \frac{\partial \mathit{net}_k}{\partial y_i} \right)$; page 10, $\mathit{net}_k = \sum_i w_{ik} \cdot y_i$; thus $\frac{\partial \mathit{net}_k}{\partial y_i} = w_{ik}$, $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot w_{ik} \right)$; i.e., for each of the k nodes of the output layer, the input gradient $\frac{\partial E}{\partial \mathit{net}_k}$ is multiplied by the weight $w_{ik}$)
multiplying, by the one or more slave computation circuits, (Li, intro. ln. 2 – 3, where ANN processing elements include multiplication and addition operations, and therefore multipliers/adders [slave computation circuit] were implemented; sec. 2.2, eq. 4 & 6, where back propagation during training involves multiplication and addition of weight and gradient matrices; i.e., multipliers/adders are used during training to reduce training time)
The rationale to combine Touretzky’s teaching and Lee and Li’s teaching is the same as set forth in claim 13. 
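For illustration only, the multiplication step relied on for Claim 14 can be sketched as below. The nested-list layout `weights[i][k]` for the weight $w_{ik}$, the sample values, and the function names are assumptions made for this sketch and do not come from the cited references.

```python
def multiplication_results(input_grads, weights, i):
    """Products dE/dnet_k * w_ik over all output nodes k, for one lower-layer node i."""
    return [g * weights[i][k] for k, g in enumerate(input_grads)]

def backpropagated_gradient(input_grads, weights):
    """Sum the products over k for every lower-layer node i."""
    return [sum(multiplication_results(input_grads, weights, i))
            for i in range(len(weights))]

# Hypothetical values: two output nodes (k) and three lower-layer nodes (i).
input_grads = [0.5, -1.0]          # dE/dnet_k
weights = [[1.0, 2.0],             # w_ik, row index i, column index k
           [3.0, 4.0],
           [0.0, 1.0]]
products_for_node_0 = multiplication_results(input_grads, weights, 0)
out_grads = backpropagated_gradient(input_grads, weights)
```

Each row of `weights` plays the role of one weight vector, and each entry of `products_for_node_0` is one multiplication result before summation.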

Claims 6 – 9 and 15 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Touretzky, Backpropagation Learning, Lecture 15-486/782: Artificial Neural Networks, Computer Science, Carnegie Mellon University, 2006, in view of Lee, Performance Analysis of Bit-Width Reduced FPU in FPGAs, Journal of Embedded Systems, Vol. 2009, further in view of Li, Arithmetic formats for implementing artificial neural network, Can. J. Elect. Comput. Eng., Vol. 31, No. 1, Winter 2006, and Zhang, Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA ’15, ACM.

Regarding Claim 6, depending on Claim 5. Touretzky in view of Lee and Li discloses the IC chip of Claim 5. Touretzky in view of Lee and Li further discloses:  
combine the one or more multiplication results … into an output gradient vector (Touretzky, page 12, where during back propagation, the gradient of the middle layer that is output and back propagated to the lower layer is $\frac{\partial E}{\partial y_j}$ [output gradient vector], which is calculated as $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot w_{ik} \right)$; i.e., combine all the multiplication results of the k nodes).
Touretzky in view of Lee and Li does not explicitly disclose:
further comprising an interconnection circuit configured to combine the one or more multiplication results calculated respectively by the one or more slave computation circuit 
Zhang explicitly discloses:
further comprising an interconnection circuit configured to combine the one or more multiplication results calculated respectively by the one or more slave computation circuit (Zhang, fig. 7, where the outputs of the multiplication units [slave computation circuit] are combined through connections [interconnection] and added together).
Touretzky (in view of Lee and Li) and Zhang both disclose hardware implementations of neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky (in view of Lee and Li)’s teaching of the hardware implementation of a neural network using a multiplication and accumulation architecture with Zhang’s disclosure of the details of the multiplication and accumulation engine to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification because the combination yields predictable results.
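For illustration only, combining multiplication results through interconnections and adding them together can be sketched as a pairwise reduction. The adder-tree structure, the zero padding for odd counts, and the names below are assumptions made for this sketch; they are not asserted to match the circuit shown in the cited figure.

```python
def adder_tree(values):
    """Combine per-unit products by pairwise addition, one tree level at a time."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(0.0)  # pad so every adder at this level has two inputs
        level = [level[j] + level[j + 1] for j in range(0, len(level), 2)]
    return level[0]

# Hypothetical products, one from each of three multiplication units.
slave_products = [0.5, -2.0, 1.0]
combined = adder_tree(slave_products)
```

The result equals the plain sum of the products; the tree form only changes the order in which the additions are carried out.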

Regarding Claim 7, depending on Claim 1. Touretzky in view of Lee and Li discloses the IC chip of Claim 1. Touretzky in view of Lee and Li does not explicitly disclose:
further comprising an interconnection circuit configured to channel data between the master computation circuit and the one or more slave computation circuit
Zhang explicitly discloses:  
further comprising an interconnection circuit configured to channel data between the master computation circuit and the one or more slave computation circuit (Zhang, fig. 7 & 11, where the connections [interconnection circuit] in fig. 7 combine the multiplication data [by slave computation circuit] and send [channel] it to the program controller [master computation circuit]).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 6. 

Regarding Claim 8, depending on Claim 1. Touretzky in view of Lee and Li discloses the IC chip of Claim 1. Touretzky in view of Lee and Li does not explicitly disclose:
wherein each of the one or more slave computation circuits includes a slave neuron caching circuit configured to store the one or more first data gradients with the input data 
Zhang explicitly discloses:
wherein each of the one or more slave computation circuits includes a slave neuron caching circuit configured to store the one or more first data gradients with the input data (Zhang, fig. 11, where input buffer set 0 and input buffer set 1 [slave neuron caching circuit] store the data [first data gradients; input data] that will be multiplied).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 6. 

Regarding Claim 9, depending on Claim 5. Touretzky in view of Lee and Li discloses the IC chip of Claim 5. Touretzky in view of Lee and Li does not explicitly disclose:
wherein each of the one or more slave computation circuits includes a weight value caching circuit configured to store the weight matrix that includes the one or more weight vectors 
Zhang explicitly discloses:
wherein each of the one or more slave computation circuits includes a weight value caching circuit configured to store the weight matrix that includes the one or more weight vectors (Zhang, fig. 7, where the weights buffer [weight value caching circuit] stores weight data [weight vector]).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 6. 

Regarding Claim 15, depending on Claim 14. Touretzky in view of Lee and Li discloses the method of Claim 14. Touretzky in view of Lee and Li further discloses:  
combining … the one or more multiplication results … into an output gradient vector (Touretzky, page 12, where during back propagation, the gradient of the middle layer that is output and back propagated to the lower layer is $\frac{\partial E}{\partial y_j}$ [output gradient vector], which is calculated as $\frac{\partial E}{\partial y_j} = \sum_k \left( \frac{\partial E}{\partial \mathit{net}_k} \cdot w_{ik} \right)$; i.e., combine all the multiplication results of the k nodes).
Touretzky in view of Lee and Li does not explicitly disclose:
combining, by an interconnection circuit, the one or more multiplication results calculated respectively by the one or more slave computation circuit
Zhang explicitly discloses:
combining, by an interconnection circuit, the one or more multiplication results calculated respectively by the one or more slave computation circuit (Zhang, fig. 7, where the outputs of the multiplication units [slave computation circuit] are combined through connections [interconnection] and added together).
Touretzky (in view of Lee and Li) and Zhang both disclose methods of implementing a neural network in hardware and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Touretzky (in view of Lee and Li)’s teaching of the hardware implementation of a neural network using a multiplication and accumulation architecture with Zhang’s disclosure of the details of the multiplication and accumulation engine to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification because the combination yields predictable results.

Regarding Claim 16, depending on Claim 10. Touretzky in view of Lee and Li discloses the method of Claim 10. Touretzky in view of Lee and Li does not explicitly disclose:
channeling, by an interconnection circuit, data between the master computation circuit and the one or more slave computation circuit
Zhang explicitly discloses:  
channeling, by an interconnection circuit, data between the master computation circuit and the one or more slave computation circuit (Zhang, fig. 7 & 11, where the connections [interconnection circuit] in fig. 7 combine the multiplication data [by slave computation circuit] and send [channel] it to the program controller [master computation circuit]).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 15. 

Regarding Claim 17, depending on Claim 10. Touretzky in view of Lee and Li discloses the method of Claim 10. Touretzky in view of Lee and Li does not explicitly disclose:
Storing, by a slave neuron caching circuit of each of the one or more slave computation circuits, the one or more first data gradients with the input data 
Zhang explicitly discloses:  
Storing, by a slave neuron caching circuit of each of the one or more slave computation circuits, the one or more first data gradients with the input data (Zhang, fig. 11, where input buffer set 0 and input buffer set 1 [slave neuron caching circuit] store the data [first data gradients; input data] that will be multiplied).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 15. 


Regarding Claim 18, depending on Claim 14. Touretzky in view of Lee and Li discloses the method of Claim 14. Touretzky in view of Lee and Li does not explicitly disclose:
Storing, by a weight value caching circuit of each of the one or more slave computation circuits, the weight matrix that includes the one or more weight vectors
Zhang explicitly discloses:  
Storing, by a weight value caching circuit of each of the one or more slave computation circuits, the weight matrix that includes the one or more weight vectors (Zhang, fig. 7, where the weights buffer [weight value caching circuit] stores weight data [weight vector]).
The rationale to combine Touretzky (in view of Lee and Li)’s teaching and Zhang’s teaching is the same as set forth in claim 15. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.








/S.C./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122