DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/25/2022 has been entered.
 
Status of Claims
This action is in response to the applicant’s amendment filed on 7/25/2022 for the application filed on 10/29/2018. Claims 1, 3 – 10 and 12 – 18 are pending and have been examined.

Claims 1, 6, 7, 10, 15 and 16 are amended. 

Claims 2 and 11 are canceled. 

The claim rejection under 35 U.S.C. 112(b) set forth in the prior Office action has been withdrawn in light of the amendment and applicant’s remarks. 

The claim rejection under 35 U.S.C. 101 has been withdrawn in light of the applicant’s remarks and amendment. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 4/29/2016. 

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/29/2018, 10/31/2019, 4/15/2021 and 7/14/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Response to Applicant’s Argument
Applicant's arguments filed on 7/25/2022 have been fully considered but they are not persuasive. 
Regarding the claim rejection under 35 U.S.C. 103, applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the arguments.
Applicant states that the cited references do not disclose the interconnection circuit being an independent hardware circuit between the master computation circuit and the slave computation circuits, or the slave computation circuits being hardware circuits that are independent from the master computation circuit. However, the amended claims do not require the hardware circuits to be independent, nor does the specification define the term “independent”. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 15 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.  

Regarding Claims 15 and 16, each claim recites “the interconnection circuit”; however, Claim 10 recites more than one interconnection circuit.  It is not clear which interconnection circuit the limitation refers to, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In light of the specification and Claim 1, both recitations of “an interconnection circuit” in Claim 10 are interpreted as referring to the same interconnection circuit. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3 – 10 and 12 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li, Arithmetic formats for implementing artificial neural network, Can. J. Elect. Comput. Eng., Vol. 31, No. 1, Winter, 2006, in view of Hamalainen, TUTNC: a general purpose parallel computer for neural network computations, Microprocessors and Microsystems Vol 19, 1995.

Regarding Claim 1, Li discloses: 
An integrated circuit (IC) chip for backpropagation in a fully connected layer of a neural network (Li, abs., ln. 1 – 2, where an ANN is implemented on a field programmable gate array [IC chip]; sec. 2, para. 2, where an MLP using the backpropagation algorithm has two computational steps: 1 Forward computation … 2 Backward computation; fig. 1, where layer 1 is fully connected), comprising:
	… circuit configured to receive input data (Li, eq. 6, where oj(s-1) is the output from s-1 layer and the input of layer s) and one or more first data gradients (Li, eq. 4, where ꜫks is the error terms and local gradients [first data gradients]) 
multiply one of the one or more first data gradients with the input data to generate a default weight gradient vector (Li, fig. 6, where δks oj(s-1) [default weight gradient vector] is generated by multiplying δks [first data gradients] and oj(s-1) [input data; the output of s-1 layer is the input of layer s])
Li does not explicitly disclose:
a controller circuit configured to receive an instruction;
and one or more computation circuits that include a master computation circuit and one or more slave computation circuits, 
wherein the master computation circuit is communicatively connected to the one or more slave computation circuits via an interconnection circuit,
wherein the master computation circuit is configured to receive … data … in response to the instruction and transmit the input data and the one or more first data gradients to one or more slave computation circuits via the interconnection circuit, and 
wherein the one or more slave computation circuits are respectively configured to multiply 
wherein the master computation circuit is further configured to update one or more weight values based on the default weight gradient vector
Hamalainen explicitly discloses: 
a controller circuit configured to receive an instruction (Hamalainen, page. 451, col. 1, para. 3, ln. 1 – 5 & fig. 6, where the root is an interface IFC [controller unit] to a host computer, which receives instructions from the host to control the CU and PU);
and one or more computation circuits that include a master computation circuit and one or more slave computation circuits (Hamalainen, fig. 1 & page. 448, col. 2, para. 2, ln. 4 – 7, where this architecture can be referred to as a master slave configuration; fig. 6, where, in the TUTNC system, the PUs are slave processing units [slave computation circuits], and the IFC and host [master computation circuit] control the overall system), 
wherein the master computation circuit is communicatively connected to the one or more slave computation circuits via an interconnection circuit (Hamalainen, where the communication network [interconnect circuit] connects the PU [slave computation circuit] and the IFC [master computation circuit] for data communication),
wherein the master computation circuit is configured to receive … data … in response to the instruction and transmit the input data and the one or more first data gradients to one or more slave computation circuits via the interconnection circuit (Hamalainen, page. 451, col 2, para. 3, ln. 3 – 6, where the host [master computation circuit] serves data transfers [receive input data and one or more first data gradients; transmit data to slave computation circuit; through interconnection circuit between Master and Slave]);  
wherein the one or more slave computation circuits are respectively configured to multiply (Hamalainen, page 449, col. 2, para. 2, where in the weight parallelism … the task of the PE is simply to multiply) 
wherein the master computation circuit is further configured to update one or more weight values based on the default weight gradient vector (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system; page 449, col. 2, para. 2, where the root [master computation circuit] performs thresholding and stores the neuron output to a vector … by changing the weights in each PE and repeating all the other steps)
Li and Hamalainen both disclose hardware architectures for neural network implementation and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Hamalainen’s “weight parallelism” architecture of multiplication and summation (Hamalainen, fig. 4, fig. 6 & page 449, col. 2, para. 2) to Li’s disclosure of multiplication and summation during network backpropagation (Li, sec. III, para. 1, ln. 3 – 4, where the basic arithmetic operations in the forward and backward passes consist of a multiplication and a summation) to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification to improve the cost/performance ratio (Hamalainen, abs. ln. 13 – 14).
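For illustration only, the combination discussed above can be sketched in Python: each slave unit multiplies one first data gradient by the input data, and the master unit collects the resulting weight gradient and updates the weights. The function names (`slave_multiply`, `master_update`), the default learning rate, and the additive sign convention of the update are assumptions made for this sketch; they are not taken from Li, Hamalainen, or the claims.

```python
def slave_multiply(delta_k, inputs):
    """One (hypothetical) slave: multiply a single gradient term by every input value."""
    return [delta_k * o_j for o_j in inputs]


def master_update(weights, deltas, inputs, lr=0.1):
    """Hypothetical master: distribute data, collect the weight gradient, update weights."""
    # Row k of the weight gradient is produced by a separate slave.
    weight_grad = [slave_multiply(d, inputs) for d in deltas]
    # Assumed update rule: w_kj <- w_kj + lr * delta_k * o_j
    new_weights = [
        [w + lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, weight_grad)
    ]
    return weight_grad, new_weights
```

In this sketch the slaves perform only multiplications, while the scaling by the learning rate and the weight update remain with the master, mirroring the division of labor relied upon in the rejection.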

Regarding Claim 3, Li in view of Hamalainen further disclose:  
the master computation circuit (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system; page 449, col. 2, para. 2, where the root [master computation circuit] performs thresholding and stores the neuron output to a vector) is further configured to calculate a scaled weight gradient vector based on the default weight gradient vector (Li, eq. 6, where learning rate η is a scalar applied to δksoj(s-1) [weight gradient]) and a predetermined threshold value (Li, sec. II.A.1, para. 2 & eq. 3, where a unipolar sigmoid function as described in eq. 3 has a 0 to 1 boundary [predetermined threshold value]); and update one or more weight values based on the scaled weight gradient vector (Li, eq. 6, where ∆wkjs is based on the scaled weight gradient).
In the architecture of Hamalainen, the PE and the communication unit perform only the multiplication and addition. The rest of the neural network operations are performed by the host/root. Applying this architecture to Li’s disclosure, one skilled in the art would appreciate that the host/root [master computation circuit] applies the learning rate scaling and the activation function thresholding.   
   
Regarding Claim 4, Li in view of Hamalainen further disclose:  
the master computation circuit is further configured to (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system) apply a derivative of an activation function to the one or more first data gradients to generate one or more input gradients (Li, eq. 5 & sec. II.A.2.1, where δjs+1 is the difference between the teaching signal and the neuron output which is equal to the local gradient [input gradients]; f’(Hks) is the derivative of the activation function which apply to ꜫks, the error terms and local gradients [first data gradients], to generate δks [input gradients])
In the architecture of Hamalainen, the host/root controls the overall operation, which includes applying Li’s derivative of the activation function to the gradients.   
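As a minimal illustrative sketch of the operation mapped above, the following Python code applies the derivative of a unipolar sigmoid to a set of error terms element by element. The exact form of the sigmoid (a standard logistic function with no gain parameter) and the function names are assumptions of this sketch rather than a reproduction of Li's equations.

```python
import math


def sigmoid(h):
    """Unipolar sigmoid activation, bounded between 0 and 1 (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-h))


def sigmoid_derivative(h):
    """Derivative of the logistic sigmoid: f'(h) = f(h) * (1 - f(h))."""
    f = sigmoid(h)
    return f * (1.0 - f)


def apply_activation_derivative(error_terms, pre_activations):
    """Element-wise delta_k = f'(H_k) * eps_k (hypothetical helper)."""
    return [
        sigmoid_derivative(h) * e
        for h, e in zip(pre_activations, error_terms)
    ]
```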




Regarding Claim 5, Li in view of Hamalainen further disclose:  
the one or more slave computation circuits are respectively configured to (Hamalainen, fig. 4, where the multiplication is performed by PE [slave computation circuit]) multiply one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results (Li, eq. 4, which calculates the multiplication and addition of wkjs+1 [weight] with δjs+1 [input gradients])

Regarding Claim 6, Li in view of Hamalainen further disclose:  
the interconnection circuit is configured to combine the one or more multiplication results calculated respectively by the one or more slave computation circuits (Hamalainen, page. 449, col. 2, para. 2, ln. 9 – 11 & fig. 4, where the communication network [interconnect unit] sums [combine] these multiplication results) into an output gradient vector (Li, eq. 4 & eq. 5, where, during the backpropagation, δjs+1 is the gradient output from the s+1 layer [output gradient] and the gradient input to the s layer).
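The division of work mapped for Claims 5 and 6 can be sketched as follows, purely for illustration: each slave scales its stored weight vector by one input gradient, and the interconnection adds the partial results element by element. The function names and the assignment of one weight-matrix row per slave are assumptions of this sketch, not details confirmed by Li or Hamalainen.

```python
def slave_products(delta_j, weight_vector):
    """One (hypothetical) slave: scale its weight vector by a single input gradient."""
    return [delta_j * w for w in weight_vector]


def interconnect_sum(partial_results):
    """Hypothetical interconnection: add the slaves' partial results element by element."""
    return [sum(column) for column in zip(*partial_results)]


def output_gradient(deltas, weight_matrix):
    """eps_k = sum over j of w_jk * delta_j, with row j of weight_matrix held by slave j."""
    partials = [slave_products(d, row) for d, row in zip(deltas, weight_matrix)]
    return interconnect_sum(partials)
```

Here the summation happens outside the slaves, which corresponds to the communication network performing the addition in the weight-parallel arrangement cited above.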

Regarding Claim 7, Li in view of Hamalainen further disclose:
the interconnection circuit is configured to channel data between the master computation circuit and the one or more slave computation circuits (Hamalainen, fig. 10, which shows a mechanism to create a communication path [channel] to a PU)

Regarding Claim 8, Li in view of Hamalainen further disclose:  
wherein each of the one or more slave computation circuits includes a slave neuron caching circuit configured to store the one or more first data gradients with the input data (Hamalainen, p. 451, col. 1, para. 4, ln. 5 – 6, & col. 2, para. 1, ln. 1, where the local memory [slave neuron caching unit] … is used for storage of run-time variables [one or more first data gradients; input data])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further integrate Hamalainen’s teaching of local caching to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to store program code and run-time variables (Hamalainen, page. 451, col. 1, para. 4, ln. 5 – 6 and col. 2, para. 1, ln. 1).

Regarding Claim 9, Li in view of Hamalainen further disclose:  
wherein each of the one or more slave computation circuits includes a weight value caching circuit configured to store the weight matrix that includes the one or more weight vectors (Hamalainen, p. 451, col. 1, para. 4, ln. 5 – 6, & col. 2, para. 1, ln. 1, where the local memory … is used for storage of run-time variables [one or more weight vectors])

Regarding Claim 10, Li discloses: 
A method for backpropagation in a fully connected layer of a neural network (Li, sec. 2, para. 2 where MLP using the backpropagation algorithm has two computational steps: 1 Forward computation … 2 Backward computation; fig. 1, where the layer 1 is fully connected), comprising:
receiving … input data (Li, eq. 6, where oj(s-1) is the output from s-1 layer and the input of layer s) and one or more first data gradients (Li, eq. 4, where ꜫks is the error terms and local gradients [first data gradients]) 
respectively multiplying … one of the one or more first data gradients with the input data to generate a default weight gradient vector (Li, fig. 6, where δks oj(s-1) [default weight gradient vector] is generated by multiplying δks [first data gradients] and oj(s-1) [input data; the output of s-1 layer is the input of layer s]).
Li does not explicitly disclose:
receiving, by a controller circuit, an instruction;
receiving, by a master computation circuit, data in response to the instruction
transmitting, by the master computation circuit, the data to one or more slave computation circuits via an interconnection circuit wherein the master computation circuit is communicatively connected to the one or more slave computation circuits via an interconnection circuit
respectively multiplying, by the one or more slave computation circuits, 
updating, by the master computation circuit, one or more weight values based on the default weight gradient vector
Hamalainen explicitly discloses:
receiving, by a controller circuit, an instruction (Hamalainen, page. 451, col. 1, para. 3, ln. 1 – 5 & fig. 6, where the root is an interface IFC [controller unit] to a host computer, which receives instructions from the host to control the CU and PU);
receiving, by a master computation circuit, data in response to the instruction;  transmitting, by the master computation circuit, the data via an interconnection circuit (Hamalainen, page. 451, col 2, para. 3, ln. 3 – 6, where the host [master computation circuit] serves data transfers [receive input data and one or more first data gradients; transmit data to slave computation circuit; through interconnection circuit between Master and Slave])
wherein the master computation circuit is communicatively connected to the one or more slave computation circuits via an interconnection circuit (Hamalainen, where the communication network [interconnect circuit] connects the PU [slave computation circuit] and the IFC [master computation circuit] for data communication)
respectively multiplying, by the one or more slave computation circuits (Hamalainen, page 449, col. 2, para. 2, where in the weight parallelism … the task of the PE is simply to multiply), 
updating, by the master computation circuit, one or more weight values based on the default weight gradient vector (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system; page 449, col. 2, para. 2, where the root [master computation circuit] performs thresholding and stores the neuron output to a vector … by changing the weights in each PE and repeating all the other steps)
Li and Hamalainen both disclose hardware architectures for neural network implementation and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Hamalainen’s “weight parallelism” architecture of multiplication and summation (Hamalainen, fig. 4, fig. 6 & page 449, col. 2, para. 2) to Li’s disclosure of multiplication and summation during network backpropagation (Li, sec. III, para. 1, ln. 3 – 4, where the basic arithmetic operations in the forward and backward passes consist of a multiplication and a summation) to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification to improve the cost/performance ratio (Hamalainen, abs. ln. 13 – 14).

Regarding Claim 12, Li in view of Hamalainen further discloses:  
Calculating, by the master computation circuit (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system; page 449, col. 2, para. 2, where the root [master computation circuit] performs thresholding and stores the neuron output to a vector), a scaled weight gradient vector based on the default weight gradient vector (Li, eq. 6, where learning rate η is a scalar applied to δksoj(s-1) [weight gradient]) and a predetermined threshold value (Li, sec. II.A.1, para. 2 & eq. 3, where a unipolar sigmoid function as described in eq. 3 has a 0 to 1 boundary [predetermined threshold value]); 
and updating, by the master computation circuit (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system), one or more weight values based on the scaled weight gradient vector (Li, eq. 6, where ∆wkjs is based on the scaled weight gradient).
In the architecture of Hamalainen, the PE and the communication unit perform only the multiplication and addition. The rest of the neural network operations are performed by the host/root. Applying this architecture to Li’s disclosure, one skilled in the art would appreciate that the host/root [master computation circuit] applies the learning rate scaling and the activation function thresholding.   

Regarding Claim 13, Li in view of Hamalainen further discloses:  
Applying, by the master computation circuit (Hamalainen, page. 451, col. 2, para. 3, ln. 3 – 6, where during execution, the host [master computation circuit] … controls the overall execution by sending real-time command to the system), a derivative of an activation function to the one or more first data gradients to generate one or more input gradients (Li, eq. 5 & sec. II.A.2.1, where δjs+1 is the difference between the teaching signal and the neuron output which is equal to the local gradient [input gradients]; f’(Hks) is the derivative of the activation function which is applied to ꜫks, the error terms and local gradients [first data gradients], to generate δks [input gradients]).
In the architecture of Hamalainen, the host/root controls the overall operation, which includes applying Li’s derivative of the activation function to the gradients.   

Regarding Claim 14, Li in view of Hamalainen further discloses:  
Multiplying, by the one or more slave computation circuits (Hamalainen, fig. 4, where the multiplication is performed by PE [slave computation circuit]), one of the one or more input gradients with one or more weight vectors in a weight matrix to generate one or more multiplication results (Li, eq. 4, where calculation of multiplication and addition of wkjs+1 [weight] with δjs+1 [input gradients]) 

Regarding Claim 15, Li in view of Hamalainen further discloses:  
Combining, by the interconnection circuit, the one or more multiplication results calculated respectively by the one or more slave computation circuits (Hamalainen, page. 449, col. 2, para. 2, ln. 9 – 11 & fig. 4, where the communication network [interconnect unit] sums [combine] these multiplication results) into an output gradient vector (Li, eq. 4 & eq. 5, where, during the backpropagation, δjs+1 is the gradient output from the s+1 layer [output gradient] and the gradient input to the s layer).

Regarding Claim 16, Li in view of Hamalainen further disclose:  
channeling, by the interconnection circuit, data between the master computation circuit and the one or more slave computation circuits (Hamalainen, fig. 10, which shows a mechanism to create a communication path [channel] to a PU)

Regarding Claim 17, Li in view of Hamalainen further disclose:
Storing, by a slave neuron caching circuit of each of the one or more slave computation circuits, the one or more first data gradients with the input data (Hamalainen, p. 451, col. 1, para. 4, ln. 5 – 6, & col. 2, para. 1, ln. 1, where the local memory [slave neuron caching unit] … is used for storage of run-time variables [one or more first data gradients; input data])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further integrate Hamalainen’s teaching of local caching to arrive at the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to store program code and run-time variables (Hamalainen, page. 451, col. 1, para. 4, ln. 5 – 6 and col. 2, para. 1, ln. 1).

Regarding Claim 18, Li in view of Hamalainen further disclose:
Storing, by a weight value caching circuit of each of the one or more slave computation circuits, the weight matrix that includes the one or more weight vectors (Hamalainen, p. 451, col. 1, para. 4, ln. 5 – 6, & col. 2, para. 1, ln. 1, where the local memory … is used for storage of run-time variables [one or more weight vectors])

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure: Zhen, WO2017185248A1 Apparatus and Method for Performing Auto-Learning Operation of Artificial Neural Network. Zhen discloses the implementation of neural network chip having slave module for multiplication and interconnect module for summation used in a learning/training process. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached on Monday- Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.C./Examiner, Art Unit 2122                                                                                                                                                                                                        
/BRIAN M SMITH/Primary Examiner, Art Unit 2122