DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/23/2021 has been entered.
 
Status of the Claims
This action is in response to the remarks entered on 12/23/2021.

Claims 1 – 20 are pending and have been examined.

Claims 1, 14 and 15 are amended and pending in the current application.

The drawing rejection has been withdrawn in light of applicant’s remarks and amendment.

The claim rejection under 35 U.S.C. 112(b) has been withdrawn in light of applicant’s remarks and amendment.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Foreign Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2/19/2019 and 7/7/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to the claim rejections under 35 U.S.C. 102 and 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The difference between hi and h^i in Lee is now cited as the gradient that the model learns to map during the learning phase. This gradient is back-propagated from the loss calculated at the last layer and is based on the current parameters of the network. The difference between g(hi+1) and g(h^i+1) is now cited as the synthetic gradient that is used as the approximation of that gradient, i.e., of the difference between hi and h^i. Thus, Lee discloses generating a synthetic gradient "that is an approximation of a gradient of the objective function with respect to parameters of the first subnetwork" that "requires backward propagation of gradients from the last subnetwork to the second subnetwork and from the second subnetwork into the first subnetwork" to compute.
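
For illustration only, the relationship relied upon above can be sketched numerically. The sketch assumes the difference target propagation rule in which the target for layer i equals hi plus g(h^i+1) minus g(hi+1); the function names, the tanh feedback mapping g, the weight v and the numeric values are hypothetical and are not taken from Lee or from the application.

```python
import math

def g(h_next, v):
    """Hypothetical feedback mapping g(): elementwise tanh(v * h)."""
    return [math.tanh(v * x) for x in h_next]

def difference_target(h_i, h_next, h_next_target, v):
    """Target for layer i: h_i + g(target of layer i+1) - g(h of layer i+1)."""
    g_target = g(h_next_target, v)
    g_actual = g(h_next, v)
    return [h + gt - ga for h, gt, ga in zip(h_i, g_target, g_actual)]

# Hypothetical activations and target for two adjacent layers.
h_i = [0.2, -0.4]
h_next = [0.5, 0.1]
h_next_target = [0.45, 0.15]
v = 0.8

h_i_target = difference_target(h_i, h_next, h_next_target, v)

# Difference between the layer-i target and activation ...
lhs = [t - h for t, h in zip(h_i_target, h_i)]
# ... equals the difference between g(target) and g(activation) one layer up.
rhs = [gt - ga for gt, ga in zip(g(h_next_target, v), g(h_next, v))]
```

Under these assumptions, lhs and rhs agree up to floating-point rounding, which is the correspondence between the two differences cited in the response above.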

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 – 2, 5 – 6, 9 – 10, 12 – 16 and 19 – 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lee et al., Difference target propagation; LNCS Vol 9284, pp. 498-515, 2015.

Regarding Claim 1, Lee discloses: A method performed by one or more computers for training a neural network (Lee, sec. 4, Ack. ln. 4 – 5, where the method of training the neural network is carried out within computer programs)
to perform a task on training data comprising (Lee, at least page 500, sec. 2.1, Formulating Targets, where the training data and the targets formulated for the objective function provide the task) a plurality of training inputs (Lee, sec. 2.1, ln. 2, where x is the input) by optimizing an objective function (Lee, alg. 1, where the process backpropagates [optimizes] the h^i in the local loss function [objective function] in the training process),
wherein the neural network is configured to receive a network input and to process the network input to generate a network output (Lee, sec. 2.1, ln. 2, where x[input], y[output]),
the objective function is based on a measure of difference between (i) network outputs generated by the neural network by processing the training inputs and (ii) respective target outputs for each of the training inputs that are specified in the training data (Lee, at least sec. 2.1, para. 3, where the global loss function [objective function] measures the appropriateness of the network output … e.g. MSE [based on the difference between network output and target output]),
wherein the neural network comprises a plurality of subnetworks that include a first subnetwork followed by a second subnetwork and followed by a last subnetwork (Lee, sec. 2.1, eq. 1, where layer 1 to layer M-2 [first subnetwork] followed by layer M-1 [second subnetwork] and followed by layer M [last subnetwork]),
the first subnetwork is configured to, during the processing of the network input by the neural network, receive a corresponding subnetwork input for the network input, process the corresponding subnetwork input for the network input to generate a corresponding subnetwork activation for the network input and provide the corresponding subnetwork activation for the network input as input to the second subnetwork (Lee, alg. 1, ln. 2 – 4, where layer 1 to layer M-2 [first subnetwork] takes the training input h0 and generates output hm [subnetwork activation] as the input of layer M-1 [second subnetwork]; see also page 500, sec. 2.1, Formulating Targets, for the feedforward mapping with a non-linear activation function from one layer to another),
 the method comprises, for each training input: processing the training input using the neural network to generate a training model output for the training input, comprising:
inputting, into the first subnetwork, a corresponding subnetwork input for the training input into the first subnetwork (Lee, alg. 1, ln. 2 – 4, where layer 1 to layer M-2 [first subnetwork] takes the training input h0 [corresponding subnetwork input]);
processing the corresponding subnetwork input for the training input using the first subnetwork to generate a corresponding subnetwork activation for the training input in accordance with current values of parameters of the first subnetwork and providing the corresponding subnetwork activation for the training input as input to the second subnetwork (Lee, alg. 1, ln. 2 – 4, where layer 1 – layer M-2 [first subnetwork] generates output h2 [subnetwork activation] as the input of layer M-1 [second subnetwork] using the current parameters of f1 – fM-2 [parameters of the first subnetwork]; see also page 500, sec. 2.1, Formulating Targets, for the feedforward mapping with a non-linear activation function from one layer to another);
determining a synthetic gradient for the first subnetwork that is an approximation of a gradient of the objective function with respect to parameters of the first subnetwork (Lee, fig. 1, where during the learning, the difference between hm-2 and h^m-2 [gradient of first subnetwork] is approximated by the difference between g(hm-1) and g(h^m-1) [synthetic gradient of first subnetwork], which is calculated from the forward pass through the subnetworks using the current parameters of the subnetworks; eq. 2, where the learning is to reduce the loss of a loss function [objective function]), wherein computing the gradient of the objective function with respect to parameters of the first subnetwork requires backward propagation of gradients from the last subnetwork to the second subnetwork and from the second subnetwork into the first subnetwork (Lee, fig. 1 & alg. 1, ln. 5 – 9, where the loss [gradient of the objective function of last layer] is backward propagated to calculate the difference hm-1 – h^m-1 [gradient of the second layer] and is further backward propagated to calculate the difference hm-2 – h^m-2 [gradient of the first layer]) and wherein determining the synthetic gradient that is an approximation of the gradient comprises processing the corresponding subnetwork activation using a synthetic gradient model for the first subnetwork in accordance with current values of parameters of the synthetic gradient model, wherein the synthetic gradient model for the first subnetwork is configured to process the corresponding subnetwork activation in accordance with the current value of the parameter of the synthetic gradient model to generate the synthetic gradient for the first subnetwork (Lee, alg. 1 & fig. 1, where the difference between g(hm-1) and g(h^m-1) [synthetic gradient of first subnetwork] is generated from hm-1 [subnetwork activation] by g() [synthetic gradient model in accordance with the current values of its parameters]);
updating the current values of the parameters of the first subnetwork using the synthetic gradient (Lee, alg. 1, ln. 16 – 19, where parameters of f1 to fm-2 [first subnetwork parameters] are updated by h^m-2 which is calculated based on the difference between g(hm-1) and g(h^m-1) [synthetic gradient] in the prior step);
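
For illustration only, the limitations of Claim 1 as mapped above can be sketched as a single training step. This is a minimal sketch under stated assumptions, not the applicant's implementation and not Lee's algorithm: the scalar linear first subnetwork, the linear synthetic gradient model, the parameter names (w1, theta) and the learning rate are all hypothetical.

```python
def first_subnetwork(x, w1):
    """Hypothetical first subnetwork: a single scalar linear layer."""
    return w1 * x

def synthetic_gradient_model(activation, theta):
    """Hypothetical model that maps an activation to an estimate of dL/d(activation)."""
    slope, bias = theta
    return slope * activation + bias

def train_step_first_subnetwork(x, w1, theta, lr=0.1):
    """Update w1 from the synthetic gradient alone, without waiting for
    gradients to be backpropagated from the later subnetworks."""
    a = first_subnetwork(x, w1)               # subnetwork activation
    sg = synthetic_gradient_model(a, theta)   # synthetic gradient w.r.t. the activation
    grad_w1 = sg * x                          # chain rule: d(activation)/d(w1) = x
    return w1 - lr * grad_w1, a, sg

w1_new, a, sg = train_step_first_subnetwork(x=2.0, w1=0.5, theta=(1.0, -0.5))
```

In this sketch the activation would also be provided to the second subnetwork as its input; only the parameter update of the first subnetwork is shown.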

	Regarding Claim 2, depending on Claim 1, Lee further discloses: 
wherein the synthetic gradient model is a different neural network from each of the neural network, the first subnetwork, and the second subnetwork (Lee, eq. 11 & sec. 2.2, para. 4, ln. 4 – 9, where fi [neural network] and gi [synthetic gradient model] can be seen and trained as auto-encoder-decoder pair that is a separate neural network; i.e., f1 – fm-2 [first subnetwork] and fm-1 [second subnetwork] are also different from the g1 - gm-2 and gm-1 [synthetic gradient models]).

Regarding Claim 5, depending on Claim 1, Lee further discloses: 
wherein the first subnetwork comprises multiple neural network layers (Lee, sec. 2.1 & eq. 1, where layer 1 – m-2 [first subnetwork] has multiple layers), and wherein updating the current values of the parameters of the first subnetwork using the synthetic gradient comprises
backpropagating the synthetic gradient through the first subnetwork to update the current values of the parameters of the first subnetwork (Lee, alg. 1, ln. 7 – 9, where synthetic gradients h^1 – h^m-2 are backpropagated from higher layer to lower layer; ln. 16 – 19, where the update of the parameters of the first subnetwork is based on the backpropagated synthetic gradients h^1 – h^m-2).

Regarding Claim 6, depending on Claim 1, Lee further discloses: 
wherein the neural network is a feedforward neural network (Lee, sec. 3, para. 1, ln. 2, where feedforward neural network), the first subnetwork is a first neural network layer, and the second subnetwork is a second neural network layer (Lee, sec. 2.1 & eq. 1, where layer 1 – m-2 takes training input h0 and is the first neural network layer, and layer m-1, which follows layer m-2, is the second neural network layer).

Regarding Claim 9, depending on Claim 1, Lee further discloses: 
wherein updating the current values of the parameters of the first subnetwork using the synthetic gradient comprises updating the current values of the parameters using the synthetic gradient in place of an actual backpropagated gradient (Lee, alg. 1, ln. 16 – 19, & sec. 1, para. 5, ln. 4 – 7, where the parameters of f1 – fm-2 [parameters of first subnetwork] are updated using h^1 – h^m-2 [synthetic gradient] in place of regular back-propagation [actual backpropagated gradient]).

Regarding Claim 10, depending on Claim 1, Lee further discloses: 
wherein updating the current values of the parameters of the first subnetwork using the synthetic gradient comprises updating the current values of the parameters using the synthetic gradient asynchronously from updating current values of the parameters of the second subnetwork (Lee, intro. para. 3, ln. 11 – 12, where the main idea of propagation … once a good target is computed a layer-local training criteria can be defined and update each layer separately [asynchronously]). 

Regarding Claim 12, depending on Claim 1, Lee further discloses:
wherein the subnetwork input for the training input is a synthetic subnetwork input (Lee, sec. 2.4, para. 2, ln. 3 – 5, where the training input [subnetwork input] for the auto-encoder is noise-injected [synthetic]), and wherein the method further comprises: processing the training input using a synthetic input model that is configured to process the training input to generate the synthetic subnetwork input (Lee, sec. 2.4, para. 2, ln. 3 – 5, where a process [by synthetic input model] injects noise [processes the training input] and generates a noise-injected input [synthetic subnetwork input] for training the auto-encoder).

Regarding Claim 13, depending on Claim 1, Lee further discloses:
wherein the subnetwork input for the training input is an actual subnetwork input (Lee, alg. 1, ln. 2 – 4, where fi is calculated using actual data [actual subnetwork input]).

Regarding Claim 14, Claim 14 is the non-transitory computer-readable storage media claim corresponding to Claim 1. Lee further discloses: 
One or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations (Lee, sec. 3, para. 4, ln. 3 – 4, where the method is carried out by a computer using computer program code [instructions] that is stored in memory [computer-readable storage media] for execution).

Regarding Claim 15, Claim 15 is the system claim corresponding to Claim 1. Lee further discloses: 
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations (Lee, sec. 3, para. 4, ln. 3 – 4, where the method is carried out by a computer that executes program code [instructions] in a computer system).

Regarding Claims 16, 19 and 20, Claims 16, 19 and 20 are the system claims corresponding to Claims 2, 5 and 6. Claims 16, 19 and 20 are rejected for the same reasons as Claims 2, 5 and 6.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 3 – 4 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al., Difference target propagation; LNCS Vol 9284, pp. 498-515, 2015 in view of AWS, Retraining Models on New Data, Amazon Machine Learning Documentation, 2015.

Regarding Claim 3, depending on Claim 1, Lee discloses the method of Claim 1. Lee does not explicitly disclose: for each training input, determining a target gradient for the first subnetwork; and updating the current values of the parameters of the synthetic gradient model based on an error between the target gradient and the synthetic gradient.
AWS explicitly discloses: 
for each training input, determining a target gradient for the first subnetwork; and updating the current values of the parameters of the synthetic gradient model based on an error between the target gradient and the synthetic gradient (AWS, ln. 4 – 6, where it is a good practice to continuously monitor the incoming data and retrain your model on newer data; based on the training approach of Lee in algorithm 1, the model f of each layer is trained to map data from hi-1 to h^i, which leads to a new data distribution that would benefit from retraining. The backpropagation training is typically based on the difference [error] between the model output [synthetic gradient] and the target output [target gradient]).
Lee and AWS both disclose methods of training a neural network and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lee’s teaching of training with the difference target propagation approach with AWS’s teaching of retraining a neural network when the data distribution has changed to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to keep model accuracy (AWS, ln. 1).
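
For illustration only, the limitation of Claim 3 can be sketched as a regression of the synthetic gradient toward a target gradient. This is a minimal sketch under stated assumptions and is not drawn from Lee, AWS or the application: the linear synthetic gradient model, the squared-error criterion, the learning rate and all names and values are hypothetical.

```python
def synthetic_gradient_model(activation, theta):
    """Hypothetical linear model that predicts a gradient from an activation."""
    slope, bias = theta
    return slope * activation + bias

def update_synthetic_gradient_model(activation, target_gradient, theta, lr=0.1):
    """One gradient-descent step on 0.5 * (synthetic - target)**2."""
    slope, bias = theta
    error = synthetic_gradient_model(activation, theta) - target_gradient
    new_theta = (slope - lr * error * activation, bias - lr * error)
    return new_theta, error

theta = (1.0, -0.5)
theta_new, err_before = update_synthetic_gradient_model(
    activation=1.0, target_gradient=0.2, theta=theta
)
# Error between synthetic and target gradient after the update.
err_after = synthetic_gradient_model(1.0, theta_new) - 0.2
```

With these hypothetical values the error between the synthetic gradient and the target gradient shrinks after the update, which is the behavior the claim language describes.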

	Regarding Claim 4, depending on Claim 3, Lee in view of AWS further discloses: 
wherein determining the target gradient for the first subnetwork comprises: backpropagating an actual gradient of the objective function through the neural network to determine the target gradient (The retraining of AWS is based on the new data distribution, which corresponds to the h^ values in Lee’s disclosure, so the h^ values are the target data of the retraining. Lee, alg. 1, ln. 5 – 9, where h^ is back-propagated and calculated from the actual gradient of the loss function [objective function]); 
or backpropagating a synthetic gradient for the second subnetwork through the second subnetwork to determine the target gradient for the first subnetwork.
The reason for the combination is the same as for Claim 3.

Regarding Claims 17 and 18, Claims 17 and 18 are the system claims corresponding to Claims 3 and 4. Claims 17 and 18 are rejected for the same reasons as Claims 3 and 4.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Lee et al., Difference target propagation; LNCS Vol 9284, pp. 498-515, 2015 in view of Feldkamp, Phased Backpropagation: A Hybrid of BPTT and Temporal BP, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence Vol. 3, 2262-2267, 1998.

Regarding Claim 7, depending on Claim 1, Lee discloses the method of Claim 1. Lee does not explicitly disclose: wherein the neural network is an unrolled recurrent neural network, the first subnetwork is the recurrent neural network at a first time step, and the second subnetwork is the recurrent neural network at a second time step.
Feldkamp explicitly discloses: 
wherein the neural network is an unrolled recurrent neural network (Feldkamp, fig. 1 & sec. 2, para. 1, ln. 14 – 20, where the neural network inputs are at different time steps [unrolled neural network]), the first subnetwork is the recurrent neural network at a first time step, and the second subnetwork is the recurrent neural network at a second time step (Feldkamp, fig. 1, where nodes 2 – 6 [first subnetwork] are the recurrent neural network at the first time step and nodes 7 – 8 [second subnetwork] are the recurrent neural network at the second time step).
Lee and Feldkamp both disclose methods of training a deep backpropagation neural network and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lee’s teaching of training with the difference target propagation approach with Feldkamp’s teaching of a recurrent neural network to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to enable training of networks that involve temporal data (Feldkamp, intro. para. 1).
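
For illustration only, the notion of an unrolled recurrent neural network whose time steps serve as successive subnetworks can be sketched as follows. This is a generic sketch under stated assumptions, not Feldkamp's architecture: the scalar tanh cell, the weight names (w_h, w_x) and the input values are hypothetical.

```python
import math

def rnn_cell(h_prev, x_t, w_h, w_x):
    """Hypothetical scalar recurrent cell; the same weights are reused at every time step."""
    return math.tanh(w_h * h_prev + w_x * x_t)

def unroll(xs, h0, w_h, w_x):
    """Unroll the cell over the input sequence and return every hidden state."""
    states = [h0]
    for x_t in xs:
        states.append(rnn_cell(states[-1], x_t, w_h, w_x))
    return states

# states[1] is the output of the cell at the first time step (first subnetwork);
# it is the recurrent input to the cell at the second time step (second subnetwork).
states = unroll(xs=[1.0, 0.5], h0=0.0, w_h=0.5, w_x=1.0)
```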

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Lee et al., Difference target propagation; LNCS Vol 9284, pp. 498-515, 2015 in view of Feldkamp, Phased Backpropagation: A Hybrid of BPTT and Temporal BP, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence Vol. 3, 2262-2267, 1998 further in view of AWS, Retraining Models on New Data, Amazon Machine Learning Documentation, 2015.

Regarding Claim 8, depending on Claim 7, Lee in view of Feldkamp discloses the method of Claim 7. Lee in view of Feldkamp further discloses:
determining at least one future synthetic gradient of the objective function for the first subnetwork by processing the subnetwork activation using the synthetic gradient model in accordance with current values of parameters of the synthetic gradient model (Lee, alg. 1, ln. 8, where g(hm-1) - g(h^m-1) [future synthetic gradient of first subnetwork] is calculated based on the loss [objective function] by applying g() [synthetic gradient model] with its current parameters to hm-1 [subnetwork activation]);
Lee in view of Feldkamp does not explicitly disclose:
updating the current values of the parameters of the synthetic gradient model based on an error between each future synthetic gradient and a corresponding target future gradient.
AWS explicitly discloses: 
updating the current values of the parameters of the synthetic gradient model based on an error between each future synthetic gradient and a corresponding target future gradient (AWS, ln. 4 – 6, where it is a good practice to continuously monitor the incoming data and retrain your model on newer data; based on the training approach of Lee in algorithm 1, the model f of each layer is trained to map data from hi-1 to h^i, which leads to a new data distribution that would benefit from retraining. The backpropagation training is typically based on the difference [error] between the model output [future synthetic gradient] and the target output [target future gradient]).
Lee (in view of Feldkamp) and AWS both disclose methods of training a neural network and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Lee (in view of Feldkamp) of training an RNN with the difference target propagation approach with AWS’s teaching of retraining a neural network when the data distribution has changed to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to keep model accuracy (AWS, ln. 1).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Lee et al., Difference target propagation; LNCS Vol 9284, pp. 498-515, 2015 in view of Copjak, Advanced Architectures Distributed System for the Implementation of Neural Networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014.

Regarding Claim 11, depending on Claim 1, Lee discloses the method of Claim 1. Lee does not explicitly disclose: wherein the first subnetwork is implemented on one computing device and the second subnetwork is implemented on a different computing device; 
and optionally wherein: the training is part of a distributed machine learning training process that distributes the training across multiple computing devices.
Copjak explicitly discloses: 
wherein the first subnetwork is implemented on one computing device and the second subnetwork is implemented on a different computing device (Copjak, fig. 7, where the example parallelism model demonstrates that parallelism can be applied across neural network layers); 
and optionally wherein: the training is part of a distributed machine learning training process that distributes the training across multiple computing devices (Copjak, abs. 6 – 8, intro. ln. 3 – 20 & fig. 7, where both the training stage and the life stage of the neural network are implemented in the distributed system across multiple machines).
Lee and Copjak both disclose methods of implementing a deep neural network and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lee’s teaching of the deep neural network system with Copjak’s teaching of a distributed architecture to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to solve the time and computation complexity issue of neural networks (Copjak, abs. ln. 4).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached on Monday – Friday 9 am – 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/S.C./Examiner, Art Unit 2122   

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122