DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
Acknowledgement is made of Applicant's claim amendments filed on 12/10/2021. The claim amendments are entered. Claims 1-20 are pending. Claims 1, 3-8, and 10-20 have been amended.

Response to Arguments
Applicant's arguments filed on 12/10/2021 have been fully considered but they are not persuasive.

Applicant argues that Desjardins allegedly does not teach the amended claim limitations because it allegedly does not teach elements of the claim limitations such as a neural network, copying, and generating a task specific neural network (Applicant’s Reply pgs. 8-9). This is not persuasive. First, Desjardins teaches a machine learning model comprising a neural network (see e.g. [0033]). Desjardins also teaches that the machine learning model comprising the neural network can update/change its parameters as needed to learn new tasks, i.e. the machine learning model is copying the current version of the model and updating it as needed. This is shown in the various mappings below. Regarding the generation of a task specific neural network, it is noted that Desjardins 
Applicant also argues that the dependent claims should be permissible since the independent claims are permissible because the previously cited references allegedly do not teach the various claim limitations (Applicant’s reply pgs. 10-11). This is not persuasive because as described above and shown in the updated mapping below, the cited references teach the various claim limitations. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 8, 9, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Desjardins et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0236482, hereinafter Desjardins) in view of Zoph et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0251439, hereinafter Zoph).

Regarding claim 1, Desjardins teaches:
A system for training a neural network, the system comprising: 
at least one memory including a training module ([0030]-[0031]: describing a machine learning (ML) system that can be “configured to train a machine learning model”, wherein the ML system can be “implemented as computer programs on one or more computers”. Whereby computer programs can be stored via various storage devices or media ([0082] and [0088]).); 
a processor coupled to the at least one memory ([0082]-[0083] and [0087]: describing processors and hardware coupled to the storage devices for executing the programmable instruction on the storage devices.); and 
 ([0031] and [0035]: describing that “[t]he machine learning system 110 is configured to train a machine learning model 110 on multiple machine learning tasks sequentially”. See also [0028]-[0029]: describing that the tasks can comprise “different supervised learning tasks” or “different reinforcement tasks”.) and the neural network to be trained on the plurality of sequential tasks ([0031], [0033], [0035], and [0045]: describing that the machine learning system comprising the neural network can be trained on multiple sequential tasks.); 
for each task in the plurality of sequential tasks (see the previous citation regarding the various sequential tasks): 
generate a copy of the neural network that includes a plurality of layers ([0033]: describing that the machine learning model can comprise “a deep machine learning model that employs multiple layers of the model…. [Wherein] a deep neural network is a deep machine learning model….”); 
…; 
identify s in the task specific neural network … , wherein the parameters are associated with architectural weights ([0036]: describing the determination of importance weights for the machine learning model and an acceptable level of performance via the weights, wherein “[t]he set of importance weights for a given task generally includes a respective weight for each parameter of the model 110 that represents a measure of an importance of the parameter to the model 110 achieving acceptable performance on the task”. See also [0037]-[0040], [0047]-[0051], and [0055]: describing in further detail the computation of the weights and their optimization in correlation with the parameters of the machine learning model.); 
s in the task specific neural network ([0037] and [0044]: describing the retraining based on the parameters of the machine learning model for the various tasks. See also [0059]-[0064]: describing in further detail the training for the respective iterations of the various tasks.) to identify a parameter from the parameters with a corresponding maximum architectural weight from the architectural weights ([0036]-[0037]: describing a determination of importance of each parameter in a plurality of parameters, wherein the importance is based on weights in the neural network, i.e. architectural weights. From the determination, the parameter with a certain importance weight can be identified, wherein the level of importance can denote a maximal weight.); and 
update the neural network with the ([0034], [0036], [0048]-[0049], and [0063]-[0064]: describing that the parameters are adjusted during training and that “[t]he adjusted values of the parameters are then used as current values of the parameters in the next iteration” of training the machine learning model. Wherein the parameters comprise the identified parameter as previously described.); and 
wherein the neural network trained on the plurality of sequential tasks is a trained neural network ([0034]-[0035] and [0064]-[0065]: describing that after the machine learning model has been iteratively trained on the parameter values, training data, importance weights and the like, the result is a trained machine learning model that can optimally perform the tasks.).
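
For illustration only, and not as part of the mapping above, the following minimal Python sketch shows one way a per-parameter importance weight could be used to pick out the parameter with the maximum weight and to penalize changes to important parameters on a later task. All names (`max_importance_parameter`, `importance_penalty`, the parameter labels) and the quadratic form of the penalty are assumptions made for this sketch; they are not taken from Desjardins or from the claims.

```python
from typing import Dict


def max_importance_parameter(importance: Dict[str, float]) -> str:
    """Return the name of the parameter with the largest importance weight."""
    if not importance:
        raise ValueError("importance weights must not be empty")
    return max(importance, key=importance.get)


def importance_penalty(
    current: Dict[str, float],
    previous: Dict[str, float],
    importance: Dict[str, float],
) -> float:
    """Assumed quadratic penalty: important parameters are discouraged
    from moving away from the values learned on an earlier task."""
    return sum(
        importance[name] * (current[name] - previous[name]) ** 2
        for name in importance
    )


# Hypothetical values for three scalar parameters.
importance = {"layer1.w": 0.2, "layer2.w": 0.9, "layer3.w": 0.5}
previous = {"layer1.w": 1.0, "layer2.w": -0.5, "layer3.w": 0.25}
current = {"layer1.w": 1.5, "layer2.w": -0.5, "layer3.w": 0.75}

top = max_importance_parameter(importance)
penalty = importance_penalty(current, previous, importance)
```

In this sketch the parameter labeled `layer2.w` carries the largest weight, and because it did not change between tasks it contributes nothing to the penalty.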

While the cited reference Desjardins teaches the limitations of claim 1, it does not explicitly teach “generate a task specific neural network from the copy of the neural network by performing an architectural search on the plurality of layers in the copy of the neural network, the plurality of candidate choices” on lines 14-15. Zoph discloses the claim limitations, teaching: “a neural architecture search system 100 [that] includes a controller neural network 110, a training engine 120, and a controller parameter updating engine 130” (Zoph [0026]). Wherein the search system is used “to determine an architecture for a child neural network that is configured to perform the particular task. The architecture defines the number of layers in the child neural network, the operations performed by each of the layers, and the connectivity between the layers in the child neural network, i.e., which layers receive inputs from which other layers in the child neural network.” (Zoph [0023]). 
The architecture of each child neural network can be determined via an “output sequence generated by the controller neural network” (Zoph [0027]). Wherein the controller neural network can have multiple replicas that then have multiple related child neural networks (Zoph [0035]). With each layer of the various child networks being evaluated and selected to have an optimum performance for a particular task (Zoph [0030]-[0033]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks in the cited reference to include the architectural search in Zoph. Doing so would enable “a system implemented as computer programs on one or more computers in one or more locations that determines, using a controller neural network, an architecture for a child neural network that is configured to perform a particular neural network task” (Zoph [0014]). Wherein the system is a “neural architecture search system” (Zoph [0022]).
Regarding claim 2, Desjardins teaches:
The system of claim 1, wherein the processor is further configured to: perform a new task using the trained neural network ([0042]-[0044]: describing that the trained machine learning model can be used to perform additional tasks B and C. Similarly, see [0052] and [0072]-[0073]: describing that the trained machine learning model can be used for a second or third task.).

Regarding independent claim 8, claim 8 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 8 is a method claim that corresponds to system claim 1.

Regarding claim 9, the rejection of claim 8 is incorporated. While the cited references teach the claim limitation “for each task in the plurality of sequential tasks” as previously shown, Zoph further teaches:
“generating a task specific layer in the task specific neural network (Zoph [0033]: describing that the neural architecture system comprising a controller neural network can “output architecture data 150 that specifies the architecture of the child neural network, i.e., data specifying the layers that are part of the child neural network, the connectivity between the layers, and the operations performed by the layers”.); and 
updating the neural network with the task specific layer (Zoph [0030]-[0033] and [0035]: describing transmission of updated parameters via the central updating server to the controller neural network and consequently, the child neural networks to achieve the desired architecture in the child neural network.).”
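Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks in the combined cited references to include the task specific layer in Zoph. Doing so would enable a “neural architecture search system 100 [that] is a system that obtains training data 102 for training a neural network to perform a particular task”, wherein the architecture defines the layers and operations in the layers of the neural network (Zoph [0023]).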

Regarding independent claim 15, claim 15 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 15 is a medium claim that corresponds to system claim 1. 
A mapping is shown below for the preamble of claim 15 since it differs from that of claim 1. Desjardins teaches:
“A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations that train a neural network, the operations comprising ([0082]-[0083]: describing a “non-transitory storage medium” that can be executed by various hardware/processors/computing machines to implement the process. Wherein the process can include a machine learning (ML) system that can be “configured to train a machine learning model” ([0030]-[0031]).)….”

Claims 3, 5, 7, 10, 12, 14, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Desjardins et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0236482, hereinafter Desjardins) and Zoph et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0251439, hereinafter Zoph) in view of Rabinowitz et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0337464, hereinafter Rabinowitz).

Regarding claim 3, the rejection of claim 1 is incorporated. While the cited references in combination teach the claim limitation “wherein the at least one candidate choice in the plurality of candidate choices” as previously shown, they do not explicitly teach: “reuses one of parameters in the at least one layer of the neural network, wherein the one of the parameters is included in the neural network before the copy of the neural network is generated”. Rabinowitz discloses the claim limitations, teaching: that the parameters can be reused in the neural network, wherein such parameters can “be integrated at each layer of the current model” (Rabinowitz [0030]). Wherein the parameters comprise a parameter value (Rabinowitz [0052] and [0055]). And that “each neural network may be copied before fine-tuning to explicitly remember all previous tasks”, wherein the elements being copied include “learnt neural network parameters corresponding to previous tasks” (Rabinowitz [0028]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks along with the task specific layers of the neural network in the combined cited references to include the reuse and copying in Rabinowitz. Doing so would enable a “progressive neural network system 100 [that] may learn multiple machine learning tasks in sequence, where[in] task features are preserved so that new tasks can benefit from all previously learned features and so that the final neural network system can be evaluated on each machine learning task” (Rabinowitz [0039]). 

Regarding claim 5, the rejection of claim 1 is incorporated. Zoph further teaches:
The system of claim 1, wherein at least one candidate choice in the plurality of candidate choices adds an adaptation to one of parameters in the at least one layer of the task specific neural network (Zoph [0040] and [0042]-[0045]: describing that a child neural network can have hyperparameters defining convolutional filters with corresponding height, width, and stride values that can be added/modified as desired. Wherein the hyperparameters comprise a respective parameter, e.g. respective of the layers (Zoph [0028] and previous citation).), ….
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks in the combined cited references to include the parameters in Zoph. Doing so would enable “a system implemented as computer programs on one or more computers in one or more locations that determines, using a controller neural network, an architecture for a child neural network that is configured to perform a particular neural network task” (Zoph [0014]). Wherein the system is a “neural architecture search system” (Zoph [0022]).

While the cited reference Zoph teaches the above limitations of claim 5, it does not explicitly teach: “wherein the one of parameters is included in the neural network before the copy of the neural network is generated”. Rabinowitz teaches: that “each neural network may be copied before fine-tuning to explicitly remember all previous tasks”, wherein the elements being copied include “learnt neural network parameters corresponding to previous tasks” (Rabinowitz [0028]). Wherein the parameters comprise a parameter value (Rabinowitz [0052] and [0055]).
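Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks in the combined cited references to include the copying in Rabinowitz. A motivation to combine the cited references with Rabinowitz was previously given.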

Regarding claim 7, the rejection of claim 1 is incorporated. While the cited references in combination teach the claim limitation “wherein the training module is further configured to retrain the task specific neural network” as shown above, Zoph further teaches: 
“by tuning one of parameters that at least one candidate choice identifies (Zoph [0034]: describing that the neural network search system can obtain a child neural network with the desired architectural characteristics by “fine-tun[ing] the parameter values”. Wherein the parameter values comprise a respective parameter (Zoph [0028]).)….” 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks in the combined cited references to include the tuning in Zoph. Doing so would enable “a system implemented as computer programs on one or more computers in one or more locations that determines, using a controller neural network, an architecture for a child neural network that is configured to perform a particular neural network task” (Zoph [0014]). Wherein the system is a “neural architecture search system” (Zoph [0022]).

While the cited reference Zoph teaches the above limitations of claim 7, it does not explicitly teach: “as one of the parameters that is reused from the neural network before the copy of the neural network is generated”. Rabinowitz discloses the claim limitations, teaching: that the parameters can be reused in the neural network, wherein such parameters can “be integrated at each layer of the current model” (Rabinowitz [0030]). And that “each neural network may be copied before fine-tuning to explicitly remember all previous tasks”, wherein the elements being copied include “learnt neural network parameters corresponding to previous tasks” (Rabinowitz [0028]). Wherein the parameters comprise a parameter value (Rabinowitz [0052] and [0055]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the tuning in the cited reference to include the copying in Rabinowitz. Doing so would enable a “progressive neural network system 100 [that] may learn multiple machine learning tasks in sequence, where[in] task features are preserved so that new tasks can benefit from all previously learned features and so that the final neural network system can be evaluated on each machine learning task” (Rabinowitz [0039]).  

Regarding claim 10, claim 10 is substantially similar to claim 3 and therefore is rejected on the same grounds as claim 3. Claim 10 is a method claim that corresponds to system claim 3.

Regarding claim 12, claim 12 is substantially similar to claim 5 and therefore is rejected on the same grounds as claim 5. Claim 12 is a method claim that corresponds to system claim 5.

Regarding claim 14, claim 14 is substantially similar to claim 7 and therefore is rejected on the same grounds as claim 7. Claim 14 is a method claim that corresponds to system claim 7.

Regarding claim 16, claim 16 is substantially similar to claim 3 and therefore is rejected on the same grounds as claim 3. Claim 16 is a medium claim that corresponds to system claim 3.

Regarding claim 18, claim 18 is substantially similar to claim 5 and therefore is rejected on the same grounds as claim 5. Claim 18 is a medium claim that corresponds to system claim 5.

Regarding claim 20, claim 20 is substantially similar to claim 7 and therefore is rejected on the same grounds as claim 7. Claim 20 is a medium claim that corresponds to system claim 7.

Claims 4, 11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Desjardins et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0236482, hereinafter Desjardins) and Zoph et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0251439, hereinafter Zoph) in view of Chilimbi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0092765, hereinafter Chilimbi).

Regarding claim 4, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein at least one candidate choice in the plurality of candidate choices generates a new parameter in the task specific neural network”. Chilimbi discloses the claim limitations, teaching: determining a varying combination of different parameters for the replica units and their corresponding worker units (Chilimbi [0145]-[0147]). Wherein the selection of different combinations, creating different subsets of parameters, can constitute new parameters for the replicas that perform various tasks. Whereby the replicas can each represent a copy of a deep neural network model (Chilimbi [0082]). The parameters comprising parameters related to weights, neurons, and layer composition of the replicas and worker units (Chilimbi [0108]-[0109], [0115]-[0116], and [0119]-[0120]). New parameters such as weight parameters can be provided to the replicas via a parameter module (Chilimbi [0083]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks along with the task specific layers of the neural network in the combined cited references to include the parameters in Chilimbi. Doing so would enable “a distributed processing system (DPS) corresponds to a set of computing units 106 that performs a graph processing task, such as training the type of DNN model 114 described in Subsection A.2. Each particular DPS embodies a resource allocation architecture….” (Chilimbi [0077]). Wherein the resource allocation architecture comprises replica units with “one or more parameter modules” (Chilimbi [0083]). 

Regarding claim 11, claim 11 is substantially similar to claim 4 and therefore is rejected on the same grounds as claim 4. Claim 11 is a method claim that corresponds to system claim 4.

Regarding claim 17, claim 17 is substantially similar to claim 4 and therefore is rejected on the same grounds as claim 4. Claim 17 is a medium claim that corresponds to system claim 4.

Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Desjardins et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0236482, hereinafter Desjardins) and Zoph et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0251439, hereinafter Zoph) in view of Chilimbi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0092765, hereinafter Chilimbi) and Rabinowitz et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0337464, hereinafter Rabinowitz).
	
Regarding claim 6, the rejection of claim 1 is incorporated. While the cited references in combination teach the claim limitation “wherein the training module is further configured to retrain the task specific neural network” as shown above, they do not explicitly teach: “by fixing one of parameters in the s that at least one candidate choice identifies….” Chilimbi discloses the claim limitation, teaching: that a distributed processing system (DPS) can generate a desired deep neural network model (DNN) by performing forward and backward computation in relation to weight parameters for corresponding neuron layer(s) parameters in the DNN (Chilimbi [0067]-[0068] and [0072]). Wherein “correction factors” are computed for the weight parameters that can be used to update the weights via backpropagation (Chilimbi [0073]-[0074]). The DPS being able to generate a plurality of DNNs for operation in various replica units (Chilimbi [0082]).  
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks along with the task specific layers of the neural network in the combined cited references to include the parameters in Chilimbi. A motivation to combine the cited references with Chilimbi was previously given.
While the cited references in combination teach the above limitations of claim 6, they do not explicitly teach: “as one of the parameters that is reused from the neural network before the copy of the neural network is generated”. Rabinowitz discloses the claim limitations, teaching: that the parameters can be reused in the neural network, wherein such parameters can “be integrated at each layer of the current model” (Rabinowitz [0030]). And that “each neural network may be copied before fine-tuning to explicitly remember all previous tasks”, wherein the elements being copied include “learnt neural network parameters corresponding to previous tasks” (Rabinowitz [0028]). Wherein the parameters comprise a parameter value (Rabinowitz [0052] and [0055]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process of training the neural network for sequential tasks along with the task specific layers of the neural network in the combined cited references to include the copying in Rabinowitz. Doing so would enable a “progressive neural network system 100 [that] may learn multiple machine learning tasks in sequence, where[in] task features are preserved so that new tasks can benefit from all previously learned features and so that the final neural network system can be evaluated on each machine learning task” (Rabinowitz [0039]).  

Regarding claim 13, claim 13 is substantially similar to claim 6 and therefore is rejected on the same grounds as claim 6. Claim 13 is a method claim that corresponds to system claim 6.

Regarding claim 19, claim 19 is substantially similar to claim 6 and therefore is rejected on the same grounds as claim 6. Claim 19 is a medium claim that corresponds to system claim 6.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Moody, “Prediction Risk and Architecture Selection for Neural Networks”: describing an architectural search selection for a neural network. The search selection comprises a heuristic search technique in which a “sequential network construction” algorithm is first used for sequential training of the neural networks, followed by pruning of input variables and weights. From there, the heuristic search can utilize one of two search strategies: a parallel or a sequential approach. The parallel strategy being based 
Bender et al., “Understanding and Simplifying One-Shot Architecture Search”, July 2018: describing an optimized technique for neural network architectural search using one-shot learning. The technique seeks to better understand weight sharing in a neural network to optimize the architectural search process. The technique comprises four steps: “(1) Design a search space that allows us to represent a wide variety of architectures using a single one-shot model. (2) Train the one-shot model to make it predictive of the validation accuracies of the architectures. (3) Evaluate candidate architectures on the validation set using the pre-trained one shot model. (4) Re-train the most promising architectures from scratch and evaluate their performance on the test set.”
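
For illustration only, the search-and-rank portion of the four steps quoted from Bender (steps 1 and 3) can be sketched in Python as follows. The search space, the operation names, and the lookup table `PROXY_SCORE` standing in for a trained one-shot model are all hypothetical assumptions of this sketch; the training in step 2 and the re-training from scratch in step 4 are not shown.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Step 1 (assumed): a tiny search space with one operation choice per layer.
SEARCH_SPACE: List[Tuple[str, ...]] = [
    ("conv3x3", "conv5x5"),
    ("maxpool", "identity"),
]

# Hypothetical stand-in for a trained one-shot model: a fixed proxy score
# per operation. A real one-shot model would be trained (step 2).
PROXY_SCORE = {"conv3x3": 0.40, "conv5x5": 0.35, "maxpool": 0.30, "identity": 0.45}


def proxy_accuracy(arch: Sequence[str]) -> float:
    """Step 3 (assumed): score a candidate without training it from scratch."""
    return sum(PROXY_SCORE[op] for op in arch)


def rank_candidates(space: List[Tuple[str, ...]], top_k: int) -> List[Tuple[str, ...]]:
    """Enumerate every architecture in the space and keep the top_k by proxy score."""
    candidates = list(product(*space))
    return sorted(candidates, key=proxy_accuracy, reverse=True)[:top_k]


# The most promising candidates would then be re-trained from scratch (step 4).
best = rank_candidates(SEARCH_SPACE, top_k=2)
```

Exhaustive enumeration is used here only because the assumed space has four candidates; a larger space would need sampling.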

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS, can be reached on (571)272-2589.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/S.H./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128