Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-19 are presented for examination.
Information Disclosure Statement
The information disclosure statements (IDS) filed 10/25/2019, 05/18/2020, 10/21/2020, 01/15/2021, 01/22/2021, and 05/17/2021 are in compliance with the provisions of 37 CFR 1.97 and 1.98. Accordingly, the information disclosure statements are being considered by the examiner.
Priority
The following claimed benefit is acknowledged: the instant application 16319040, filed 01/18/2019, claims priority from provisional application 62363652, filed 07/18/2016, and PCT application US2017042542, filed 07/18/2017.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-19 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the applicant regards as the invention.
Claims 1, 4, 9, 10, 11, and 12 recite “measure of an importance of the parameter to the machine learning model achieving acceptable performance”. However, the scope of “measure of an importance of the parameter to the machine learning model achieving acceptable performance”, “values of parameters that were more important in the machine learning model achieving acceptable performance on the first machine learning task are more strongly constrained to not deviate from the first values than values of parameters that were less important in the machine learning
Claims 2, 3, 5-8, and 13-17 depend from claim 1 and are likewise indefinite.
Claim 18 is rejected for the same reasons as claim 1.
Claim 19 is rejected for the same reasons as claim 1.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-4, 7, 9, 10, 11, 12, 15, 17, 18, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Aslan et al. (Pub. No. US20170132528 – hereinafter, Aslan).
Regarding claim 1, Aslan teaches a computer-implemented method of training a machine learning model having a plurality of parameters, wherein the machine learning model has been trained on a first machine learning task to determine first values of the parameters of the machine learning model (Aslan, ,
and wherein the method comprises: determining, for each of the plurality of parameters, a respective measure of an importance of the parameter to the machine learning model achieving acceptable performance on the first machine learning task (Aslan, [Par.0122, lines 6-10], “optimizing the objective function (e.g., by determining parameter values, such as weight parameter values, for the set of machine learning models that optimizes ( e.g., minimizes) the objective function) to train the first machine learning model.” Examiner’s note: the parameter values are determined while training the first machine learning model to optimize the objective function. Therefore, the parameter values are determined so as to achieve acceptable performance on the first machine learning task.);
obtaining training data for training the machine learning model on a second, different machine learning task (Aslan, [Par.0023, lines 1-5], “FIG. 1 further illustrates that training data 104 can be used to train at least one of the machine learning models 100 and/or 102. FIG. 1 shows that both machine learning models 100 ;
and training the machine learning model on the second machine learning task by training the machine learning model on the training data to adjust the first values of the parameters so that the machine learning model achieves an acceptable level of performance on the second machine learning task while maintaining an acceptable level of performance on the first machine learning task (Aslan, [Par.0029], “in FIG. 1 by the path 108 going from the training data 104 to the second model 102, and from the second model 102 to the first model 100. In this scenario, the first (teacher) model 100 can "see" what the second (student) model 102 is learning while the second model 102 trains, and/or before the second model 102 completes its training. This can allow the first (teacher) model 100 to adapt what it learns to better match what the second (student) model 102 is learning or is capable of learning.” And [Par.0007, lines 3-12], “The objective function can include at least one term that is a function of: (i) a first output of a first machine learning model and (ii) a second output of a second machine learning model. The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.”, Examiner’s note, the first model is ,
wherein, during the training of the machine learning model on the second machine learning task, values of parameters that were more important in the machine learning model achieving acceptable performance on the first machine learning task are more strongly constrained to not deviate from the first values than values of parameters that were less important in the machine learning model achieving acceptable performance on the first machine learning task (Aslan, [Par.0029], “in FIG. 1 by the path 108 going from the training data 104 to the second model 102, and from the second model 102 to the first model 100. In this scenario, the first (teacher) model 100 can "see" what the second (student) model 102 is learning while the second model 102 trains, and/or before the second model 102 completes its training. This can allow the first (teacher) model 100 to adapt what it learns to better match what the second (student) model 102 is learning or is capable of learning.” And [Par.0007, lines 3-12], “The objective function can include at least one term that is a function of: (i) a first output of a first machine learning model and (ii) a second output of a second machine learning model. The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” Examiner’s note: the first model learns to adapt to better match what the second model is capable of learning. Therefore, the parameters are modified to reach the acceptable performance level. The method of this machine .
Regarding claim 2, Aslan teaches the method of claim 1, wherein the first machine learning task and the second machine learning task are different supervised learning tasks (Aslan, [Par.0024, lines 9-14], “However, the training data 104 may be unlabeled in some implementations, such that the machine learning models 100 and/or 102 can be trained using any suitable learning technique, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and so on.” And [Par.0025, lines 3-7], “The task learned by the first model 100 can be the same task as the task learned by the second model 102, or each model 100 and 102 can learn related (or complimentary) tasks, meaning that the tasks can differ slightly between the models 100 and 102” Examiner’s note: the different machine learning models learn different tasks, and the training uses a learning technique such as supervised learning or reinforcement learning. Therefore, the first and second machine learning tasks are different.).
Regarding claim 3, Aslan teaches the method of claim 1, wherein the first machine learning task and the second machine learning tasks are different reinforcement learning tasks (Aslan, [Par.0024, lines 9-14], “However, the training data 104 may be unlabeled in some implementations, such that the machine learning models 100 and/or 102 can be trained using any suitable learning technique, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement .
Regarding claim 4, Aslan teaches the method of claim 1, wherein training the machine learning model on the training data comprises: adjusting the first values of the parameters to optimize an objective function that includes: (i) a first term that measures a performance of the machine learning model on the second machine learning task (Aslan, [Par.0029], “in FIG. 1 by the path 108 going from the training data 104 to the second model 102, and from the second model 102 to the first model 100. In this scenario, the first (teacher) model 100 can "see" what the second (student) model 102 is learning while the second model 102 trains, and/or before the second model 102 completes its training. This can allow the first (teacher) model 100 to adapt what it learns to better match what the second (student) model 102 is learning or is capable of learning.” And [Par.0007, lines 3-12], “The objective function can include at least one term that is a function of: (i) a first output of a first machine learning model and (ii) a second output of a second machine learning model. The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight ,
and (ii) a second term that imposes a penalty for parameter values deviating from the first parameter values, wherein the second term penalizes deviations from the first values more for parameters that were more important in achieving acceptable performance on the first machine learning task than for parameters that were less important in achieving acceptable performance on the first machine learning task (Aslan, [Par.0029, lines 17-24], “Accordingly, the first (teacher) model 100 can be biased toward using the learning function that is "good" for the second (student) model 102. The biasing of the first model 100 toward something that is beneficial for the second model 102 can be implemented via a penalty (or distance) term in the objective function that causes the first model 100 to agree with the second model 100 as opposed to disagreeing with the second model 100. This will be discussed in more detail below.”).
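For illustration only, the two-term objective recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's or Aslan's implementation: the function name `penalized_objective`, the squared form of the deviation, and the trade-off weight `lam` are hypothetical choices made for the example.

```python
import numpy as np

def penalized_objective(task2_loss, params, first_values, importance, lam=1.0):
    """Hypothetical two-term objective: task-2 loss plus an importance-weighted penalty.

    task2_loss   -- scalar loss on the second task (the "first term")
    params       -- current parameter values
    first_values -- parameter values learned on the first task
    importance   -- per-parameter importance measures (assumed non-negative)
    lam          -- assumed trade-off weight between the two terms
    """
    params = np.asarray(params, dtype=float)
    first_values = np.asarray(first_values, dtype=float)
    importance = np.asarray(importance, dtype=float)
    # "Second term": the same deviation costs more on a more important parameter.
    penalty = 0.5 * lam * np.sum(importance * (params - first_values) ** 2)
    return task2_loss + penalty

# Move each parameter by the same amount (0.5) and compare the resulting cost.
first = np.array([1.0, 1.0])
imp = np.array([10.0, 0.1])
cost_important = penalized_objective(0.0, [1.5, 1.0], first, imp)
cost_unimportant = penalized_objective(0.0, [1.0, 1.5], first, imp)
```

In this sketch an identical deviation of 0.5 is penalized a hundred times more on the parameter whose assumed importance is 10.0 than on the one whose importance is 0.1, which is the asymmetry the claim language describes.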
Regarding claim 7, Aslan teaches the method of claim 1, wherein determining, for each of the plurality of parameters, a respective measure of an importance of the parameter to the machine learning model achieving acceptable performance on the first machine learning task comprises: determining, for each of the plurality of parameters, an approximation of a probability that a current value of the parameter is a correct value of the parameter given first training data used to train the machine learning model on the first task (Aslan, [Par.0031, lines .
Regarding claim 9, Aslan teaches the method of claim 1, further comprising: after training the machine learning model on the second machine learning task to determine second values of the parameters of the machine learning model: obtaining third training data for training the machine learning model on a third, different machine learning task (Aslan, [Par.0056, lines 1-6], “It is to be appreciated that in any of the joint training examples described herein, the plurality of machine learning models in a set of machine learning models can be trained in parallel, or, alternatively, individual pairings of machine learning models can be jointly ;
and training the machine learning model on the third machine learning task by training the machine learning model on the third training data to adjust the second values of the parameters so that the machine learning model achieves an acceptable level of performance on the third machine learning task while maintaining an acceptable level of performance on the first machine learning task and the second machine learning task (Aslan, [Par.0029], “in FIG. 1 by the path 108 going from the training data 104 to the second model 102, and from the second model 102 to the first model 100. In this scenario, the first (teacher) model 100 can "see" what the second (student) model 102 is learning while the second model 102 trains, and/or before the second model 102 completes its training. This can allow the first (teacher) model 100 to adapt what it learns to better match what the second (student) model 102 is learning or is capable of learning.” And [Par.0007, lines 3-12], “The objective function can include at least one term that is a function of: (i) a first output of a first machine learning model and (ii) a second output of a second machine learning model. The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” Examiner’s note: the first model learns to adapt to better match what the next model is capable of learning. Therefore, the model parameters are modified to reach an acceptable level of performance.),
wherein, during the training of the machine learning model on the third machine learning task, values of parameters that were more important in the machine learning model achieving acceptable performance on the first machine learning task and the second machine learning task are more strongly constrained to not deviate from the second values than values of parameters that were less important in the machine learning model achieving acceptable performance on the first machine learning task and the second machine learning task (Aslan, [Par.0029], “in FIG. 1 by the path 108 going from the training data 104 to the second model 102, and from the second model 102 to the first model 100. In this scenario, the first (teacher) model 100 can "see" what the second (student) model 102 is learning while the second model 102 trains, and/or before the second model 102 completes its training. This can allow the first (teacher) model 100 to adapt what it learns to better match what the second (student) model 102 is learning or is capable of learning.” And [Par.0007, lines 3-12], “The objective function can include at least one term that is a function of: (i) a first output of a first machine learning model and (ii) a second output of a second machine learning model. The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” Examiner’s note: the previous model learns to adapt to better match what the next model is capable of learning. Therefore, the model parameters are modified to reach an acceptable level of performance.
The method trains multiple machine learning models to optimize the objective function and improve accuracy; parameter values that are more accurate are more important to the machine learning model. Aslan also discloses that the machine learning models are trained one after the other, each on its respective task.).
Regarding claim 10, Aslan teaches the method of claim 9, further comprising: determining, for each of the plurality of parameters, a respective measure of an importance of the parameter to the machine learning model achieving acceptable performance on the second machine learning task (Aslan, [Par.0007, lines 10-12], “In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” And [Par.0031, lines 28-37], “In this manner, the penalty term of the objective function can quantifiably measure the agreement/ disagreement between the probabilities of the two models 100 and 102, and works by penalizing the optimization problem when the probabilities disagree, which acts to push the two models 100 and 102 toward agreement with each other. In some implementations, the objective function is designed to push one model toward the other ( e.g., pushing the second model 102 to agree with the first model 100, or vice versa).” Examiner’s note: after each training, the model parameters are measured based on the performance on the machine learning task in order to optimize the objective function. Therefore, on the second machine learning task, the model parameter values are measured based on the performance level.);
and wherein training the machine learning model on the third training data includes adjusting the second values of the parameters to optimize an objective function that includes: (i) a first term that measures a performance of the machine learning model on the third machine learning task (Aslan, [Par.0006, lines -18], “The ,
and (ii) a second term that imposes a penalty for parameter values deviating from the first parameter values, wherein the second term penalizes deviations from the first values more for parameters that were more important in achieving acceptable performance on the first machine learning task than for parameters that were less important in achieving acceptable performance on the first machine learning task (Aslan, [Par.0031, lines 22-34], “The objective function used for joint training of the models 100 and 102 can include a penalty term (sometimes called a "distance term") that optimizes the objective function when the probabilities that are output by the first model 100 are similar to, or the same as, the probabilities output by the second model 102. In this manner, the penalty term of the objective function can quantifiably measure the agreement/ disagreement between the probabilities of the two models 100 and 102, and works by penalizing the optimization problem when the probabilities disagree, which acts to push the two models 100 and 102 toward agreement with each other. In some implementations, the objective function is designed to push one model toward the other ( e.g., pushing the second model 102 to agree with the first model 100, or vice versa).”).
(iii) a third term that imposes a penalty for parameter values deviating from the second parameter values, wherein the third term penalizes deviations from the second values more for parameters that were more important in achieving acceptable performance on the second machine learning task than for parameters that were less important in achieving acceptable performance on the second machine learning task (Aslan, [Par.0049], “Here, the N teacher models 200 are indexed by $\{te_i\}_{i=1}^{N}$. Additionally, $\Phi^{(te_i)}$ comprises an output matrix used in the classification term of the teacher model $te_i$ in the objective function (2). $\Psi^{(te_i)}$ comprises an output matrix used in the penalty term (or distance term) for the teacher model $te_i$ in the objective function (2). Using the variable modification in Equations (6) in the objective function (2) allows for determining values of model parameters of the .
Regarding claim 12, Aslan teaches the method of claim 10, wherein the third term depends on, for each of the plurality of parameters (Aslan, [Par.0007, lines 8-12], “The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” Examiner’s note: the parameter values are measured based on performance level, and the weight parameters are considered the plurality of parameters),
 a product of (i) the respective measure of importance of the parameter to the machine learning model achieving acceptable performance on the second machine learning task, and (ii) a difference between the current value of the parameter and the second value of the parameter (Aslan, [Par.0008, lines 1-8], “The joint model training techniques described herein provide greater flexibility as compared to current model training methods due to the ability of at least one model to influence the training of at least one other model during the joint training process. In this sense, a machine learning model is able to see what another machine learning model is learning, as the other machine learning model is learning. Furthermore, multiple machine learning models can be trained in a collaborative fashion where visibility across models is enabled, which can lead to one machine learning model selecting a learning function ,
Regarding claim 15, Aslan teaches the method of claim 1, the method further comprising providing the trained machine learning model for use in processing data after training the machine learning model on the second machine learning task (Aslan, [Par.0056, lines 1-7], “It is to be appreciated that in any of the joint training examples described herein, the plurality of machine learning models in a set of machine learning models can be trained in parallel, or, alternatively, individual pairings of machine learning models can be jointly trained in parallel, one after the other, until all of the machine learning models in a set are trained.” Examiner’s note: Aslan discloses that the machine learning models are trained one after the other.).
Regarding claim 17, Aslan teaches the method of claim 1, wherein the first and second machine learning tasks each comprise a classification task, and wherein the classification task is processing data to classify the data (Aslan, [Par.0077, lines 1-12], “A computer-implemented method comprising: providing a set of machine learning models that are to learn a respective task ( e.g., a classification task, such as a binary classification task, a multi-label classification task, or a task that infers a set of probabilities based on unknown input data, etc.), the set of machine learning .
Regarding claim 18, the claim is rejected for the same reasons as claim 1.
Additionally, Aslan further discloses a system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the operations (Aslan, [Par.0086], “A system comprising: one or more processors ( e.g., central processing units (CPUs), field programmable gate array (FPGAs ), complex programmable logic devices (CPLDs), application specific integrated circuits (ASICs), system-on-chips (SoCs), etc.); and memory (e.g., RAM, ROM, EEPROM, flash memory, etc.) storing computer executable instructions that, when executed by the one or more processors, cause performance of operations comprising: providing a set of machine learning models that are to learn a respective task…”).
Regarding claim 19, the claim is rejected for the same reasons as claim 1.
Additionally, Aslan discloses a computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform the operations (Aslan, [Par.0057, lines 1-8], “The processes described herein are illustrated as a collection of blocks in a logical flow graph, which represent a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent ,
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 5, 6, 8, 13, 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Aslan et al. (Pub. No. US20170132528 – hereinafter, Aslan) in view of Sinyavskiy et al. (Patent No. US9146546 – hereinafter, Sinyavskiy).
Regarding claim 5, Aslan teaches the method of claim 4, wherein training the machine learning model on the training data comprises, for each training example in the training data: processing the training example using the machine learning model in accordance with current values of parameters of the machine learning model to determine a model output (Aslan, [Par.0006, lines 14-18], “The passing of information can be accomplished via the formulation, and optimization, of an objective function that comprises model parameters that are based on the multiple machine learning models in the set.” And [Par.0043, lines 11-18], “Furthermore, the scheduling module can be configured to control the degree to which any given machine learning model can influence another. For example, an allocation between the use of training data and machine learning model output can be specified for a given model's training (e.g., 90% training from training data 104, and 10% training from the output of another machine learning model).” Examiner’s note: the model parameters are trained based on the machine learning model; therefore, the machine learning model uses the respective (current) parameter values to determine an output.);
However, Aslan does not teach determining a gradient of the objective function using the model output, a target output for the training example, the current values of the parameters of the machine learning model, and the first values of the parameters of the machine learning model; and adjusting the current values of the parameters using the gradient to optimize the objective function.
On the other hand, Sinyavskiy teaches determining a gradient of the objective function using the model output, a target output for the training example, the current values of the parameters of the machine learning model, and the first values of the parameters of the machine learning model (Sinyavskiy, [Column 3, lines 20-26], “Some existing learning rules for the supervised learning may rely on the gradient of the performance function. The gradient for reinforcement learning part may be implemented through the use of the adaptive critic; the gradient for supervised learning may be implemented by taking a difference between the supervisor signal and the actual output of the controller… Additional analytic derivation of the learning rules may be needed when the loss function between supervised and actual output signal is redefined.” And [Column 3, lines 49-54], “Some of the existing approaches of taking a derivative of a performance function without analytic calculations may include a "brute force" finite difference estimator of the gradient. However, these estimators may be impractical for use with large spiking networks comprising many (typically in excess of hundreds) parameters.” Examiner’s note: the gradient of the performance function (i.e., the gradient of the objective function) is determined based on the supervisor signal, the actual output, and the parameter values. Furthermore, see [Column 4, lines 37-39], “One common approach is to describe the task in terms of optimization of some function and then use gradient approaches in the parameter space of the spiking neuron.” Examiner’s note: the gradient is determined for each learning task; therefore, the first parameter values of the machine learning model are used in the training.);
and adjusting the current values of the parameters using the gradient to optimize the objective function (Sinyavskiy, [Column 16, lines 29-38], “parameters including connection efficacy, firing threshold, resting potential of the neuron, and/or other parameters. The analytical relationship of Eqn. 1 may be selected such that the gradient of ln [p(y|x,w)] with respect to the system parameter w exists and can be calculated. The framework shown in FIG. 3 may be configured to estimate rules for changing the system parameters ( e.g., learning rules) so that the performance function F(x,y,r) is minimized for the current set of inputs and outputs and system dynamics S.” Furthermore, see [Column 35, lines 34-39], “At step 816 learning parameter w update may be determined by the Parameter Adjustment block ( e.g., block 426b of FIG. 4) using the performance function F and the gradient g, determined at steps 812, 814 respectively. In some implementations, the learning parameter update may be implemented according to Eqns. 22-31.” Examiner’s note: during training, the specific (current) parameters are updated. It is well known in the art that parameters are updated to optimize the objective function (i.e., to improve performance).).
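For illustration only, the per-example update recited in claim 5 can be sketched as a single gradient-descent step. This is a minimal sketch under stated assumptions and is not taken from Aslan, Sinyavskiy, or the application: the linear model, the squared-error task loss, the quadratic importance-weighted penalty, and the names `gradient_step`, `lr`, and `lam` are all hypothetical.

```python
import numpy as np

def gradient_step(params, first_values, importance, x, target, lr=0.1, lam=1.0):
    """One hypothetical update for an assumed linear model: output = params . x."""
    params = np.asarray(params, dtype=float)
    first_values = np.asarray(first_values, dtype=float)
    importance = np.asarray(importance, dtype=float)
    x = np.asarray(x, dtype=float)
    output = params @ x                                    # model output
    grad_task = (output - target) * x                      # d/dparams of 0.5*(output - target)^2
    grad_penalty = lam * importance * (params - first_values)  # d/dparams of the penalty
    # The gradient uses the model output, the target output, the current
    # values, and the first values, then adjusts the current values.
    return params - lr * (grad_task + grad_penalty)

# Start from the first-task values and train on one second-task example.
first = np.array([1.0, 1.0])
imp = np.array([5.0, 0.0])          # assumed importances: parameter 0 matters, parameter 1 does not
p = first.copy()
for _ in range(200):
    p = gradient_step(p, first, imp, x=[1.0, 1.0], target=4.0)
```

In this toy setting the unimportant parameter absorbs essentially all of the change needed to fit the new example, while the important parameter ends up close to its first value.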
Aslan and Sinyavskiy are analogous art because they are in the same field of endeavor of training multiple machine learning models on respective machine learning tasks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aslan’s method in view of Sinyavskiy by using the model output to determine a gradient of the objective function. The modification would have been obvious because 
Regarding claim 6, Aslan, as modified in view of Sinyavskiy, teaches the method of claim 4, wherein the second term depends on, for each of the plurality of parameters (Aslan, [Par.0007, lines 8-12], “The process can further include optimizing the objective function to train the first machine learning model and the second machine learning model in parallel. In some implementations, optimizing the objective function includes determining values of model parameters, such as weight parameters, that optimize the objective function.” Examiner’s note: the parameter values are determined based on the performance level, and the weight parameters are considered to be the plurality of parameters),
a product of the respective measure of importance of the parameter and a difference between the current value of the parameter and the first value of the parameter (Aslan, [Par.0008, lines 1-8], “The joint model training techniques described .
Regarding claim 11, it is rejected for the same reasons as claim 6.
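For illustration only, the second term discussed for claims 6 and 11 can be sketched as a sum, over the parameters, of each parameter's importance multiplied by its deviation from the first value. The use of a squared deviation, the factor of one half, and the weight `lam` are assumptions made for this sketch; the claim language recited above requires only a product of the importance measure and a difference.

```python
def second_term(theta, theta_first, importance, lam=1.0):
    """Penalty on deviation from the first-task parameter values.

    Assumed (hypothetical) form:
        0.5 * lam * sum_i importance[i] * (theta[i] - theta_first[i])**2
    so that more important parameters are constrained more strongly.
    """
    return 0.5 * lam * sum(
        f * (t - t0) ** 2
        for t, t0, f in zip(theta, theta_first, importance)
    )
```

With equal deviations, a parameter with a larger importance value contributes more to the penalty, which is the sense in which it is more strongly constrained to stay near its first value.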
Regarding claim 8, Aslan, as modified in view of Sinyavskiy, teaches the method of claim 1, wherein determining, for each of the plurality of parameters, a respective measure of an importance of the parameter to the machine learning model achieving acceptable performance on the first machine learning task comprises: determining a Fisher Information Matrix (FIM) of the plurality of parameters of the machine learning model with respect to the first machine learning task, wherein, for each of the plurality of parameters, the respective measure of the importance of the parameter is a corresponding value on a diagonal of the FIM (Sinyavskiy, [Column 28-29, lines 63-33], “In some implementations, the gradient signal g, determined by the PD block 422v of FIG. 4 may be subsequently modified according to another 

[Image: media_image1.png, greyscale]

Examiner’s note: training on the machine learning task involves the gradient signal and the Fisher information matrix, and the calculation of the Fisher information matrix involves the parameters; therefore, the measure determined for each parameter corresponds to a value of the FIM.).
Aslan and Sinyavskiy are analogous art because they are in the same field of endeavor of using machine learning to train machine learning models on respective machine learning tasks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aslan’s method, in view of Sinyavskiy, by using an FIM value to determine the importance of the parameter. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classification and the flexibility of training with different learning rules (Sinyavskiy, [Abstract], “framework 
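For illustration only, the diagonal-of-the-FIM limitation of claim 8 can be sketched with an empirical estimate: the mean, over training examples, of the squared gradient of the log-likelihood with respect to each parameter. The sketch assumes a linear model with a unit-variance Gaussian likelihood; this model, the empirical estimator, and the function name are assumptions made here and are not taken from Sinyavskiy or from the application.

```python
def diagonal_fisher(theta, examples):
    """Empirical diagonal of the Fisher Information Matrix.

    Assumed (hypothetical) model: y ~ Normal(theta . x, 1), for which the
    gradient of the log-likelihood w.r.t. theta[i] is (y - theta . x) * x[i].
    Returns one importance value per parameter.
    """
    n = len(examples)
    diag = [0.0] * len(theta)
    for x, y in examples:
        residual = y - sum(t * xi for t, xi in zip(theta, x))
        for i, xi in enumerate(x):
            g = residual * xi          # per-example log-likelihood gradient
            diag[i] += g * g / n       # running mean of squared gradients
    return diag
```

Each returned entry plays the role of the per-parameter importance measure; only the diagonal is kept, so the cost grows linearly with the number of parameters.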
Regarding claim 13, Aslan, as modified in view of Sinyavskiy, teaches the method of claim 4, further comprising identifying when switching from one machine learning task to another and updating the second term of the objective function in response (Sinyavskiy, [Column 3, lines 44-48], “Moreover, analytic determination of a performance function F derivative may require additional operations (often performed manually) for individual new formulated tasks that are not suitable for dynamic switching and reconfiguration of the tasks described before”; furthermore, see [Column 25, lines 52-57], “The PD block 475 output may be determined based the output signal 418, the learning signals 476 comprising the reinforcement component r(t) and the desired output (teaching) component yd(t) and on the input signal 412, that determines the context for switching between supervised and reinforcement task function” Examiner’s note: when a different machine learning task is being trained, the objective function will be .
Aslan and Sinyavskiy are analogous art because they are in the same field of endeavor of using machine learning to train machine learning models on respective machine learning tasks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aslan’s method, in view of Sinyavskiy, by training on the multiple machine learning tasks. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classification and the flexibility of training with different learning rules (Sinyavskiy, [Abstract], “framework may be used to enable adaptive spiking neuron signal processing system to flexibly combine different learning rules (supervised, unsupervised, reinforcement learning) with different methods (online or batch learning). The generalized learning framework may employ time-averaged performance function as the learning measure thereby enabling modular architecture where learning tasks are separated from control tasks, so that changes in one of the modules do not necessitate changes within the other. Separation of learning tasks from the control tasks implementations may allow dynamic reconfiguration of the learning block in response to a task change or learning method change in real time.”).

Regarding claim 16, Aslan, as modified in view of Sinyavskiy, teaches the method of claim 1, wherein the first and second machine learning tasks each comprise a reinforcement learning task, and wherein the reinforcement learning task is controlling an agent to interact with an environment to achieve a goal (Sinyavskiy, [Column 44, lines 55-63], “One or more implementations of reinforcement learning may require solving adaptive control task (e.g., AUV/UAV navigation) without having detailed prior information about the dynamics of the controlled plant (e.g., the plant 514 in FIG. 5). The reinforcement signal (e.g., the signal 504 in FIG. 5) is typically used to specify to the adaptive controller (e.g., the controller 520 of FIG. 5) whether prior behavior led to "desired" or "undesired" results.” and [Column 45-46, line 65 - 3], “Even when existing learning approaches employ neural networks as the computational engine, each learning task is typically performed by a separate network (or network partition) that operate task specific (e.g., adaptive control, classification, recognition, prediction rules, etc.) set of learning rules (e.g., supervised, unsupervised, reinforcement).” Examiner’s note: the machine learning model is therefore trained on multiple machine learning tasks, and the reinforcement learning controls a classification behavior in order to reach a desired result.).
Aslan and Sinyavskiy are analogous art because they are in the same field of endeavor of using machine learning to train machine learning models on respective machine learning tasks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aslan’s method, in view of Sinyavskiy, by training on the multiple machine learning tasks. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classification and the flexibility of training with different learning rules (Sinyavskiy, [Abstract], “framework may be used to enable .
Conclusion
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure is provided below.
Doya et al. (Multiple Model-Based Reinforcement Learning, Human Information Science Laboratories, ATR International, 2-2-2 Hikaridai, Seika, Kyoto 619-0288, Japan; hereinafter Doya et al.) teaches using reinforcement learning to train multiple models.
Ruvolo et al. (ELLA: An Efficient Lifelong Learning Algorithm, Bryn Mawr College, Computer Science Department, 101 North Merion Avenue, Bryn Mawr, PA 19010 USA; hereinafter Ruvolo et al.) teaches training a machine learning model on multiple learning tasks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU, whose telephone number is (571)272-5747. The examiner can normally be reached 7:30 - 5:00, M-Th.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/E.T./
Examiner, Art Unit 2126

/BABOUCARR FAAL/Primary Examiner, Art Unit 2184