DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 10/31/2017 and the Remarks and Amendments filed on 3/5/2021.  

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.




Claims 1-4, 6-12, 15, and 17-20 are rejected under 35 U.S.C. § 103 as being obvious over Ding et al. (US 20180276533 A1, hereinafter “Ding”) in view of Romera-Paredes (Romera-Paredes et al., “Exploiting Unrelated Tasks in Multi-Task Learning”, 2012, in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands, pp. 951-959, hereinafter “Romera”).

Regarding claim 1, Ding discloses [a] feature ranking neural network, comprising ([0001]; “ranking feeds based on prediction of interactions between viewing users and the feeds using multi-task neutral networks”, which discloses a feature ranking neural network in a multitask learning context)
an input layer; (Figure 4A; the figure discloses an input layer in the form of the data received as a feature vector 410 into the shared layers 420.  Under a broadest reasonable interpretation of the claim language, the input layer is the data itself that is received into the neural network that then performs layer-wise processing)
a broadcast layer encoding a plurality of weights, wherein the plurality of weights comprise a task specific weight for each task of a plurality of tasks for which the neural network is trained; (Figure 4A, 410 and 420; the figure discloses the broadcast or shared layers 420 that take feature vector 410 that encode a plurality of weights that comprise a task specific weight for each task of a plurality of tasks for which the NN is trained; and [0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which further discloses the task specific weights for each of the plurality of tasks for which the NN is trained, and this is done using the prediction model; and [0044]; “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 incudes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poser, the viewing user and the content item 110”)
two or more separate and parallel branches of the feature ranking neural network, wherein the branches are configured to receive a set of filtered inputs from the broadcast layer based on the weights encoded by the broadcast layer, wherein each branch corresponds to a different task; and . . . the two or more separate and parallel branches (Figure 4A, 430A, 430B, 430C; the figure discloses the two or more separate and parallel branches of the NN that are configured to receive a set of filtered inputs from the broadcast or shared layers 420 and 410 based on weights encoded; and [0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”; and [0043]; “the prediction model 400A includes one or more additional separate layers associated with other suitable tasks”; and [0044]; “The common features outputted from the shared layers 410 are inputs to a separate layer (e.g., 430A, 430B, or 430C) associated with a task”)
an output layer downstream from the branches, wherein the output layer is configured to provide an output of the feature ranking neural network (Figure 4A, 440, 450, and 460; the figure discloses at least one output layer for each task; and [0044]; “The separate layer outputs a likelihood score indicating how likely the viewing user will perform a corresponding task associated with the content item 110”).
Ding fails to explicitly disclose wherein an orthogonalization technique is applied to the plurality of weights to identify at least one feature shared between the two or more separate and parallel branches.
Romera discloses wherein an orthogonalization technique is applied to the plurality of weights to identify at least one feature shared between the two or more separate and parallel branches (Abstract; “We propose a novel method which builds on a prior multitask methodology by favoring a shared low dimensional representation within each group of tasks. In addition, we impose a penalty on tasks from different groups which encourages the two representations to be orthogonal”, which discloses the orthogonalization technique; and Page 954, Algorithm 1; the algorithm discloses, under a broadest reasonable interpretation of the claim language, applying an orthogonalization or regularization technique to a plurality of weights to identify at least one feature shared between two separate and parallel branches; and Page 957, §5; “We have proposed a regularization formulation which incorporates this information in the learning method. The regularizer encourages both a low dimensional representation and penalizes the inner product between any pair of weight vectors of tasks from different groups. The implication of this constraint is that we look for common sparse representations within each group of tasks and also that tasks from different groups share as few features as possible”, which discloses applying the orthogonalization technique to a plurality of weights (in the form of weight vectors) to identify at least one shared feature (tasks that share as few features as possible) between the two or more separate and parallel branches; and see Page 952, §3 for a further discussion of the orthogonalization technique used in Romera).
Ding and Romera are analogous art because both are concerned with multi-task learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in multi-task learning to combine the orthogonalization technique of Romera with the separate and parallel branches and method of Ding to yield the predictable result of wherein an orthogonalization technique is applied to the plurality of weights to identify at least one feature shared between the two or more separate and parallel branches. The motivation for doing so would be to encourage both a low dimensional representation and penalize the inner product between any pair of weight vectors of tasks from different groups to provide for better generalization in the context of multi-task learning (Romera; Page 957, §5, Discussion).
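For illustration only, the kind of penalty described in the quoted Romera passage (penalizing the inner product between weight vectors of tasks from different groups) might be sketched as below. This is a minimal sketch under stated assumptions and is not Romera's actual formulation or Ding's model: the function names, the sum-of-squares form of the penalty, and the example weight vectors are all hypothetical.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_group_penalty(group_a, group_b):
    """Sum of squared inner products over every pair made of one weight
    vector from group_a and one from group_b (assumed penalty form).

    The value is zero exactly when every cross-group pair is orthogonal,
    and it grows as the two groups place weight on the same features.
    """
    return sum(dot(u, v) ** 2 for u, v in product(group_a, group_b))


# Hypothetical per-task weight vectors over four input features.
group_a = [[1.0, 0.5, 0.0, 0.0], [0.8, 1.0, 0.0, 0.0]]
group_b_disjoint = [[0.0, 0.0, 1.0, 0.3]]  # uses none of group_a's features
group_b_overlap = [[0.5, 0.0, 1.0, 0.3]]   # also weights the first feature

penalty_disjoint = cross_group_penalty(group_a, group_b_disjoint)
penalty_overlap = cross_group_penalty(group_a, group_b_overlap)
```

In this sketch the disjoint case yields a penalty of zero, while the overlapping case yields a positive penalty attributable to the first feature, which both groups weight.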

Regarding claim 2, the rejection of claim 1 is incorporated and Ding further discloses wherein the weights of the broadcast layer filter input features provided to the input layer in a task-specific manner to the branches of the feature ranking neural network (Figure 4A, 410 and 420; the figure discloses the broadcast or shared layers 420 that take feature vector 410 that filter input features provided to the input layer in a task-specific manner; and [0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which further discloses the task specific weights for each of the plurality of tasks for which the NN is trained, and this is done using the prediction model; and [0044]; “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 incudes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poser, the viewing user and the content item 110”).

Regarding claim 3, the rejection of claim 1 is incorporated and Ding further discloses wherein the plurality of weights further includes a common weight attributable to each task for which the neural network is trained ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses a common weight attributable to each task for which the NN is trained).

Regarding claim 4, the rejection of claim 1 is incorporated and Ding further discloses wherein each weight of the plurality of weights is approximately zero or approximately one ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses that each weight is approximately one (a higher weight, under a broadest reasonable interpretation of the claim language) or approximately zero (a lesser weight)).

Regarding claim 6, the rejection of claim 1 is incorporated and Ding further discloses wherein the plurality of weights of the broadcast layer is learned by training the feature ranking neural network ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein the plurality of weights of the broadcast layer is learned (by multi-task learning) by training the feature ranking neural network).

Regarding claim 7, Ding discloses [a] feature ranking method, comprising: ([0001]; “ranking feeds based on prediction of interactions between viewing users and the feeds using multi-task neutral networks”, which discloses a feature ranking method in a multitask learning context)
providing as a training data set a plurality of input samples to an input layer of a neural network, (Figure 4A; the figure discloses an input layer in the form of the data received as a feature vector 410 into the shared layers 420.  Under a broadest reasonable interpretation of the claim language, the input layer is the data itself that is received into the neural network that then performs layer-wise processing)
wherein each input sample is characterized by one or more features, (Figure 4A, 410)
wherein the neural network comprises a plurality of layers, and (Figure 4A, 420, 430A, 430B, 430C)
wherein one of the plurality of layers is a broadcast layer comprising a respective weight for each task of a plurality of tasks; (Figure 4A, 410 and 420; the figure discloses the broadcast or shared layers 420 that take feature vector 410 that encode a plurality of weights that comprise a task specific weight for each task of a plurality of tasks for which the NN is trained; and [0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which further discloses the task specific weights for each of the plurality of tasks for which the NN is trained, and this is done using the prediction model; and [0044]; “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 incudes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poser, the viewing user and the content item 110”)
processing the input samples to train the respective weight for each task for each of the one or more features; and ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses processing the input samples to train the respective weight for each task for each of the one or more features).
for one or more tasks of the plurality of task, identifying one or more features as being relevant to a respective task based on the respective weights of the one or more features with respect to the respective tasks ([0044]; “The separate layer outputs a likelihood score indicating how likely the viewing user will perform a corresponding task associated with the content item 110”, the likelihood score being associated with the identified one or more features as being relevant to a task, as the score is associated with one of the tasks as shown in Figure 4A, 440, 450, and 460.  This is based on the weights discussed in paragraph [0040] of Ding).
Ding fails to explicitly disclose wherein an orthogonalization technique is applied to the respective weights to identify at least one feature shared between at least two of the plurality of layers.
Romera discloses wherein an orthogonalization technique is applied to the respective weights to identify at least one feature shared between at least two of the plurality of layers (Abstract; “We propose a novel method which builds on a prior multitask methodology by favoring a shared low dimensional representation within each group of tasks. In addition, we impose a penalty on tasks from different groups which encourages the two representations to be orthogonal”, which discloses the orthogonalization technique; and Page 954, Algorithm 1; the algorithm discloses, under a broadest reasonable interpretation of the claim language, applying an orthogonalization or regularization technique to a plurality of weights to identify at least one feature shared between two separate and parallel branches; and Page 957, §5; “We have proposed a regularization formulation which incorporates this information in the learning method. The regularizer encourages both a low dimensional representation and penalizes the inner product between any pair of weight vectors of tasks from different groups. The implication of this constraint is that we look for common sparse representations within each group of tasks and also that tasks from different groups share as few features as possible”, which discloses applying the orthogonalization technique to a plurality of weights (in the form of weight vectors) to identify at least one shared feature (tasks that share as few features as possible) between the two or more separate and parallel branches; and see Page 952, §3 for a further discussion of the orthogonalization technique used in Romera).
Ding and Romera are analogous art because both are concerned with multi-task learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in multi-task learning to combine the orthogonalization technique of Romera with the plurality of NN layers and method of Ding to yield the predictable result of wherein an orthogonalization technique is applied to the respective weights to identify at least one feature shared between at least two of the plurality of layers. The motivation for doing so would be to encourage both a low dimensional representation and penalize the inner product between any pair of weight vectors of tasks from different groups to provide for better generalization in the context of multi-task learning (Romera; Page 957, §5, Discussion).


Regarding claim 8, the rejection of claim 7 is incorporated and Ding further discloses wherein different tasks are characterized by different sets of identified features ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein different tasks are characterized by different sets of identified features).

Regarding claim 9, the rejection of claim 7 is incorporated and Ding further discloses wherein processing the input samples to train the respective weight for each task for each of the one or more features further comprises training a common weight shared by the one or more features across the plurality of tasks ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein processing the input samples to train the respective weight for each task for each of the one or more features further comprises training a common (higher) weight shared by the one or more features across the plurality of tasks).

Regarding claim 10, the rejections of claims 7 and 9 are incorporated and Ding further discloses wherein common weight is provided as part of an output of the feature ranking method ([0044]; “The separate layer outputs a likelihood score indicating how likely the viewing user will perform a corresponding task associated with the content item 110”, the likelihood score being associated with the common weight that is provided as part of an output of the feature ranking method).

Regarding claim 11, the rejection of claim 7 is incorporated and Ding further discloses wherein the broadcast layer filters the one or more features in a task-specific manner ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein the broadcast layer (shared layer that works with the feature vector) filters the one or more features in a task-specific manner; and [0037]; “The feature extractor 310 generates feature vectors for each content item”).

Regarding claim 12, the rejection of claim 7 is incorporated and Ding further discloses wherein the broadcast layer filters the one or more features to limit propagation of the one or more features to a plurality of parallel branches of the neural network downstream from the broadcast layer ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein the broadcast layer filters (or weights) the one or more features to limit propagation of the one or more features to a plurality of parallel branches of the neural network downstream from the broadcast layer (shared layer that works with the feature vector)).

Regarding claim 15, the rejection of claim 7 is incorporated and Ding further discloses based on a user input, imposing a similarity constraint for the respective weights associated with two or more tasks having a similar feature base ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses based on a user input, by virtue of the data that produces the weight coming from user-derived data, imposing a similarity constraint for the respective weights associated with two or more tasks having a similar feature base, because the similar features are weighted higher and are thus similarly subjected to, under a broadest reasonable interpretation of the claim language, a similarity constraint).

Regarding claim 17, Ding discloses [a] method for generating a reduced feature set model, comprising: ([0001]; “ranking feeds based on prediction of interactions between viewing users and the feeds using multi-task neutral networks”, which discloses a method in a multitask learning context; and [0037]; “The feature extractor 310 generates feature vectors for each content item”, which discloses generating a reduced feature set model)
acquiring one or more weights associated with a broadcast layer of a trained neural network, wherein each weight is associated with a respective feature and task combination; ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses acquiring one or more weights associated with a broadcast layer of a trained neural network, wherein each weight is associated with a respective feature and task combination)
identifying one or more task-specific features for a given task based on the weights; and ([0044]; “The separate layer outputs a likelihood score indicating how likely the viewing user will perform a corresponding task associated with the content item 110”, the likelihood score being the identified one or more task-specific features for a given task based on the weights)
generating a model based on the one or more task-specific features, wherein the one or more task-specific features is a subset of a larger feature set for which the trained neural network was trained ([0039]; “The model module 330 trains a multi-task neutral network prediction model using the training set from the training set module 320. The training process is referred to a multi-task learning”, and this model is generated and trained based on task-specific features which are determined by the feature extractor in Figure 3.  The features from the feature extractor in Figure 3 are a subset of a larger feature set for which the trained neural network was trained).
Ding fails to explicitly disclose wherein an orthogonalization technique is applied to the one or more weights to identify at least one feature shared between the two or more layers of the trained neural network.
Romera discloses wherein an orthogonalization technique is applied to the one or more weights to identify at least one feature shared between the two or more layers of the trained neural network (Abstract; “We propose a novel method which builds on a prior multitask methodology by favoring a shared low dimensional representation within each group of tasks. In addition, we impose a penalty on tasks from different groups which encourages the two representations to be orthogonal”, which discloses the orthogonalization technique; and Page 954, Algorithm 1; the algorithm discloses, under a broadest reasonable interpretation of the claim language, applying an orthogonalization or regularization technique to a plurality of weights to identify at least one feature shared between two separate and parallel branches; and Page 957, §5; “We have proposed a regularization formulation which incorporates this information in the learning method. The regularizer encourages both a low dimensional representation and penalizes the inner product between any pair of weight vectors of tasks from different groups. The implication of this constraint is that we look for common sparse representations within each group of tasks and also that tasks from different groups share as few features as possible”, which discloses applying the orthogonalization technique to a plurality of weights (in the form of weight vectors) to identify at least one shared feature (tasks that share as few features as possible) between the two or more separate and parallel branches; and see Page 952, §3 for a further discussion of the orthogonalization technique used in Romera).
The motivation to combine Ding and Romera is the same as discussed above with respect to claim 7.

Regarding claim 18, the rejection of claim 17 is incorporated and Ding further discloses wherein the model comprises a task-specific neural network (Figure 4A; the figure discloses a task-specific neural network).

Regarding claim 19, the rejection of claim 17 is incorporated and Ding further discloses wherein the one or more task specific features are identified based on user inputs in addition to the weights ([0040]; “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors, such that features that are more relevant to one or more specific tasks performed by the viewing user tend to have higher weight than features that are less relevant to the one or more specific tasks”, which discloses wherein the one or more task specific features are identified based on user inputs (because the model is trained based on user behavior data) in addition to the weights).

Regarding claim 20, the rejection of claim 17 is incorporated and Ding further discloses using the model to evaluate additional observations related to the task based on observed values of the one or more task-specific features ([0040]; “For a next specific task, the model module 330 selects a corresponding training set to train shared layers and a separate layer associated with the next specific task. The prediction model is updated accordingly”, which discloses using the model to evaluate additional observations related to the task because the model is updated accordingly with new information based on observed values).


Claim 5 is rejected under 35 U.S.C. § 103 as being obvious over Ding in view of Romera and further in view of Riemer et al. (Riemer et al., “A Deep Learning and Knowledge Transfer Based Architecture for Social Media User Characteristic 

Regarding claim 5, the rejection of claim 1 is incorporated but Ding fails to explicitly disclose wherein the two or more branches of the feature ranking neural network each comprise a plurality of hidden layer.
Riemer discloses wherein the two or more branches of the feature ranking neural network each comprise a plurality of hidden layer (Page 43, Figure 1; the figure discloses task specific hidden layers; and Page 45, Figure 2; the figure discloses task specific hidden layers).
Ding, Romera, and Riemer are analogous art because all are concerned with multitask learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in multitask learning to combine the hidden layers of Riemer with the neural network of Ding and Romera to yield the predictable result of wherein the two or more branches of the feature ranking neural network each comprise a plurality of hidden layer. The motivation for doing so would be to efficiently transfer knowledge between related Natural Language Processing tasks (Riemer; Abstract).


Claim 13 is rejected under 35 U.S.C. § 103 as being obvious over Ding in view of Romera and further in view of Fan et al. (US 20170209105 A1, hereinafter “Fan”).

Regarding claim 13, the rejection of claim 7 is incorporated but Ding fails to explicitly disclose receiving a user input further limiting the one or more features identified as being relevant to the respective task.
Fan discloses receiving a user input further limiting the one or more features identified as being relevant to the respective task ([0063]; “For example, the neural network may take the initial optimization model (i.e., the optimization model described herein above with regard to FIG. 3) and tune the optimization model based on the user feedback. The user feedback regarding the image and all of the relevant parameters used to generate the image (including but not limited to the patient-specific inputs, the system-specific inputs, the clinical task selection, the image quality selection, the optimized dose level, the optimized scan protocol, and so on) may be used as input features for the neural network”, which discloses the user input for tuning the neural network).
Ding, Romera, and Fan are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the user input of Fan with the method of Ding and Romera to yield the predictable result of receiving a user input further limiting the one or more features identified as being relevant to the respective task. The motivation for doing so would be to adjust an optimization model based on input features (Fan; [0063]).

Claim 14 is rejected under 35 U.S.C. § 103 as being obvious over Ding in view of Romera and further in view of Schwartz et al. (US 20180018571 A1, hereinafter “Schwartz”).

Regarding claim 14, the rejection of claim 7 is incorporated but Ding fails to explicitly disclose imposing a sparsity constraint on the respective weights for each task for each of the one or more features.
Schwartz discloses imposing a sparsity constraint on the respective weights for each task for each of the one or more features ([0046]; “Those skilled in the art know that there are a number of ways to accomplish this such as applying a sparsity constraint as in restricted Boltzmann networks, enforcing a minimum viable network weight and reducing all weights to 0 that fail to meet the minimum, or dividing all weights by some constant so that they sum to 1”, which discloses the sparsity constraint imposed upon the weight).
Ding, Romera, and Schwartz are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the sparsity constraint of Schwartz with the method of Ding and Romera to yield the predictable result of imposing a sparsity constraint on the respective weights for each task for each of the one or more features. The motivation for doing so would be to normalize or prune weights (Schwartz; [0046]).

Claim 16 is rejected under 35 U.S.C. § 103 as being obvious over Ding in view of Romera and further in view of Emrani et al. (Emrani et al., “Prognosis and Diagnosis of Parkinson’s Disease Using Multi-Task Learning”, Aug. 17, 2017, KDD 2017 Applied Data Science Paper, pp. 1457-1466, hereinafter “Emrani”).

Regarding claim 16, the rejection of claim 7 is incorporated but Ding fails to explicitly disclose wherein the one or more features comprise one or more biomarkers of interest, a targeted therapy or treatment, an electrocardiogram diagnosis, or a neuro-analytic treatment or diagnosis.
Emrani discloses wherein the one or more features comprise one or more biomarkers of interest, a targeted therapy or treatment, an electrocardiogram diagnosis, or a neuro-analytic treatment or diagnosis (Abstract; “In this paper, we employ a multi-task learning regression framework for prediction of Parkinson’s disease progression, where each task is the prediction of PD rating scales at one future time point. We then use the model to identify the important biomarkers predictive of disease progression”, which discloses the biomarkers of interest or diagnosis; and Page 1458, Column 1; “We employ a multi-task learning regression framework to predict PD progression for up to 4.5 years and to identify important predictive biomarkers from the learned model”).
Ding, Romera, and Emrani are analogous art because all are concerned with multitask learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in multitask learning to combine the neuro-analytic diagnosis of Emrani with the method of Ding and Romera to yield the predictable result 

Response to Arguments

Applicant’s arguments and amendments, filed on 3/5/2021, with respect to the objection to claims 5 and 7-16 have been fully considered and are persuasive.  In view of the 2019 PEG, claims 1-20 are not directed towards mental processes. The objection to claims 5 and 7-16 has been withdrawn.


Applicant’s arguments and amendments, filed on 3/5/2021, with respect to the 35 USC § 102(a)(1) rejection of claims 1-4, 6-12, 15, and 17-20 and 35 USC § 103 rejection of claims 5, 13, 14, and 16 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 7, and 17.  Ding and Romera are now being used to render claims 1, 7, and 17 obvious under 35 USC § 103.



Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403.  The examiner can normally be reached on Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
 
/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125