DETAILED ACTION
This action is in response to claims filed 08/09/2021 for application 16/430243 filed 06/03/2019. Claims 1, 8, and 15 are amended, claims 5, 12, and 19 are cancelled and claims 21 and 22 are new. Claims 1-4, 6-11, 13-18, and 20-22 are currently pending. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 22 is objected to because of the following informalities: "The system of claim 21 wherein..." appears to be missing a comma and should read "The system of claim 21, wherein...". Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4, 8, 9, 11, 15, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. ("GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction" cited by Applicant in the IDS filed on 06/04/2019, hereinafter "Zhang1") in view of Greene ("Fixed and Random Effects in Nonlinear Models", hereinafter "Greene").

Regarding claim 1, Zhang1 teaches A system comprising: a non-transitory computer-readable medium having instructions stored thereon, which, when executed by a processor (“Each node has 24 Intel Xeon(R) CPU E5-2640 processors with 6 cores at 2.50GHz each, and every node has 250GB memory.” [pg. 6, top left column]) cause the system to: 
obtain training data, the training data comprising values for a plurality of different features (“Now we consider the GLMix model for the job recommendation problem. To measure whether job j is a good match for a member m and to select the best jobs according to this measure, the key is to predict the probability that member m would apply for job j given an impression on the “Jobs you may be interested in” module. Let ymjt denote the binary response of whether member m would apply for job j in context t, where the context usually includes the time and location where the job is shown. We use qm to denote the feature vector of member m, which includes the features extracted from the member’s public profile, e.g., the member’s title, job function, education history, industry, etc. We use sj to denote the feature vector of job j, which includes features extracted from the job post, e.g. the job title, desired skills and experiences, etc. Let xmjt represent the overall feature vector for the (m, j, t) triple, which can include qm and sj for feature-level main effects, the outer product between qm and sj for interactions among member and job features, and features of the context. We assume that xmjt does not contain member IDs or item IDs as features, because IDs will be treated differently from regular features” [pg. 2, § 2.2 GLMix Model, ¶1]); 
train a global machine learned model using a first machine learning algorithm by feeding the training data into the first machine learning algorithm during a fixed effect training process (“[image: media_image1.png]” [pg. 3, § 3. Algorithm; A set of sample responses corresponds to training data; Zhang1 discloses a global machine learned model: “b is the global coefficient vector (also called fixed effect coefficients in the statistical literature)”, pg. 2, § 2.2 GLMix Model, ¶1]); and
the subset of the training data being limited to training data corresponding to a particular value of one of the plurality of different features (“The features available in recommender systems often include user features (e.g., age, gender, industry, job function) and item features (e.g., title and skills for jobs, title and named entities for news articles). An approach that is widely adopted in industry to model interactions between users and items is to form the outer (cross) product of user and item features, followed by feature selection to reduce the dimensionality and mitigate the problem of overfitting. In reality, we often observe a lot of heterogeneity in the amount of data per user or item that cannot be sufficiently modeled by user/item features alone, which provides an opportunity to improve model accuracy by adding more granularity to the model. Specifically, for a user who has interacted with many items in the past, we should have sufficient data to fit regression coefficients that are specific to that user to capture his/her personal interests. Similarly, for an item that has received many users’ responses, it is beneficial to model its popularity and interactions with user features through regression coefficients that are specific to the item.” [pg.1, § 1. Introduction, ¶2]).
However Zhang1 fails to explicitly teach train a first non-linear random effects machine learned model by feeding a subset of the training data into a second machine learning algorithm, 
wherein the second machine learning algorithm is a Gaussian Process.
Greene teaches train a first non-linear random effects machine learned model (“[image: media_image2.png]” [pg. 15, § 4. Random Effects and Random Parameters Models]) by feeding a subset of the training data into a second machine learning algorithm (“In this instance, the square of the first derivative is used as approximation to the second when the asymptotic covariance matrix is computed. (The algorithm used for estimation requires only first derivatives.)” [pg. 47, § Appendix A. Computation of the Random Parameters model; This corresponds to a second machine learning algorithm]),
wherein the second machine learning algorithm is a Gaussian Process (“[image: media_image3.png]” [pg. 16, § 4.1 Exact Integration and Closed Forms; Greene discloses that a stochastic frontier model uses normally-distributed random variables, thus the examiner is interpreting this to be equivalent to a Gaussian process.]).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]
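For illustration only, and not as a reproduction of any code in Zhang1 or Greene, the limitation that the subset is "limited to training data corresponding to a particular value of one of the plurality of different features" can be pictured as partitioning the training rows by the value of one feature, such as a member ID. The function name, the dictionary row layout, and the field name `member_id` below are hypothetical.

```python
from collections import defaultdict


def split_by_feature_value(rows, feature):
    """Group training rows by the value each row holds for `feature`.

    rows: list of dicts (hypothetical layout: one dict per training example).
    Returns a dict mapping each observed feature value to its subset of rows.
    """
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[feature]].append(row)
    return dict(subsets)


training_data = [
    {"member_id": "m1", "job_title": "engineer", "applied": 1},
    {"member_id": "m2", "job_title": "nurse", "applied": 0},
    {"member_id": "m1", "job_title": "analyst", "applied": 0},
]

# The full list would feed the global (fixed effect) model; each per-member
# subset would feed that member's random effects model.
per_member = split_by_feature_value(training_data, "member_id")
```

Under this reading, `per_member["m1"]` holds only the rows whose `member_id` is "m1".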

Regarding claim 2, the combination of Zhang1 and Greene teaches The system of claim 1, where Zhang1 further teaches wherein the system is further caused to: perform one or more iterations of a machine learned model training process, the one or more iterations continuing until a convergence test is met, each iteration comprising the obtaining training data, training the global machine learned model (“[image: media_image4.png]” [pg. 4, Algorithm 1, top left col; Step 1 of the algorithm corresponds to performing iterations until the convergence test has been met. The algorithm is using a fixed effect parameter to train a global machine learned model.]).
Zhang1 fails to explicitly teach and training the first non-linear random effects machine learned model.
Greene teaches training the first non-linear random effects machine learned model (See pg. 15, § 4. Random Effects and Random Parameters Models; Greene discloses nonlinear random effects model).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]
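As a minimal sketch of the claimed pattern of alternating a global training step with a per-subset training step until a convergence test is met, the toy example below fits an intercept-only model y ≈ b + g[group] under squared error. It is not Zhang1's Algorithm 1 and not a non-linear or Gaussian Process model; the function name, the tolerance, and the data are assumptions chosen only to show the loop structure.

```python
def fit_alternating(data, tol=1e-9, max_iter=100):
    """Alternate a global step and a per-group step until parameters stop moving.

    data: list of (group_id, y) pairs (hypothetical toy layout).
    Returns (b, g, iterations_run).
    """
    b = 0.0
    g = {gid: 0.0 for gid, _ in data}
    for iteration in range(1, max_iter + 1):
        # Global (fixed effect) step: refit the shared intercept on all rows.
        new_b = sum(y - g[gid] for gid, y in data) / len(data)
        # Per-group (random effect) step: refit each offset on its own rows only.
        new_g = {}
        for gid in g:
            residuals = [y - new_b for k, y in data if k == gid]
            new_g[gid] = sum(residuals) / len(residuals)
        # Convergence test: largest absolute parameter change in this iteration.
        change = max([abs(new_b - b)] + [abs(new_g[k] - g[k]) for k in g])
        b, g = new_b, new_g
        if change < tol:
            break
    return b, g, iteration


data = [("m1", 1.0), ("m1", 3.0), ("m2", 5.0), ("m2", 7.0)]
b, g, n_iter = fit_alternating(data)
```

Here the global step uses every row while each per-group step sees only the rows for one group, mirroring the division between the global model and the random effects model discussed above.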

Regarding claim 4, the combination of Zhang1 and Greene teaches The system of claim 1, where Zhang1 further teaches wherein the system is further caused to perform dimension reduction on the subset by applying a transformation to the subset (“Since the optimization problem in Equation (8) can be solved locally, we have an opportunity to apply some tricks that can further reduce the memory complexity C as defined in Equation (11). Note that although the overall feature space size is Pr for random effect r, sometimes the underlying dimension of the feature matrix Zrl could be smaller than Pr, due to the lack of support for certain features. For example, a member who is a software engineer is unlikely to be served jobs with the required skill “medicine”. Hence there will not be any data for the feature “job skill=medicine” for this member’s random effects, and in such a scenario, Zrl would end up with an empty column. As a result, for each random effect r and ID l, we can condense Zrl by removing all the empty columns and reindexing the features to form a more compact feature matrix, which would also reduce the size of random effect coefficients γrl and potentially improve the overall efficiency of solving the local optimization problem in Equation (8). An example is shown in Figure 3, where we compare the random effect coefficient size before and after applying such a condensed data storage strategy on a data set consisting of four months’ worth of LinkedIn’s job recommendations.” [pg. 5, left col, ¶2; Examiner is interpreting reindexing the features to form a more compact feature matrix as equivalent to performing a dimension reduction.]).
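The condensing step described in the quoted passage (removing empty columns and reindexing the remaining features) can be sketched as follows. This is an illustrative reading only; the function name and the small dense list-of-lists standing in for the feature matrix Zrl are assumptions, and Zhang1 does not publish this code.

```python
def condense_columns(rows):
    """Drop columns that are zero in every row and reindex the survivors.

    rows: list of equal-length lists (a small dense stand-in for Zrl).
    Returns (condensed_rows, kept), where kept[i] is the original column
    index of condensed column i.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols) if any(row[j] != 0 for row in rows)]
    condensed = [[row[j] for j in kept] for row in rows]
    return condensed, kept


# Columns 1 and 3 carry no data for this member, so they are removed.
Z = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0, 0.0]]
Z_small, kept = condense_columns(Z)
```

The `kept` list records the mapping from new to original feature indices, so coefficients fit on the smaller matrix can be mapped back to the full feature space.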

Regarding claim 8, Zhang1 teaches A method comprising: 
obtaining training data, the training data comprising values for a plurality of different features (“Now we consider the GLMix model for the job recommendation problem. To measure whether job j is a good match for a member m and to select the best jobs according to this measure, the key is to predict the probability that member m would apply for job j given an impression on the “Jobs you may be interested in” module. Let ymjt denote the binary response of whether member m would apply for job j in context t, where the context usually includes the time and location where the job is shown. We use qm to denote the feature vector of member m, which includes the features extracted from the member’s public profile, e.g., the member’s title, job function, education history, industry, etc. We use sj to denote the feature vector of job j, which includes features extracted from the job post, e.g. the job title, desired skills and experiences, etc. Let xmjt represent the overall feature vector for the (m, j, t) triple, which can include qm and sj for feature-level main effects, the outer product between qm and sj for interactions among member and job features, and features of the context. We assume that xmjt does not contain member IDs or item IDs as features, because IDs will be treated differently from regular features” [pg. 2, § 2.2 GLMix Model, ¶1]); 
training a global machine learned model using a first machine learning algorithm by feeding the training data into the first machine learning algorithm during a fixed effect training process (“[image: media_image1.png]” [pg. 3, § 3. Algorithm; A set of sample responses corresponds to training data; Zhang1 discloses a global machine learned model: “b is the global coefficient vector (also called fixed effect coefficients in the statistical literature)”, pg. 2, § 2.2 GLMix Model, ¶1]); and
the subset of the training data being limited to training data corresponding to a particular value of one of the plurality of different features (“The features available in recommender systems often include user features (e.g., age, gender, industry, job function) and item features (e.g., title and skills for jobs, title and named entities for news articles). An approach that is widely adopted in industry to model interactions between users and items is to form the outer (cross) product of user and item features, followed by feature selection to reduce the dimensionality and mitigate the problem of overfitting. In reality, we often observe a lot of heterogeneity in the amount of data per user or item that cannot be sufficiently modeled by user/item features alone, which provides an opportunity to improve model accuracy by adding more granularity to the model. Specifically, for a user who has interacted with many items in the past, we should have sufficient data to fit regression coefficients that are specific to that user to capture his/her personal interests. Similarly, for an item that has received many users’ responses, it is beneficial to model its popularity and interactions with user features through regression coefficients that are specific to the item.” [pg.1, § 1. Introduction, ¶2]).
However Zhang1 fails to explicitly teach training a first non-linear random effects machine learned model by feeding a subset of the training data into a second machine learning algorithm,
wherein the second machine learning algorithm is a Gaussian Process.
Greene teaches training a first non-linear random effects machine learned model (“[image: media_image2.png]” [pg. 15, § 4. Random Effects and Random Parameters Models; Examiner is interpreting random effects machine learned model to be equivalent to a model with random variables.]) by feeding a subset of the training data into a second machine learning algorithm (“In this instance, the square of the first derivative is used as approximation to the second when the asymptotic covariance matrix is computed. (The algorithm used for estimation requires only first derivatives.)” [pg. 47, § Appendix A. Computation of the Random Parameters model; This corresponds to a second machine learning algorithm]),
wherein the second machine learning algorithm is a Gaussian Process (“[image: media_image3.png]” [pg. 16, § 4.1 Exact Integration and Closed Forms; Greene discloses that a stochastic frontier model uses normally-distributed random variables, thus the examiner is interpreting this to be equivalent to a Gaussian process.]).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 9, the combination of Zhang1 and Greene teaches The method of claim 8, where Zhang1 further teaches further comprising: performing one or more iterations of a machine learned model training process, the one or more iterations continuing until a convergence test is met, each iteration comprising the obtaining training data, training the global machine learned model (“[image: media_image4.png]” [pg. 4, Algorithm 1, top left col; Step 1 of the algorithm corresponds to performing iterations until the convergence test has been met. The algorithm is using a fixed effect parameter to train a global machine learned model.]).
Zhang1 fails to explicitly teach and training the first non-linear random effects machine learned model.
Greene teaches training the first non-linear random effects machine learned model (See pg. 15, § 4. Random Effects and Random Parameters Models; Greene discloses nonlinear random effects model).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 11, the combination of Zhang1 and Greene teaches The method of claim 8, where Zhang1 further teaches further comprising performing dimension reduction on the subset by applying a transformation to the subset (“Since the optimization problem in Equation (8) can be solved locally, we have an opportunity to apply some tricks that can further reduce the memory complexity C as defined in Equation (11). Note that although the overall feature space size is Pr for random effect r, sometimes the underlying dimension of the feature matrix Zrl could be smaller than Pr, due to the lack of support for certain features. For example, a member who is a software engineer is unlikely to be served jobs with the required skill “medicine”. Hence there will not be any data for the feature “job skill=medicine” for this member’s random effects, and in such a scenario, Zrl would end up with an empty column. As a result, for each random effect r and ID l, we can condense Zrl by removing all the empty columns and reindexing the features to form a more compact feature matrix, which would also reduce the size of random effect coefficients γrl and potentially improve the overall efficiency of solving the local optimization problem in Equation (8). An example is shown in Figure 3, where we compare the random effect coefficient size before and after applying such a condensed data storage strategy on a data set consisting of four months’ worth of LinkedIn’s job recommendations.” [pg. 5, left col, ¶2; Examiner is interpreting reindexing the features to form a more compact feature matrix as equivalent to performing a dimension reduction.]).

Regarding claim 15, Zhang1 teaches A non-transitory machine-readable storage medium comprising instructions which, when implemented by one or more machines, cause the one or more machines to perform operations (“Each node has 24 Intel Xeon(R) CPU E5-2640 processors with 6 cores at 2.50GHz each, and every node has 250GB memory.” [pg. 6, top left column]) comprising: 
obtaining training data, the training data comprising values for a plurality of different features (“Now we consider the GLMix model for the job recommendation problem. To measure whether job j is a good match for a member m and to select the best jobs according to this measure, the key is to predict the probability that member m would apply for job j given an impression on the “Jobs you may be interested in” module. Let ymjt denote the binary response of whether member m would apply for job j in context t, where the context usually includes the time and location where the job is shown. We use qm to denote the feature vector of member m, which includes the features extracted from the member’s public profile, e.g., the member’s title, job function, education history, industry, etc. We use sj to denote the feature vector of job j, which includes features extracted from the job post, e.g. the job title, desired skills and experiences, etc. Let xmjt represent the overall feature vector for the (m, j, t) triple, which can include qm and sj for feature-level main effects, the outer product between qm and sj for interactions among member and job features, and features of the context. We assume that xmjt does not contain member IDs or item IDs as features, because IDs will be treated differently from regular features” [pg. 2, § 2.2 GLMix Model, ¶1]); 
training a global machine learned model using a first machine learning algorithm by feeding the training data into the first machine learning algorithm during a fixed effect training process (“[image: media_image1.png]” [pg. 3, § 3. Algorithm; A set of sample responses corresponds to training data; Zhang1 discloses a global machine learned model: “b is the global coefficient vector (also called fixed effect coefficients in the statistical literature)”, pg. 2, § 2.2 GLMix Model, ¶1]); and
the subset of the training data being limited to training data corresponding to a particular value of one of the plurality of different features (“The features available in recommender systems often include user features (e.g., age, gender, industry, job function) and item features (e.g., title and skills for jobs, title and named entities for news articles). An approach that is widely adopted in industry to model interactions between users and items is to form the outer (cross) product of user and item features, followed by feature selection to reduce the dimensionality and mitigate the problem of overfitting. In reality, we often observe a lot of heterogeneity in the amount of data per user or item that cannot be sufficiently modeled by user/item features alone, which provides an opportunity to improve model accuracy by adding more granularity to the model. Specifically, for a user who has interacted with many items in the past, we should have sufficient data to fit regression coefficients that are specific to that user to capture his/her personal interests. Similarly, for an item that has received many users’ responses, it is beneficial to model its popularity and interactions with user features through regression coefficients that are specific to the item.” [pg.1, § 1. Introduction, ¶2]).
However Zhang1 fails to explicitly teach training a first non-linear random effects machine learned model by feeding a subset of the training data into a second machine learning algorithm,
wherein the second machine learning algorithm is a Gaussian Process.
Greene teaches training a first non-linear random effects machine learned model (“[image: media_image2.png]” [pg. 15, § 4. Random Effects and Random Parameters Models]) by feeding a subset of the training data into a second machine learning algorithm (“In this instance, the square of the first derivative is used as approximation to the second when the asymptotic covariance matrix is computed. (The algorithm used for estimation requires only first derivatives.)” [pg. 47, § Appendix A. Computation of the Random Parameters model; This corresponds to a second machine learning algorithm]),
wherein the second machine learning algorithm is a Gaussian Process (“[image: media_image3.png]” [pg. 16, § 4.1 Exact Integration and Closed Forms; Greene discloses that a stochastic frontier model uses normally-distributed random variables, thus the examiner is interpreting this to be equivalent to a Gaussian process.]).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 16, the combination of Zhang1 and Greene teaches The non-transitory machine-readable storage medium of claim 15, where Zhang1 further teaches wherein the operations further comprise: performing one or more iterations of a machine learned model training process, the one or more iterations continuing until a convergence test is met, each iteration comprising the obtaining training data, training the global machine learned model, (“[image: media_image4.png]” [pg. 4, Algorithm 1, top left col; Step 1 of the algorithm corresponds to performing iterations until the convergence test has been met. The algorithm is using a fixed effect parameter to train a global machine learned model.]).
Zhang1 fails to explicitly teach and training the first non-linear random effects machine learned model.
Greene teaches training the first non-linear random effects machine learned model (pg. 15, § 4. Random Effects and Random Parameters Models; Greene discloses nonlinear random effects model).
Zhang1 and Greene both disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Zhang1’s random effects model with the nonlinear random effects model as taught by Greene. One would have been motivated to use a nonlinear random effects model for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 18, the combination of Zhang1 and Greene teaches The non-transitory machine-readable storage medium of claim 15, where Zhang1 further teaches wherein the operations further comprise performing dimension reduction on the subset by applying a transformation to the subset (“Since the optimization problem in Equation (8) can be solved locally, we have an opportunity to apply some tricks that can further reduce the memory complexity C as defined in Equation (11). Note that although the overall feature space size is Pr for random effect r, sometimes the underlying dimension of the feature matrix Zrl could be smaller than Pr, due to the lack of support for certain features. For example, a member who is a software engineer is unlikely to be served jobs with the required skill “medicine”. Hence there will not be any data for the feature “job skill=medicine” for this member’s random effects, and in such a scenario, Zrl would end up with an empty column. As a result, for each random effect r and ID l, we can condense Zrl by removing all the empty columns and reindexing the features to form a more compact feature matrix, which would also reduce the size of random effect coefficients γrl and potentially improve the overall efficiency of solving the local optimization problem in Equation (8). An example is shown in Figure 3, where we compare the random effect coefficient size before and after applying such a condensed data storage strategy on a data set consisting of four months’ worth of LinkedIn’s job recommendations.” [pg. 5, left col, ¶2; Examiner is interpreting reindexing the features to form a more compact feature matrix as equivalent to performing a dimension reduction.]).

Claims 3, 6, 7, 10, 13, 14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang1 in view of Greene and further in view of Zhang et al. (US 20170323268 A1, hereinafter "Zhang2").

Regarding claim 3, the combination of Zhang1 and Greene teaches The system of claim 2, where Greene further teaches wherein each iteration further comprises: non-linear (See pg. 5, § 2. Nonlinear Models)
However the combination of Zhang1 and Greene fails to explicitly teach training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features. 
Zhang2 teaches training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features (“User-specific model 216 may be personalized to the individual behavior or preferences of the user with respect to certain job features, and each job-specific model may identify the relevance or attraction of the corresponding job to certain member features. Input to user-specific model 216 may include some or all job features 212 used by global model 214, and input to each job-specific model may include some or all member features 208 used by the global model. Alternatively, user-specific model 216 and job-specific models 218 may use different combinations of member, job, and/or derived features, including features that are not used by the global model.” [¶0026; note: Zhang2 discloses a global model, user-specific model, and a job-specific model. Examiner is interpreting a job-specific model to correspond to a second random effects machine learned model and it is implicit that an algorithm would be used for this model which corresponds to a third machine learning algorithm.]).
Zhang1, Greene and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model with the Job-specific model as taught by Zhang2. One would have been motivated to substitute the models of Zhang1 and Zhang2 with nonlinear random effect models taught by Greene for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 6, the combination of Zhang1 and Greene teaches The system of claim 1, where Greene further teaches wherein the system is further caused to: nonlinear (See pg. 15, § 4. Random Effects and Random Parameters Models; Greene discloses nonlinear random effects model)
However the combination of Zhang1 and Greene fails to explicitly teach wherein the system is further caused to: 
feed candidate data into the global machine learned model, producing a first score; 
feed the candidate data into the first random effects machine learned model, producing a second score; and 
combine the first score and the second score into a ranking score, the ranking score used to rank the candidate data against other candidate data.
Zhang2 teaches wherein the system is further caused to: 
feed candidate data into the global machine learned model, producing a first score (“Analysis apparatus 204 may use each key to retrieve the job features for a subset of jobs to which the key is mapped in the inverted index and apply global model 214 to the job features for each job in the subset, member features 208, and/or derived features 210 to obtain a global score for the job.” [¶0033; global score corresponds to a first score.]); 
feed the candidate data into the first non-linear random effects machine learned model, producing a second score (“Next, analysis apparatus 204 may execute the second stage by using global model 214, user-specific model 216, and a subset of job-specific models 218 for jobs in subset 232 to generate a set of user-specific scores 230 for the jobs.” [¶0036, lines 1-4; user-specific scores corresponds to a second score.]); and
 combine the first score and the second score into a ranking score (“The output of global model 214, user-specific model 216, and job-specific models 218 may be combined to generate a score representing the user's predicted probability of applying to the jobs, clicking on the jobs, and/or otherwise responding positively to impressions of the jobs after the user is shown the jobs.” [¶0027, lines 1-6]), the ranking score used to rank the candidate data against other candidate data (“Operations 502-508 may be repeated for remaining jobs (operation 510). For example, user-specific scores may be generated for each job in a highest ranked subset of jobs from a previous ranking of the jobs, such as the ranking generated from a global version of the statistical model. A different combination of features and models for the user and job may be used to obtain output from the statistical models that is then combined to obtain a user-specific score for the job (operations 502-508). After user-specific scores have been obtained for all users and jobs, the jobs are ranked by user-specific score (operation 512). For example, the jobs may be ranked in descending order of user-specific score, so that jobs that are higher in the ranking have a higher predicted positive response (e.g., click, apply, etc.) for the user than jobs that are lower in the ranking. In turn, some or all of the jobs in the ranking may be outputted as job recommendations to the user, as discussed above.” [¶0055]).
Zhang1, Greene, and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model by having the models produce a score and a ranking based on the scores as taught by Zhang2. One would have been motivated to make this modification as a method to determine which jobs are most relevant to a user.
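For illustration only, the scoring and ranking limitations discussed above can be sketched as follows. This is a minimal sketch and is not code from Zhang1, Greene, or Zhang2. The function names, the choice to combine the two scores by adding them and applying a sigmoid, and the two example models are all assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# Assumption: each model maps candidate features to a log-odds style score.
Model = Callable[[Features], float]


def ranking_score(candidate: Features, global_model: Model,
                  random_effects_model: Model) -> float:
    """Combine a first (global) score and a second (random effects) score."""
    first_score = global_model(candidate)
    second_score = random_effects_model(candidate)
    # Assumed combination rule: sum the scores, then squash to (0, 1).
    return 1.0 / (1.0 + math.exp(-(first_score + second_score)))


def rank_candidates(candidates: Dict[str, Features], global_model: Model,
                    random_effects_model: Model) -> List[Tuple[str, float]]:
    """Rank candidates in descending order of their ranking score."""
    scored = [
        (candidate_id, ranking_score(features, global_model, random_effects_model))
        for candidate_id, features in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for a trained global model and a user-specific model.
def example_global_model(features: Features) -> float:
    return 0.8 * features["title_match"] - 0.5


def example_user_model(features: Features) -> float:
    return 1.2 * features["remote"] - 0.3
```

Under these assumptions, calling `rank_candidates` with a dictionary of job feature vectors returns the job identifiers paired with their ranking scores, highest first, which mirrors the descending-order ranking described in the quoted passage of Zhang2.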

Regarding claim 7, the combination of Zhang1, Greene and Zhang2 teaches The system of claim 6, where Zhang1 further teaches wherein the candidate data is job posting results from an online service (“As the world’s largest professional social network, LinkedIn provides a unique value proposition for its over 400M members to connect with all kinds of professional opportunities for their career growth. One of the most important products is the Jobs Homepage, which serves as a central place for members with job seeking intention to come and find good jobs to apply for. Figure 1 is a snapshot of the LinkedIn Jobs Homepage. One of the main modules on the page is “Jobs you may be interested in”, where relevant job thumbnails are recommended to members based on their public profile data and past activity on the site. If a member is interested in a recommended job, she can click on it to go to the job detail page, where the original job post is shown with information such as the job title, description, responsibilities, required skills and qualifications. The job detail page also has an “apply” button which allows the member to apply for the job with one click, either on LinkedIn or on the website of the company posting the job. One of the key success metrics for LinkedIn jobs business is the total number of job application clicks (i.e the number of clicks on the “apply” button), which is the focus for the job recommendation problem in this paper.” [pg. 2, § 2.1 Job Recommendation at LinkedIn]).

Regarding claim 10, the combination of Zhang1 and Greene teaches The method of claim 9, where Greene further teaches non-linear models (See pg. 5, § 2. Nonlinear Models).
However, the combination of Zhang1 and Greene fails to explicitly teach wherein each iteration further comprises: training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features.
Zhang2 teaches training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features (“User-specific model 216 may be personalized to the individual behavior or preferences of the user with respect to certain job features, and each job-specific model may identify the relevance or attraction of the corresponding job to certain member features. Input to user-specific model 216 may include some or all job features 212 used by global model 214, and input to each job-specific model may include some or all member features 208 used by the global model. Alternatively, user-specific model 216 and job-specific models 218 may use different combinations of member, job, and/or derived features, including features that are not used by the global model.” [¶0026; note: Zhang2 discloses a global model, user-specific model, and a job-specific model. Examiner is interpreting a job-specific model to correspond to a second random effects machine learned model and it is implicit that an algorithm would be used for this model which corresponds to a third machine learning algorithm.]).
Zhang1, Greene, and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model with the job-specific model as taught by Zhang2. One would have been motivated to substitute the models of Zhang1 and Zhang2 with the nonlinear random effects models taught by Greene for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 13, the combination of Zhang1 and Greene teaches The method of claim 8, where Greene further teaches a nonlinear random effects model (See pg. 15, § 4. Random Effects and Random Parameters Models).
However, the combination of Zhang1 and Greene fails to explicitly teach the method further comprising:
feeding candidate data into the global machine learned model, producing a first score; 
feeding the candidate data into the first random effects machine learned model, producing a second score; and 
combining the first score and the second score into a ranking score, the ranking score used to rank the candidate data against other candidate data.
Zhang2 teaches the method further comprising:
feeding candidate data into the global machine learned model, producing a first score (“Analysis apparatus 204 may use each key to retrieve the job features for a subset of jobs to which the key is mapped in the inverted index and apply global model 214 to the job features for each job in the subset, member features 208, and/or derived features 210 to obtain a global score for the job.” [¶0033; global score corresponds to a first score.]); 
feeding the candidate data into the first non-linear random effects machine learned model, producing a second score (“Next, analysis apparatus 204 may execute the second stage by using global model 214, user-specific model 216, and a subset of job-specific models 218 for jobs in subset 232 to generate a set of user-specific scores 230 for the jobs.” [¶0036, lines 1-4; user-specific scores corresponds to a second score.]); and
 combining the first score and the second score into a ranking score (“The output of global model 214, user-specific model 216, and job-specific models 218 may be combined to generate a score representing the user's predicted probability of applying to the jobs, clicking on the jobs, and/or otherwise responding positively to impressions of the jobs after the user is shown the jobs.” [¶0027, lines 1-6]), the ranking score used to rank the candidate data against other candidate data (“Operations 502-508 may be repeated for remaining jobs (operation 510). For example, user-specific scores may be generated for each job in a highest ranked subset of jobs from a previous ranking of the jobs, such as the ranking generated from a global version of the statistical model. A different combination of features and models for the user and job may be used to obtain output from the statistical models that is then combined to obtain a user-specific score for the job (operations 502-508). After user-specific scores have been obtained for all users and jobs, the jobs are ranked by user-specific score (operation 512). For example, the jobs may be ranked in descending order of user-specific score, so that jobs that are higher in the ranking have a higher predicted positive response (e.g., click, apply, etc.) for the user than jobs that are lower in the ranking. In turn, some or all of the jobs in the ranking may be outputted as job recommendations to the user, as discussed above.” [¶0055]).
Zhang1, Greene, and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model by having the models produce a score and a ranking based on the scores as taught by Zhang2. One would have been motivated to make this modification as a method to determine which jobs are most relevant to a user.

	Regarding claim 14, the combination of Zhang1, Greene, and Zhang2 teaches The method of claim 13, where Zhang1 further teaches wherein the candidate data is job posting results from an online service (“As the world’s largest professional social network, LinkedIn provides a unique value proposition for its over 400M members to connect with all kinds of professional opportunities for their career growth. One of the most important products is the Jobs Homepage, which serves as a central place for members with job seeking intention to come and find good jobs to apply for. Figure 1 is a snapshot of the LinkedIn Jobs Homepage. One of the main modules on the page is “Jobs you may be interested in”, where relevant job thumbnails are recommended to members based on their public profile data and past activity on the site. If a member is interested in a recommended job, she can click on it to go to the job detail page, where the original job post is shown with information such as the job title, description, responsibilities, required skills and qualifications. The job detail page also has an “apply” button which allows the member to apply for the job with one click, either on LinkedIn or on the website of the company posting the job. One of the key success metrics for LinkedIn jobs business is the total number of job application clicks (i.e the number of clicks on the “apply” button), which is the focus for the job recommendation problem in this paper.” [pg. 2, § 2.1 Job Recommendation at LinkedIn]).

Regarding claim 17, the combination of Zhang1 and Greene teaches The non-transitory machine-readable storage medium of claim 16, where Greene further teaches non-linear models (See pg. 5, § 2. Nonlinear Models).
However, the combination of Zhang1 and Greene fails to explicitly teach wherein each iteration further comprises: training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features.
Zhang2 teaches training a second random effects machine learned model by feeding a second subset of the training data into a third machine learning algorithm, the second subset of the training data being limited to training data corresponding to a particular value of another of the plurality of different features (“User-specific model 216 may be personalized to the individual behavior or preferences of the user with respect to certain job features, and each job-specific model may identify the relevance or attraction of the corresponding job to certain member features. Input to user-specific model 216 may include some or all job features 212 used by global model 214, and input to each job-specific model may include some or all member features 208 used by the global model. Alternatively, user-specific model 216 and job-specific models 218 may use different combinations of member, job, and/or derived features, including features that are not used by the global model.” [¶0026; note: Zhang2 discloses a global model, user-specific model, and a job-specific model. Examiner is interpreting a job-specific model to correspond to a second random effects machine learned model and it is implicit that an algorithm would be used for this model which corresponds to a third machine learning algorithm.]).
Zhang1, Greene, and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model with the job-specific model as taught by Zhang2. One would have been motivated to substitute the models of Zhang1 and Zhang2 with the nonlinear random effects models taught by Greene for more flexibility and to overcome the problems of a linear model. [Abstract, Greene]

Regarding claim 20, the combination of Zhang1 and Greene teaches The non-transitory machine-readable storage medium of claim 15, where Greene further teaches a nonlinear random effects model (See pg. 15, § 4. Random Effects and Random Parameters Models).
However, the combination of Zhang1 and Greene fails to explicitly teach wherein the operations further comprise:
feeding candidate data into the global machine learned model, producing a first score; 
feeding the candidate data into the first random effects machine learned model, producing a second score; and 
combining the first score and the second score into a ranking score, the ranking score used to rank the candidate data against other candidate data.
Zhang2 teaches wherein the operations further comprise:
feeding candidate data into the global machine learned model, producing a first score (“Analysis apparatus 204 may use each key to retrieve the job features for a subset of jobs to which the key is mapped in the inverted index and apply global model 214 to the job features for each job in the subset, member features 208, and/or derived features 210 to obtain a global score for the job.” [¶0033; global score corresponds to a first score.]); 
feeding the candidate data into the first non-linear random effects machine learned model, producing a second score (“Next, analysis apparatus 204 may execute the second stage by using global model 214, user-specific model 216, and a subset of job-specific models 218 for jobs in subset 232 to generate a set of user-specific scores 230 for the jobs.” [¶0036, lines 1-4; user-specific scores corresponds to a second score.]); and
 combining the first score and the second score into a ranking score (“The output of global model 214, user-specific model 216, and job-specific models 218 may be combined to generate a score representing the user's predicted probability of applying to the jobs, clicking on the jobs, and/or otherwise responding positively to impressions of the jobs after the user is shown the jobs.” [¶0027, lines 1-6]), the ranking score used to rank the candidate data against other candidate data (“Operations 502-508 may be repeated for remaining jobs (operation 510). For example, user-specific scores may be generated for each job in a highest ranked subset of jobs from a previous ranking of the jobs, such as the ranking generated from a global version of the statistical model. A different combination of features and models for the user and job may be used to obtain output from the statistical models that is then combined to obtain a user-specific score for the job (operations 502-508). After user-specific scores have been obtained for all users and jobs, the jobs are ranked by user-specific score (operation 512). For example, the jobs may be ranked in descending order of user-specific score, so that jobs that are higher in the ranking have a higher predicted positive response (e.g., click, apply, etc.) for the user than jobs that are lower in the ranking. In turn, some or all of the jobs in the ranking may be outputted as job recommendations to the user, as discussed above.” [¶0055]).
Zhang1, Greene, and Zhang2 all disclose fixed and random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Zhang2 discloses personalized recommendation models for predicting responses. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s fixed and random effects models and Greene’s nonlinear random effects model by having the models produce a score and a ranking based on the scores as taught by Zhang2. One would have been motivated to make this modification as a method to determine which jobs are most relevant to a user.

Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang1 in view of Greene and further in view of Yu et al. ("Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model", hereinafter "Yu").

	Regarding claim 21, the combination of Zhang1 and Greene teaches The system of claim 1. However, the combination of Zhang1 and Greene fails to explicitly teach wherein the Gaussian process is fitted for each combination of values of a plurality of random effects.
	Yu teaches wherein the Gaussian process is fitted for each combination of values of a plurality of random effects ([excerpts of Yu reproduced as images media_image5.png and media_image6.png in the original action] [pg. 1186, § 2. A Random Effects Model, ¶3 – § 2.1. Our Model, ¶1; Note: Yu discloses “Gaussian Process” [pg. 1185, § 1. Introduction, ¶1-2]; See further “In order to directly model the dependency between tasks, a multi-task Gaussian process approach…”]).
	Zhang1, Greene and Yu all disclose random effects models for prediction and thus are analogous. Zhang1 discloses a generalized linear mixed model using a fixed and a random effect model. Greene discloses using fixed and random effects in nonlinear models. Yu discloses a nonparametric model that uses a multi-task Gaussian Process approach. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang1’s and Greene’s teachings to implement a Gaussian Process that is applied to a combination of values from a plurality of random effects. One would have been motivated to make this modification in order to support efficient learning on large scale data. [pg. 1186, left col, ¶2, Yu]

	Regarding claim 22, the combination of Zhang1, Greene, and Yu teaches The system of claim 21, where Zhang1 further teaches wherein the plurality of random effects include user and item (“For the specific case of job recommendations, there are two types of random effects: R = {member, job}, and n is a per-sample index that represents the triple (m, j, t). For the member-level random effect (i.e., r=member), zrn represents sj and γr,i(r,n) represents αm. For the job-level random effect (i.e., r=job), zrn represents qm and γr,i(r,n) represents βj . We generalize the Gaussian priors of fixed effects b and random effects to p(·), and also use Nr to denote the total number of instances for random effect type r, e.g., when r represents member, Nr represents the total number of members in the data set.” [pg. 3, § General Formulation, ¶2; Examiner is interpreting user and member to be synonymous.]).
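For reference, the general formulation that the quoted passage of Zhang1 describes can be written out as below. This is a sketch reconstructed from the symbols appearing in the quotation. The link function g, the response y_n, and the fixed-effect feature vector x_n do not appear in the quoted text and are stated here as assumptions about Zhang1's notation.

```latex
% General form: fixed effect b plus one random effect term per type r in R.
\[
  g\bigl(\mathbb{E}[y_n]\bigr)
    = x_n^{\top} b + \sum_{r \in R} z_{rn}^{\top}\, \gamma_{r,\,i(r,n)},
  \qquad R = \{\text{member},\ \text{job}\}
\]
% Job recommendation case, with n indexing the triple (m, j, t):
% z_{rn} = s_j and \gamma = \alpha_m for r = member;
% z_{rn} = q_m and \gamma = \beta_j  for r = job.
\[
  g\bigl(\mathbb{E}[y_{mjt}]\bigr)
    = x_{mjt}^{\top} b + s_j^{\top} \alpha_m + q_m^{\top} \beta_j
\]
```

Read this way, the member-level term corresponds to the "user" random effect and the job-level term corresponds to the "item" random effect recited in claim 22.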

Response to Arguments
Applicant's arguments filed 08/09/2021 have been fully considered but they are not persuasive. 

Applicant’s arguments on pgs. 7-9 with respect to claim 5, which has been incorporated into independent claims 1, 8, and 15, have been considered but are not persuasive. Applicant appears to be arguing that the prior art of Greene fails to explicitly teach that the second machine learning algorithm is a Gaussian process. It appears that the applicant is arguing a narrow definition of the term “Gaussian Process”, which is not defined explicitly in the specification but is defined by the Applicant on pg. 9. The examiner agrees that the prior art of Greene does not explicitly teach the supplied definition of a “Gaussian Process”. However, as noted in the previous office action and in the rejection above, the examiner is interpreting the term “Gaussian Process” as equivalent to a process that uses Gaussian variables. Thus, the examiner’s interpretation would fall under the broadest reasonable interpretation of a “Gaussian Process”. Therefore, applicant’s arguments are not persuasive.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims. 

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.H.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122