DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 03/07/2022. In the current amendments, claims 1-20 are amended. Claims 1-20 are pending and have been examined.
In response to amendments and remarks filed on 03/07/2022 with respect to the claim of priority to Provisional Application 62/395,857 (filing date 09/16/2016), the effective filing date of the present application is 09/16/2016.
In response to amendments and/or remarks filed on 03/07/2022, the Specification and claim objections, the 35 U.S.C. 112(a) rejection to claim 5, the 35 U.S.C. 112(b) rejection to claims 2, 5, and 15, the 35 U.S.C. 101 rejection to claims 1-20, and the 35 U.S.C. 102(a)(1) rejection to claims 1, 6, and 17-20 made in the previous Office Action have been withdrawn.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-4, 8, and 13-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites the limitation "the second set of model parameters" in line 2.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the second set of model parameters" has been interpreted as "a set of model parameters".
Claim 4 recites the limitation "the second feature vector" in line 8.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the second feature vector" has been interpreted as "a second feature vector".
Claim 8 recites the limitation "the first set of model parameters" in line 2.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the first set of model parameters" has been interpreted as "a first set of model parameters".
Claim 8 recites the limitation "the second set of model parameters" in line 2.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the second set of model parameters" has been interpreted as "a second set of model parameters".
Claim 13 recites the limitation "the first set of model parameters" in lines 5-6.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the first set of model parameters" has been interpreted as "a first set of model parameters".
Claim 13 recites the limitation "the second set of model parameters" in lines 6-7.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the second set of model parameters" has been interpreted as "a second set of model parameters".
Dependent claim 14 is rejected based on the same rationale as claim 13.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6-7, 12-13, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1).
Regarding Claim 1,
Yan et al. teaches ...receiving training data that identifies a set of actions taken by a plurality of users with respect to a plurality of items (pg. 1 second full paragraph: “The training data provided by RecSys 2015 challenge [1] comprises a sequence of clicks and buy events performed by some user during a typical session on an e-commerce website. Test data contains click events only. Each click event contains session id (user), click time, item id and item’s category while each buy event contains session id, buy time, item id, item price and quantity of the item bought. Note that there is no other explicit identification for each user so we use session ids as user ids, i.e., we treat each session as belonging to a distinct user. There are a total of 33M clicks, 1.1M buys, 9M sessions and 53K items in the training data. The time span of these events is 6 months” teaches receiving training data that identifies a sequence of click and buy events (corresponding to a set of actions) performed by users with respect to a plurality of items; because each session is treated as belonging to a distinct user, training data containing 9M sessions represents actions taken by a plurality of users with respect to a plurality of items);
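The session-as-user convention quoted from Yan et al. can be illustrated with a short Python sketch that turns raw click and buy events into labeled training instances. The function name, the (session_id, item_id) event format, and the rule that a clicked item is labeled positive when it is bought in the same session are assumptions for illustration (the last loosely follows the statement, quoted from Yan et al. pg. 2 Section 2 in the claim 13 analysis, that each session-item pair provides a training instance); this is not code from either reference.

```python
from collections import defaultdict


def build_training_instances(clicks, buys):
    """Return (user_id, item_id, label) instances from raw click/buy events.

    clicks, buys: iterables of (session_id, item_id) pairs.
    Each session id is reused as the user id, as Yan et al. describe.
    label is 1 if the clicked item was bought in the same session, else 0
    (a hypothetical labeling rule used only for illustration).
    """
    clicked = defaultdict(set)  # session id -> items clicked in that session
    for session_id, item_id in clicks:
        clicked[session_id].add(item_id)
    bought = set(buys)  # (session id, item id) pairs that ended in a purchase
    instances = []
    for session_id in sorted(clicked):
        for item_id in sorted(clicked[session_id]):
            label = int((session_id, item_id) in bought)
            instances.append((session_id, item_id, label))
    return instances


clicks = [(1, "a"), (1, "b"), (1, "a"), (2, "a")]
buys = [(1, "b")]
print(build_training_instances(clicks, buys))
# [(1, 'a', 0), (1, 'b', 1), (2, 'a', 0)]
```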
responsive to receiving the training data, training a prediction model, wherein training the prediction model includes generating, for the prediction model, (a) a linear prediction component modelling linear relationships, learned from the training data, between features of the plurality of users and the set of actions (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” and pg. 2 section 3.2: “In particular, we train a Gradient Boosting Decision Tree (GBDT) and a Field-aware Factorization Machine (FFM) (to be explained later in Section 4) on the original features” teach extracting original features from the training data that has been received wherein the original features are used to train a Field-aware Factorization Machine (FFM) (corresponds to training a prediction model); pg. 4 first full paragraph: “
[equation image: Yan et al., pg. 4, first full paragraph]” teaches that the FFM model contains a linear component as modeled by [equation image: linear term], which models the linear relationship, learned from the training data, between features represented by feature vector $x_i$ (representing features of the users) and the target $y$ (representing the probability of a user buying an item in the specified time, which corresponds to a set of actions); claim interpretation of this limitation is in view of Specification [0052]-[0054]), and
(b) a nonlinear prediction component modelling nonlinear interactions, learned from the training data, between different features in predicting the set of actions (pg. 3 Section 4: “With FFM, features are organized into fields. A field can be viewed as corresponding to a class of features. FFM learns a different set of latent factors for every pair of fields, i.e., each feature uses a different k-vector to interact with other features from different field” and pg. 4 first full paragraph: “ 
[equation image: Yan et al., pg. 4, first full paragraph]” teach that the FFM model contains a nonlinear component as modeled by [equation image: pairwise interaction term], which models nonlinear interactions between different features (for example, $x_i$ and $x_j$) of the training data in predicting target $y$ (corresponding to the set of actions); claim interpretation of this limitation is in view of Specification [0052]-[0054]);
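The separation of a linear term from a field-aware pairwise term can be illustrated with a minimal Python sketch. It follows the examiner's reading that the equation quoted from Yan et al. contains both terms, uses the field-aware latent vectors described in the quoted Section 4 passage (one k-vector per feature per other field), and applies a logistic link to turn the raw score into a probability. The sparse (field, index, value) input format, the dictionary-based parameter storage, and the logistic link are assumptions, not details taken from the reference.

```python
import math


def ffm_probability(x, w0, w, v):
    """Sketch of a field-aware factorization machine score.

    x:  list of (field, feature_index, value) for the nonzero features
    w0: global bias; w: dict feature_index -> linear weight
    v:  dict (feature_index, field) -> latent k-vector; each feature keeps
        a separate vector for every field it may interact with
    """
    # Linear prediction component: sum_i w_i * x_i
    linear = sum(w.get(idx, 0.0) * val for _, idx, val in x)
    # Nonlinear prediction component: sum_{a<b} <v_{a,f_b}, v_{b,f_a}> x_a x_b
    pairwise = 0.0
    for a in range(len(x)):
        f_a, idx_a, val_a = x[a]
        for b in range(a + 1, len(x)):
            f_b, idx_b, val_b = x[b]
            va = v[(idx_a, f_b)]  # a's vector for interacting with b's field
            vb = v[(idx_b, f_a)]  # b's vector for interacting with a's field
            pairwise += sum(p * q for p, q in zip(va, vb)) * val_a * val_b
    y = w0 + linear + pairwise
    return 1.0 / (1.0 + math.exp(-y))  # logistic link -> purchase probability


x = [(0, 0, 1.0), (1, 1, 1.0)]  # one user feature (field 0), one item feature (field 1)
w = {0: 0.5, 1: -0.5}
v = {(0, 1): [1.0, 0.0], (1, 0): [2.0, 3.0]}
print(ffm_probability(x, 0.0, w, v))  # approximately 0.881 (sigmoid of 2.0)
```

In the usage line the linear terms cancel, so the score comes entirely from the single pairwise interaction, which makes the contribution of the nonlinear component easy to see.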
generating a first feature vector based on a first set of features associated with a user that is currently accessing a particular web service (pg. 1 second full paragraph: “The training data provided by RecSys 2015 challenge [1] comprises a sequence of clicks and buy events performed by some user during a typical session on an e-commerce website. Test data contains click events only. Each click event contains session id (user), click time, item id and item’s category while each buy event contains session id, buy time, item id, item price and quantity of the item bought. Note that there is no other explicit identification for each user so we use session ids as user ids, i.e., we treat each session as belonging to a distinct user. There are a total of 33M clicks, 1.1M buys, 9M sessions and 53K items in the training data. The time span of these events is 6 months” and pg. 3 Section 4: “
[text image: Yan et al., pg. 3, Section 4]” teach generating a feature vector $x$ based on a first set of features associated with a user that is currently accessing an e-commerce website; Table 1 teaches a set of features associated with users);
for each respective item in a plurality of items: 
generating a respective feature vector for the respective item based on a respective set of features associated with the respective item (pg. 3 Section 4: “
[text image: Yan et al., pg. 3, Section 4]” teaches generating a feature vector $x$ based on a set of features associated with an item; Table 1 teaches a set of features associated with items);
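Building separate user and item feature vectors, then concatenating them into a single model input, can be sketched as follows. One-hot encoding of categorical values, the field names, and the shared index vocabulary are assumptions for illustration; the actual features listed in Table 1 of Yan et al. are not reproduced here.

```python
def encode(features, field_ids, vocab):
    """Encode categorical features as sparse (field, index, value) triples.

    features:  dict field_name -> categorical value
    field_ids: dict field_name -> integer field id
    vocab:     dict (field_name, value) -> feature index, grown in place
    """
    encoded = []
    for name in sorted(features):
        key = (name, features[name])
        if key not in vocab:
            vocab[key] = len(vocab)  # assign the next unused feature index
        encoded.append((field_ids[name], vocab[key], 1.0))  # one-hot value
    return encoded


# Hypothetical field names; Table 1 of Yan et al. is not reproduced here.
field_ids = {"user_click_bucket": 0, "item_category": 1, "item_id": 2}
vocab = {}
user_vec = encode({"user_click_bucket": "3-5"}, field_ids, vocab)
item_vec = encode({"item_category": "cat_A", "item_id": "item_42"}, field_ids, vocab)
x = user_vec + item_vec  # combined input for a scoring function
print(x)  # [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
```

Keeping the user vector separate until scoring time means it can be built once per request while the item vectors vary across candidates.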
determining a respective probabilistic score of the user performing a target action (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” teaches predicting a probability (corresponds to determining probabilistic score) of the user buying an item (corresponds to user performing a target action));
wherein the respective probabilistic score for a respective item is computed as a function of (a) the linear prediction component, (b) the nonlinear prediction component, (c) the first feature vector, and (d) the respective feature vector for the respective item (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” and pg. 2 section 3.2: “In particular, we train a Gradient Boosting Decision Tree (GBDT) and a Field-aware Factorization Machine (FFM) (to be explained later in Section 4) on the original features” teach predicting a probability (corresponds to determining probabilistic score) with the FFM model; pg. 4 first full paragraph: “
[equation image: Yan et al., pg. 4, first full paragraph]” teaches that the FFM model computes target $y$ (in the form of a probability) as a function of (a) the linear component as modeled by [equation image: linear term], (b) the nonlinear component as modeled by [equation image: pairwise interaction term], (c) the first feature vector (which can be vector $x_i$ or $x_j$), and (d) the respective feature vector (which can be vector $x_i$ or $x_j$); Table 1 teaches that both user features and item features are used in the FFM model; therefore, either vector $x_i$ or $x_j$ can represent the first feature vector or the respective feature vector).
Yan et al. does not appear to explicitly teach One or more non-transitory computer-readable storage media storing instructions which, when executed by one or more hardware processors, causes performance of a set of operations comprising:...identifying, based on the probabilistic score for each respective item, a subset of items from the plurality of items to present to the user; and presenting the subset of items from the plurality of items to the user through an interface of the particular web service.
However, Vijay et al. teaches One or more non-transitory computer-readable storage media storing instructions which, when executed by one or more hardware processors, causes performance of a set of operations comprising (see pg. 15 [0104] & [0107]):...
identifying, based on the probabilistic score for each respective item, a subset of items from the plurality of items to present to the user; and presenting the subset of items from the plurality of items to the user through an interface of the particular web service (pg. 9 [0074]: “For example, the e-mail content response prediction system may predict that, if the member is more likely to be engaged with this type of product, the system 200 may determine that promotion content related to that type of product should be included in the e-mail. As another example, if the email content response prediction system determines that users viewing emails already including content types A, B, and C are more likely to be interested in content related to D, but not content related to E, the system 200 may choose promotion content related to D as appropriate. More specifically, for a given slot/position in given e-mail type, the e-mail response prediction system is configured to rank n number of external sources of content (that were not originally part of that e-mail) and to choose one or more of the external sources of content to be displayed as the promotion content 1603, as illustrated in FIG. 16” teaches that the email response prediction system 200 (which contains the prediction module 204) predicts whether the member is more likely to be engaged with a specific type of product, and accordingly identifies a number of content items to be presented to the user through an interface for an email (see Fig. 16; pg. 4 [0045] teaches providing email through a web service); pg. 5 [0051]: “the prediction module 204 performs a prediction modeling process based on the assembled feature vector 510 and a prediction model to predict a likelihood (e.g., the probability) of the particular member performing a particular user action (e.g., click) on the particular email content item (e.g., a particular type of e-mail)” and pg. 3 [0037]: “Thereafter, the prediction module 204 performs a prediction modeling process based on the assembled feature vector and a prediction model to predict a likelihood of a particular member performing a particular user action on a particular email content item (e.g., the email content item described by the raw email content data). The prediction module 204 may use any one of various known prediction modeling techniques to perform the prediction modeling. For example, the prediction module 204 may apply a statistics-based machine learning model such as a logistic regression model to the assembled feature vector” teach determining a probability (probabilistic score) of a user performing a particular target action for a particular item, in other words, predicting whether the member is likely to be engaged with a specific type of product).
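The step of scoring each candidate item and keeping the highest-scoring subset for presentation can be sketched in Python. Because Vijay et al. [0037] names logistic regression as one usable model, the sketch scores items with a logistic function; the function names, the dense feature lists, and the fixed cutoff k are assumptions, and an FFM-style scorer could be substituted for `logistic_score` without changing the selection logic.

```python
import heapq
import math


def logistic_score(features, weights, bias=0.0):
    """Logistic-regression probability for a dense feature vector."""
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


def top_k_items(item_features, weights, k, bias=0.0):
    """Score each candidate item and return the ids of the k best, best first."""
    scored = [(logistic_score(f, weights, bias), item_id)
              for item_id, f in item_features.items()]
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


candidates = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
print(top_k_items(candidates, weights=[2.0, -1.0], k=2))  # ['A', 'C']
```

`heapq.nlargest` avoids sorting the full candidate list when k is much smaller than the number of items.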
Yan et al. and Vijay et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Vijay et al. to the disclosed invention of Yan et al.
One of ordinary skill in the arts would have been motivated to make this modification to “determine or predict what emails (or what content within emails) a member of an online social network service is likely to interact with. This information may be used by the email response prediction system to, for example, downshift/filter out certain emails to members if the email response prediction system determines that there's a low likelihood of a click on that email, which may reduce costs and member annoyance, while dramatically improving click through rate (CTR)” (Vijay et al. pg. 1 [0026]).

Regarding Claim 4,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein determining the respective probabilistic score comprises generating a first result vector by applying the linear prediction component and the nonlinear prediction component to the first feature vector;...generating at least a second result vector by applying the linear prediction component and the nonlinear prediction component to the second feature vector for each respective item; and combining the cached first result vector with at least the second result vector to compute the respective probabilistic score (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” and pg. 2 section 3.2: “In particular, we train a Gradient Boosting Decision Tree (GBDT) and a Field-aware Factorization Machine (FFM) (to be explained later in Section 4) on the original features” teach predicting a probability (corresponds to determining probabilistic score) with the FFM model; pg. 4 first full paragraph: “
[equation image: Yan et al., pg. 4, first full paragraph]” teaches that the FFM model applies a linear component as modeled by [equation image: linear term] and a nonlinear component as modeled by [equation image: pairwise interaction term] to a first feature vector $x_i$ when $i = 1$ and to a second feature vector when $i = 2$, thereby generating a first result vector and a second result vector, respectively; the summation [equation image: summation symbol] indicates combining the cached first result vector with the second result vector to compute $y(x)$, which corresponds to the output probability; the first result vector is considered cached because it must be retained for use in the second iteration to perform the summation operation).
Vijay et al. further teaches caching the first result vector (Fig. 5 teaches an assembled feature vector (corresponds to a first result vector) used for the prediction module; pg. 16 [0118]: “the term "machine-readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions or data structures” teaches storing (caching) data structures (including vectors)).
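The caching arrangement recited in claim 4 can be illustrated for a plain (not field-aware) factorization machine, where the pairwise term splits exactly into user-only, item-only, and user-item parts. In this Python sketch the user-side summary (a scalar and a k-dimensional vector) is computed once and reused for every candidate item, and each item's summary is combined with it through a dot product. The decomposition, the function names, and the requirement that user and item feature indices be disjoint are assumptions of the sketch rather than teachings of Yan et al. or Vijay et al.

```python
import math


def partial_fm(x, w, v, k):
    """Summarize one feature group (user or item) of a factorization machine.

    x: dict feature_index -> value; w: dict index -> weight;
    v: dict index -> latent vector of length k.
    Returns (scalar, vector): scalar holds the linear term plus interactions
    inside the group; vector holds sum_i v_i * x_i for cross-group interactions.
    """
    s = [0.0] * k
    sq = [0.0] * k
    linear = 0.0
    for i, xi in x.items():
        linear += w[i] * xi
        for f in range(k):
            s[f] += v[i][f] * xi
            sq[f] += (v[i][f] * xi) ** 2
    within = 0.5 * sum(s[f] ** 2 - sq[f] for f in range(k))
    return linear + within, s


def score_items(user_x, items, w0, w, v, k):
    """Probability per item; the user summary is computed once and reused."""
    user_scalar, user_vec = partial_fm(user_x, w, v, k)  # cached result
    scores = {}
    for item_id, item_x in items.items():
        item_scalar, item_vec = partial_fm(item_x, w, v, k)
        cross = sum(a * b for a, b in zip(user_vec, item_vec))
        y = w0 + user_scalar + item_scalar + cross
        scores[item_id] = 1.0 / (1.0 + math.exp(-y))
    return scores
```

Because the cross term equals the dot product of the two group vectors, reusing the cached user summary gives the same score as evaluating the full model on the concatenated vector.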
Yan et al. and Vijay et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Vijay et al. to the disclosed invention of Yan et al.
One of ordinary skill in the arts would have been motivated to make this modification to “determine or predict what emails (or what content within emails) a member of an online social network service is likely to interact with. This information may be used by the email response prediction system to, for example, downshift/filter out certain emails to members if the email response prediction system determines that there's a low likelihood of a click on that email, which may reduce costs and member annoyance, while dramatically improving click through rate (CTR)” (Vijay et al. pg. 1 [0026]).
Regarding Claim 6,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein the first set of features associated with the user include at least one feature extracted from clickstream data captured from the user (Table 1 and pg. 1 second full paragraph: “The training data provided by RecSys 2015 challenge [1] comprises a sequence of clicks and buy events performed by some user during a typical session on an e-commerce website. Test data contains click events only. Each click event contains session id (user), click time, item id and item’s category while each buy event contains session id, buy time, item id, item price and quantity of the item bought. Note that there is no other explicit identification for each user so we use session ids as user ids, i.e., we treat each session as belonging to a distinct user. There are a total of 33M clicks, 1.1M buys, 9M sessions and 53K items in the training data. The time span of these events is 6 months” teach a first set of features associated with user include features extracted from sequences of clicks and buy events performed by users).
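Deriving user features from clickstream data, as recited in claim 6, can be sketched in Python. The event layout (session id, timestamp in seconds, item id) and the four features computed here are hypothetical choices for illustration and are not the features listed in Table 1 of Yan et al.

```python
from collections import Counter


def clickstream_features(events):
    """Derive simple per-session features from (session_id, timestamp, item_id) clicks.

    Returns dict session_id -> feature dict. The particular features are
    hypothetical examples chosen for illustration.
    """
    by_session = {}
    for session_id, ts, item_id in events:
        by_session.setdefault(session_id, []).append((ts, item_id))
    features = {}
    for session_id, clicks in by_session.items():
        times = [ts for ts, _ in clicks]
        item_counts = Counter(item for _, item in clicks)
        features[session_id] = {
            "n_clicks": len(clicks),                  # total clicks in the session
            "n_distinct_items": len(item_counts),     # breadth of browsing
            "duration_s": max(times) - min(times),    # first-to-last click span
            "top_item": item_counts.most_common(1)[0][0],  # most-clicked item
        }
    return features


events = [(1, 0, "a"), (1, 30, "b"), (1, 90, "a"), (2, 5, "c")]
print(clickstream_features(events)[1])
# {'n_clicks': 3, 'n_distinct_items': 2, 'duration_s': 90, 'top_item': 'a'}
```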
Regarding Claim 7,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein training the prediction model comprises training a factorization machine, wherein the linear prediction component is a first component of the factorization machine subsequent to training and the nonlinear prediction component is a second prediction component of the factorization machine (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” and pg. 2 section 3.2: “In particular, we train a Gradient Boosting Decision Tree (GBDT) and a Field-aware Factorization Machine (FFM) (to be explained later in Section 4) on the original features” teach extracting original features from the training data that has been received wherein the original features are used to train a Field-aware Factorization Machine (FFM) (corresponds to training a factorization machine); pg. 4 first full paragraph: “
[equation image: Yan et al., pg. 4, first full paragraph]” teaches that the FFM model contains a linear component as modeled by [equation image: linear term] as a first component of the factorization machine subsequent to training and a nonlinear component as modeled by [equation image: pairwise interaction term] as a second prediction component of the factorization machine).
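Claim 7 recites training a factorization machine whose trained parameters supply the linear and nonlinear components. A compact way to illustrate such training is stochastic gradient descent on the logistic loss, using the fact that the derivative of the pairwise term with respect to $v_{i,f}$ is $x_i \sum_j v_{j,f} x_j - v_{i,f} x_i^2$. The learner, learning rate, initialization, and function names below are assumptions for illustration; neither Yan et al. nor the claims are represented as using this exact procedure.

```python
import math
import random


def fm_raw(x, w0, w, v):
    """Return the raw score y(x) and the per-factor sums s_f = sum_j v_{j,f} x_j."""
    k = len(v[0])
    s = [sum(v[i][f] * xi for i, xi in x.items()) for f in range(k)]
    linear = sum(w[i] * xi for i, xi in x.items())
    sq = [sum((v[i][f] * xi) ** 2 for i, xi in x.items()) for f in range(k)]
    pairwise = 0.5 * sum(s[f] ** 2 - sq[f] for f in range(k))
    return w0 + linear + pairwise, s


def train_fm(data, n_features, k=2, lr=0.1, epochs=50, seed=0):
    """Plain SGD on the logistic loss; data is a list of (x_dict, label in {0, 1})."""
    rng = random.Random(seed)
    w0 = 0.0
    w = [0.0] * n_features
    v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]
    for _ in range(epochs):
        for x, label in data:
            y, s = fm_raw(x, w0, w, v)
            g = 1.0 / (1.0 + math.exp(-y)) - label  # d(log loss)/dy
            w0 -= lr * g                            # global parameter
            for i, xi in x.items():
                w[i] -= lr * g * xi                 # linear component
                for f in range(k):
                    # nonlinear component: dy/dv_{i,f} = x_i * (s_f - v_{i,f} * x_i)
                    v[i][f] -= lr * g * xi * (s[f] - v[i][f] * xi)
    return w0, w, v


data = [({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 2: 1.0}, 0)]
w0, w, v = train_fm(data, n_features=3)
```

After training, `w0` and `w` form the linear prediction component and `v` the factorized pairwise (nonlinear) component.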
Regarding Claim 12,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein the prediction model is a factorization machine comprising the linear prediction component and the nonlinear prediction component (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” and pg. 2 section 3.2: “In particular, we train a Gradient Boosting Decision Tree (GBDT) and a Field-aware Factorization Machine (FFM) (to be explained later in Section 4) on the original features” teach extracting original features from the training data that has been received wherein the original features are used to train a Field-aware Factorization Machine (FFM) (corresponds to training a factorization machine); pg. 4 first full paragraph: “
[equation image: Yan et al., pg. 4, first full paragraph]” teaches that the FFM model contains a linear component as modeled by [equation image: linear term] and a nonlinear component as modeled by [equation image: pairwise interaction term]).
Regarding Claim 13,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 12.
Yan et al. further teaches [image: text of the claim 13 limitation]
(pg. 3 Section 4: [text image: Yan et al., pg. 3, Section 4] teaches the claimed equation to predict $y$, which is the predicted action (pg. 2 Section 2: “The task of predicting whether and what a user will buy can be cast as a binary classification problem with a classifier that outputs a purchase probability. Each session-item pair provides a training instance and the classification models are trained to predict the probability for buying each item a user clicks”); pg. 3 Section 4 further teaches the dot product between two factorization vectors applied to a pairwise interaction between the $i$th feature and the $j$th feature in feature vector $x$; $w_0$ corresponds to the global model parameter, $w_i$ corresponds to the first model parameter applied to $x_i$, and $\langle v_i, v_j \rangle$ corresponds to the second model parameter).
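For reference, the second-order factorization machine equation to which the parameter mapping for claim 13 refers is commonly written as shown below, with $w_0$ the global parameter, $w_i$ the per-feature (first) parameters, and $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ the factorized (second) parameters. The second line is the standard identity that allows the pairwise term to be evaluated in $O(kn)$ time. Because the claim 13 and Yan et al. images are not reproduced in this text, the exact notation used there is an assumption.

```latex
y(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
```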
Regarding Claim 17,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein the target action is at least one of selecting an item, rating an item, downloading an item, or installing an item (pg. 2 Sections 3-3.1: “Our features are comprised of original features extracted directly from raw data and secondary features learned from the predictions of two classification models trained on the original features...Our task is to predict the probability of a user buying an item in the specified time. So we extract original features from three aspects including user, item and time” teaches predicting probability of the user performing the target action of buying an item (corresponds to selecting an item)).
Vijay et al. further teaches wherein the target action is at least one of selecting an item, rating an item, downloading an item, or installing an item (pg. 3 [0040] teaches the target action can be selecting or rating).
Yan et al. and Vijay et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Vijay et al. to the disclosed invention of Yan et al.
One of ordinary skill in the arts would have been motivated to make this modification to “determine or predict what emails (or what content within emails) a member of an online social network service is likely to interact with. This information may be used by the email response prediction system to, for example, downshift/filter out certain emails to members if the email response prediction system determines that there's a low likelihood of a click on that email, which may reduce costs and member annoyance, while dramatically improving click through rate (CTR)” (Vijay et al. pg. 1 [0026]).
Regarding Claim 18,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein the web service is at least one of a website or a software-as-a-service application (Table 1 and pg. 1 second full paragraph: “The training data provided by RecSys 2015 challenge [1] comprises a sequence of clicks and buy events performed by some user during a typical session on an e-commerce website. Test data contains click events only. Each click event contains session id (user), click time, item id and item’s category while each buy event contains session id, buy time, item id, item price and quantity of the item bought. Note that there is no other explicit identification for each user so we use session ids as user ids, i.e., we treat each session as belonging to a distinct user. There are a total of 33M clicks, 1.1M buys, 9M sessions and 53K items in the training data. The time span of these events is 6 months” teach the data are captured from a user accessing an e-commerce website).
Regarding Claim 19,
Claim 19 recites analogous limitations as claim 1 and is rejected based on the same rationale as claim 1.
Vijay et al. further teaches A system comprising: one or more hardware processors; one or more non-transitory computer readable storage media storing instructions which, when executed by one or more hardware processors, causes performance of a set of operations comprising (see pg. 15 [0104] & [0107]). 
Yan et al. and Vijay et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Vijay et al. to the disclosed invention of Yan et al.
One of ordinary skill in the arts would have been motivated to make this modification to “determine or predict what emails (or what content within emails) a member of an online social network service is likely to interact with. This information may be used by the email response prediction system to, for example, downshift/filter out certain emails to members if the email response prediction system determines that there's a low likelihood of a click on that email, which may reduce costs and member annoyance, while dramatically improving click through rate (CTR)” (Vijay et al. pg. 1 [0026]).
Regarding Claim 20,
Claim 20 recites analogous limitations as claim 1 and is rejected based on the same rationale as claim 1.

Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of Chu et al. (“Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models”).
Regarding Claim 2,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the linear model component applies a set of weights based on relationships strengths between the features in contributing to a predicted event outcome.
However, Chu et al. teaches wherein the linear model component applies a set of weights based on relationships strengths between the features in contributing to a predicted event outcome (pg. 693 Section 3.1 Equation (1) and Section 3.1: “The weight variable wab is independent of user and content features and quantifies he affinity of these two factors xi,b and zj,a in interactions” and pg. 693 Section 3: “A set of weight coefficients is introduced to capture the pairwise associations between user and content features. The parametric model is optimized by fitting the observed interactive feedback” teach the linear model component of Equation (1) applying a set of weight coefficients that capture the linear affinity (relationship strengths) between user and content features; pg. 692 second full paragraph: “In this paper, we propose a machine learning approach to handling both issues in personalized recommendation. The key idea is to maintain profiles for both content and users, and build a feature-based bilinear regression model to quantify the associations between heterogeneous features by fitting the historical interactive data. The feature-based predictive model can then be applied to recommending new and existing items for both new and existing users” and pg. 696 Section 5.1: “To draw visitors’ attention, we would like to rank available articles according to visitors’ interests, and highlight the most attractive article at F1 position” teach the features contribute to a predicted event outcome in the form of predicting which article will be moved to the F1 position).
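One reading of the passage quoted from Chu et al., in which each weight wab links a user feature xi,b with a content feature zj,a, is the bilinear affinity score sketched below. This is an illustrative Python sketch only; the function name, the W[a][b] index order, and the use of plain lists are assumptions.

```python
def bilinear_score(x_user, z_item, W):
    """Bilinear affinity between a user feature vector and a content feature vector.

    W[a][b] weights the pairing of content feature a with user feature b,
    so the score is sum_a sum_b z_item[a] * x_user[b] * W[a][b].
    """
    return sum(
        z_item[a] * x_user[b] * W[a][b]
        for a in range(len(z_item))
        for b in range(len(x_user))
    )
```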
Yan et al., Vijay et al., and Chu et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chu et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage “a feature-based bilinear regression framework for personalized recommendation on dynamic content [that] quantified associations between attributes in user profiles and content profiles through learning a parametric bilinear regression function from interactive feedback,” which “greatly alleviates the cold-start issue of recommending for new users, by leveraging interest patterns in user profiles recognized from regression over historical interactive feedback” (Chu et al. pg. 699 Section 6).
Regarding Claim 3,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the second set of model parameters include weights for pairwise interactions between features from a feature set including features from the plurality of users and the plurality of items; wherein a weight for a particular interaction between two features in the feature set is computed, at least in part, as a factor of pairwise interactions between other features in the feature set.
However, Chu et al. teaches wherein the second set of model parameters include weights for pairwise interactions between features from a feature set including features from the plurality of users and the plurality of items (pg. 693 Section 3.1 Equation (1) and Section 3.1: “The weight variable wab is independent of user and content features and quantifies he affinity of these two factors xi,b and zj,a in interactions” and pg. 693 Section 3: “A set of weight coefficients is introduced to capture the pairwise associations between user and content features. The parametric model is optimized by fitting the observed interactive feedback” teach a set of weight coefficients that capture the pairwise associations between user and content features characterizing the content items; pg. 693 first full paragraph: “Static descriptors: Such as categories, manufacturer name, title, bag of words of textual content etc” teaches content features characterizing the content items);
wherein a weight for a particular interaction between two features in the feature set is computed, at least in part, as a factor of pairwise interactions between other features in the feature set (pg. 693 Section 3.1: “[image]” teaches that the weight for a particular interaction between two features is computed as a factor of interactions between all the features (including other features in the feature set, see variables C and D)).
Yan et al., Vijay et al., and Chu et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chu et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage “a feature-based bilinear regression framework for personalized recommendation on dynamic content [that] quantified associations between attributes in user profiles and content profiles through learning a parametric bilinear regression function from interactive feedback,” which “greatly alleviates the cold-start issue of recommending for new users, by leveraging interest patterns in user profiles recognized from regression over historical interactive feedback” (Chu et al. pg. 699 Section 6).

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of Rendle et al. (“Factorization Machines with libFM”).


Regarding Claim 5,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the respective probabilistic score is computed without building a full matrix of user and product features.
However, Rendle et al. teaches wherein the respective probabilistic score is computed without building a full matrix of user and product features (pg. 57:11 fourth full paragraph: “The MCMC Algorithm 3 solves regression tasks. It can be extended for binary classification by mapping the normal distributed ŷ to a probability b(ŷ)∈[0, 1] that defines the Bernoulli distribution for classification...That means, the MCMC algorithm will predict the probability that a case is of the positive class. LIBFM uses the CDF of a normal distribution” teaches computing a probabilistic score; pg. 57:4 second to third full paragraphs: “Let W be any pairwise interaction matrix that should express the interactions between two distinct variables in an FM... because the advantage of FMs is the possibility to use a low-rank approximation of W, and thus, FMs can estimate interaction parameters even in highly sparse data—see Section 4.3 for a comparison to polynomial regression which uses the full matrix W for modeling interactions” teaches that the factorization machine model uses a low-rank approximation of W (the interaction matrix) instead of the full W matrix to perform the prediction; Fig. 1 teaches a matrix representing interactions between users and products).
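The low-rank point quoted from Rendle et al. can be illustrated by checking that the FM pairwise term computed from an explicit n x n matrix W = VV^T matches the same term computed directly from the factor matrix V, without ever forming W. This Python sketch relies on the standard algebraic reformulation of the FM pairwise sum; the function names are hypothetical.

```python
def pairwise_full_matrix(x, V):
    # Builds the full n x n matrix W = V V^T, then sums its upper triangle.
    n = len(x)
    W = [[sum(a * b for a, b in zip(V[i], V[j])) for j in range(n)] for i in range(n)]
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

def pairwise_low_rank(x, V):
    # Never forms W; uses the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s2
    return 0.5 * total
```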
Yan et al., Vijay et al., and Rendle et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Rendle et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage “the advantage of FMs...to use a low-rank approximation of W”, which “allows FMs to estimate reliable parameters even in highly sparse data where standard models fail” and because “FMs combine the high-prediction accuracy of factorization models with the flexibility of feature engineering” (Rendle et al. pg. 57:3 first full paragraph & pg. 57:4 second to third full paragraphs & pg. 57:2 third full paragraph).
Regarding Claim 14,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 13.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein determining a respective probabilistic score for each item in the plurality of items is performed in O(kn+mk) time, where k is a size of the factorization vectors vi and Vj, n is a number of features associated with the user in the first feature vector, and m is a number of features associated with the plurality of items.
However, Rendle et al. teaches wherein determining a respective probabilistic score for each item in the plurality of items is performed in O(kn+mk) time, where k is a size of the factorization vectors vi and Vj, n is a number of features associated with the user in the first feature vector, and m is a number of features associated with the plurality of items (pg. 57:7 first full paragraph: [image] teaches performing the implementation of FMs in a runtime of [image], wherein k corresponds to the size of the factorization vectors (see pg. 57:3 first paragraph: “where k is the dimensionality of the factorization” associated with the factorization vectors; see pg. 57:2 Equation (1));
pg. 57:3 second paragraph and Equation (3): [image] teach that [image] corresponds to the number of nonzero elements in matrix X of Fig. 1, which includes the n features associated with the user and the m features associated with the items; therefore, a runtime of [image] corresponds to O(k(n+m)), or in other words, O(kn+mk); pg. 57:5 Section 3.1 teaches determining probabilistic scores).
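The O(k(n+m)) reasoning above can be made concrete with a sketch that computes an FM score by touching only the nonzero entries of x. It assumes a sparse input given as (feature_index, value) pairs and parameters stored in dictionaries; these choices and the function name are hypothetical. The loop does O(k) work per nonzero entry, so with n user features and m item features set, the cost is O(k(n+m)).

```python
def fm_score_sparse(nonzeros, w0, w, V, k):
    """FM score from a sparse input of (feature_index, value) pairs.

    w maps feature index -> first-order weight; V maps feature index ->
    length-k factor vector. Work is O(k) per nonzero entry, i.e.
    O(k * (n + m)) when n user and m item features are nonzero.
    """
    linear = w0
    s = [0.0] * k   # per-factor sums:          sum_i v_if * x_i
    s2 = [0.0] * k  # per-factor squared sums:  sum_i (v_if * x_i)^2
    for idx, val in nonzeros:
        linear += w[idx] * val
        for f in range(k):
            t = V[idx][f] * val
            s[f] += t
            s2[f] += t * t
    pairwise = 0.5 * sum(s[f] * s[f] - s2[f] for f in range(k))
    return linear + pairwise
```

If the same user's nonzero entries are scored against many candidate items, the user-side contributions to linear, s, and s2 could be computed once and reused, so each additional item adds only O(mk) work; that caching is a design option assumed here, not something the cited passage states.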
Yan et al., Vijay et al., and Rendle et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Rendle et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification because “FMs combine the high-prediction accuracy of factorization models with the flexibility of feature engineering” (Rendle et al. pg. 57:2 third full paragraph).

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of Ta et al. (“Factorization Machines with Follow-The-Regularized-Leader for CTR prediction in Display Advertising”).
Regarding Claim 8,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the first set of model parameters and the second set of model parameters are generated based at least in part on a function that minimizes a log-loss between different observations in the training data.
However, Ta et al. teaches wherein the first set of model parameters and the second set of model parameters are generated based at least in part on a function that minimizes a log-loss between different observations in the training data (pg. 2890 Section IIB: “[image]” teaches generating model parameters based on a function that minimizes a log-loss between different observations; pg. 2890 last full paragraph: “yielding a model with 4M × k parameters, where k is the dimensionality of the factorization” and pg. 2891 first full paragraph: “For the number of latent factors, k = 20 is used in all experiments” teach at least first and second sets of model parameters).
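Because the objective quoted from Ta et al. appears only as an image, its exact form is not restated here. As a generic illustration of a log-loss objective over training observations, the sketch below assumes labels in {-1, +1}, a precomputed model score per observation, and an L2 penalty on the parameters; these are assumptions, and the function names are hypothetical.

```python
import math

def logistic_loss(score, label):
    """Log-loss for one observation with label in {-1, +1}: log(1 + exp(-label * score))."""
    z = -label * score
    # Numerically stable evaluation of log(1 + exp(z)).
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def objective(scores, labels, params, lam):
    """Total log-loss over the observations plus an L2 penalty (lam / 2) * ||params||^2."""
    data_term = sum(logistic_loss(s, y) for s, y in zip(scores, labels))
    return data_term + 0.5 * lam * sum(p * p for p in params)
```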
Yan et al., Vijay et al., and Ta et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Ta et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification because “the FTRL-Proximal algorithm with per-coordinate learning rate to FM...algorithm produces a sparse model, making it applicable to real-world scenarios (i.e., in production, one can store only the non-zero coefficients of the model). Experimental results show that the FTRFL method outperforms the standard FM with SGD, and has a much faster rate of convergence” (Ta et al. pg. 2891 Section IV).
Regarding Claim 9,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the function is a Follow-the-Regularized-Leader (FTRL) function.
However, Ta et al. teaches wherein the function is a Follow-the-Regularized-Leader (FTRL) function (pg. 2889 last full paragraph: “In this paper, we attempt to get both the sparsity provided by FTRL-Proximal and the ability of estimating higher order information of FM. To this end, we present the Follow-The-Regularized-Factorized-Leader (FTRFL) algorithm, which incorporates the FTRL-proximal with per-coordinate learning rates into FM” teaches a Follow-the-Regularized-Leader (FTRL) function).
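For context, a per-coordinate FTRL-Proximal update for the first-order weights of a logistic model, following the formulation commonly attributed to McMahan et al. (2013), can be sketched as follows. Ta et al. extend FTRL-Proximal to the FM latent factors; this sketch covers only linear weights, and its class name and default hyperparameters are illustrative assumptions.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (linear weights only)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # accumulated adjusted gradients, per coordinate
        self.n = {}  # accumulated squared gradients, per coordinate

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 term keeps this coordinate exactly zero (sparsity)
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """x is a dict {feature_index: value}; returns P(y = 1)."""
        score = sum(self.weight(i) * v for i, v in x.items())
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """One online step on example x with label y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log-loss w.r.t. w_i
            n_old = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            # weight(i) still reads the old z_i and n_i here, as the update requires
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n_old + g * g
        return p
```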
Yan et al., Vijay et al., and Ta et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Ta et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification because “the FTRL-Proximal algorithm with per-coordinate learning rate to FM...algorithm produces a sparse model, making it applicable to real-world scenarios (i.e., in production, one can store only the non-zero coefficients of the model). Experimental results show that the FTRFL method outperforms the standard FM with SGD, and has a much faster rate of convergence” (Ta et al. pg. 2891 Section IV).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of Jiang et al. (“Scaling-up Item-based Collaborative Filtering Recommendation Algorithm based on Hadoop”).
Regarding Claim 15,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches wherein generating the first feature vector based on a first set of features associated with a user (pg. 3 Section 4: “[image]” teaches generating a feature vector x based on a first set of features associated with a user that is currently accessing an e-commerce website; Table 1 teaches a set of features associated with users).
Yan et al. in view of Vijay et al. does not appear to explicitly teach comprises applying a hash function to key-value pairs of user attributes to map the features to indices in the first feature vector.
However, Jiang et al. teaches comprises applying a hash function to key-value pairs of user attributes to map the features to indices in the first feature vector (Figure 1 and pg. 491 Section IIA: “The input of the computation is a set of (key, value) pairs and the output of the computation is also a set of (key, value) pairs. We define these (key, value) pairs using angle brackets <k, v> on both phase...Figure 1 shows the MapReduce computation framework. The computation starts from a map phase in which the map functions are executed in parallel with various splits of the input data which are stored in a distributed file system (DFS). Processing each split is assigned to one map task. The output pairs of each map function are hash-partitioned on the intermediate key. Each partition are sorted and then merged in the sorted order by their keys. All the partitions which shared the same key are sent to a single reduce task in which the reducer function obtains the final results” teach applying hash-partitioning (which indicates that a hash function or algorithm is applied) to key-value pairs to map input features to keys (indices) in a vector; pg. 495 Section IV A: “we use the MoveLens [19-21] datasets which consists of movie rating data collected using a web-based research recommendation system. The datasets contains 943 users, 1670 movies item and about 54,000 ratings on a scale from 1 to 5” teaches that the inputs are about user attributes).
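As an illustration of the claimed limitation itself (hashing key-value pairs of user attributes to indices of a feature vector), the sketch below applies the common feature-hashing technique. This is distinct from the MapReduce hash-partitioning quoted from Jiang et al.; CRC-32 is an assumed stand-in for whatever hash function a real system uses, and the function name is hypothetical.

```python
import zlib

def hash_features(attributes, dim):
    """Map user attribute key-value pairs to indices of a length-`dim` sparse vector.

    Each "key=value" string is hashed with CRC-32 (deterministic across runs,
    unlike Python's salted built-in str hash) and reduced modulo `dim`.
    Colliding pairs add into the same index.
    """
    vec = {}
    for key, value in attributes.items():
        token = f"{key}={value}".encode("utf-8")
        idx = zlib.crc32(token) % dim
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec
```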
Yan et al., Vijay et al., and Jiang et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Jiang et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following benefit: “To minimize the communication cost, we used effective partition strategies to realize the local computation in each Map-Reduce phase” (Jiang et al. pg. 497 first paragraph).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of Morris et al. (US 2017/0186029 A1).
Regarding Claim 16,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein each item includes at least one embedded uniform resource locator that, when selected, results in performance of the target action.
However, Morris et al. teaches wherein each item includes at least one embedded uniform resource locator that, when selected, results in performance of the target action (pg. 3 [0022]: “An ad 131 can be presented to a user by presenting the ad content 132. The ad content 132 specifies what to present to the user, which can include text, a video, a sound file, an image, and metadata specifying how to display the various components, among other things. The ad content 132 may also include a network address, such as a URL or an address of a page on the social networking service 110, which links to additional content to present. The linked content at the network address can be accessed by the user by clicking, tapping, or otherwise interacting with a portion of the ad 131” teaches that a URL can be embedded in the ad such that, when the URL is selected, the result is the performance of the target action of clicking).
Yan et al., Vijay et al., and Morris et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Morris et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “display a plurality of ads to users and log user feedback and interaction with each ad. The feedback and user engagement information of each ad can be compared to the user interaction history of other ads in order to generate a relevance score that is indicative of the ad's user engagement” to address the challenge of measuring advertisement effectiveness (Morris et al. pg. 1 [0001]-[0003]).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) and further in view of GOLBANDI et al. (US 2016/0371589 A1).
Regarding Claim 10,
Yan et al. in view of Vijay et al. teaches the non-transitory computer-readable storage media of Claim 1.
Yan et al. further teaches the operations further comprising updating the linear prediction component and the nonlinear prediction component based on additional actions between the plurality of users and the plurality of items (pg. 4 first full paragraph: “[image]” teaches updating a linear component as modeled by [image] and a nonlinear component as modeled by [image] based on additional interactions (corresponding to additional actions) between feature vectors xi and xj (each of which can represent either user or item features, as shown in Table 1) over multiple iterations).
Yan et al. in view of Vijay et al. does not appear to explicitly teach wherein the additional actions are received subsequent to receiving the training data.
However, GOLBANDI et al. teaches wherein the additional actions are received subsequent to receiving the training data (Fig. 5 and pg. 6 [0054]: “Through the user feedback pipeline, a user feedback may be sent to the sample buffer 514 individually. Alternatively, information from the user feedback pipeline may be organized as packet of aggregated user feedbacks. For example, the user feedback may be sent to a feedback aggregator 530. The feedback aggregator 530 may collect a plurality of user feedbacks and process the user feedbacks to obtain feedback statistics, such as measuring a probability that an article is clicked and read by a user. Then the feedback aggregator 530 may send the user feedbacks as well as the feedback statistics to the sample buffer 514. The packet of the user feedbacks may include a label, which may be a variable that represents the user feedback for the article. The packet may also contain information to compute the request ID of the corresponding user request. With the hashing mechanism, the sample buffer 514 may match the user feedback with the corresponding user request and request features by looking up the index with the same request and item ID” teach receiving user feedback (additional actions) between the user and the recommended item, which is subsequent to receiving training data in the previous iteration that caused the model to provide a recommendation for which feedback is received (see Fig. 5)).
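The join of later-arriving feedback with earlier request features quoted from GOLBANDI et al. ([0054]) can be pictured with a small sketch. It assumes an in-memory dictionary keyed by (request ID, item ID); the class and method names are hypothetical and are not taken from the reference.

```python
class SampleBuffer:
    """Joins stored request features with later feedback into labeled training samples."""

    def __init__(self):
        self._pending = {}  # (request_id, item_id) -> request features

    def add_request(self, request_id, item_id, features):
        self._pending[(request_id, item_id)] = features

    def add_feedback(self, request_id, item_id, label):
        """Return (features, label) when a matching request exists, else None."""
        features = self._pending.pop((request_id, item_id), None)
        if features is None:
            return None  # no matching request (never seen or already consumed)
        return features, label
```

Each returned sample could then be fed to an online update of the model, consistent with feedback arriving after the data used in an earlier iteration.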
Yan et al., Vijay et al., and GOLBANDI et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by GOLBANDI et al. to the disclosed invention of Yan et al. in view of Vijay et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following: “The online phase-serving system also includes a sample buffer that joins together as training samples request features and feedbacks related to a user...the online content recommendation phase-serving systems and methods may support recommendation in scale in an online manner, in which the model adapted therein is updated in real-time and with minimal communication cost, and utilizes a simplified architecture that saves hardware resources of a client device and a computer system” (GOLBANDI et al. pg. 1 [0006]).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) in view of GOLBANDI et al. (US 2016/0371589 A1) and further in view of Ta et al. (“Factorization Machines with Follow-The-Regularized-Leader for CTR prediction in Display Advertising”).
Regarding Claim 11,
Yan et al. in view of Vijay et al. in view of GOLBANDI et al. teaches the non-transitory computer-readable storage media of Claim 10.
Yan et al. in view of Vijay et al. in view of GOLBANDI et al. does not appear to explicitly teach wherein different learning rates are applied to different model parameters while updating the linear prediction component and the nonlinear prediction component.
However, Ta et al. teaches wherein different learning rates are applied to different model parameters while updating the linear prediction component and the nonlinear prediction component (pg. 2890 Section IIB: “Several learning algorithms, e.g., SGD, have been proposed [11] to solve equation 2. In this work, the per-coordinate FTRL-Proximal algorithm was employed to induce sparsity and yield better performance. In particular, at each step, we update the weight vector on a per-coordinate basis, where the learning rate for each latent factor vi,f at iteration t is set to [image]” teaches that different learning rates are applied to different latent factors (parameters) while solving equation (2), which includes a linear and a nonlinear prediction component; also see equation (1)).
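Applying different learning rates to different parameters can be illustrated with an AdaGrad-style per-parameter rate of the form alpha / (beta + sqrt(sum of squared gradients)), the kind of rate used in per-coordinate FTRL-Proximal. Because the rate formula in Ta et al. appears only as an image, this exact form is an assumption; the class name and parameter keys are hypothetical.

```python
import math

class PerCoordinateRates:
    """Per-parameter learning rates: eta_p = alpha / (beta + sqrt(sum of g_p^2)).

    Each parameter key (e.g. "w3" for a linear weight or ("v", 3, 0) for a
    latent factor) keeps its own squared-gradient history, so parameters that
    have seen larger or more frequent gradients take smaller steps.
    """

    def __init__(self, alpha=0.1, beta=1.0):
        self.alpha, self.beta = alpha, beta
        self.sq = {}  # parameter key -> accumulated squared gradient

    def step(self, params, grads):
        """Apply one in-place gradient step to dict `params` using dict `grads`."""
        for name, g in grads.items():
            self.sq[name] = self.sq.get(name, 0.0) + g * g
            eta = self.alpha / (self.beta + math.sqrt(self.sq[name]))
            params[name] = params.get(name, 0.0) - eta * g
        return params
```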
Yan et al., Vijay et al., GOLBANDI et al., and Ta et al. are analogous art to the claimed invention because they are directed to prediction models.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Ta et al. to the disclosed invention of Yan et al. in view of Vijay et al. in view of GOLBANDI et al.
One of ordinary skill in the arts would have been motivated to make this modification because “the FTRL-Proximal algorithm with per-coordinate learning rate to FM...algorithm produces a sparse model, making it applicable to real-world scenarios (i.e., in production, one can store only the non-zero coefficients of the model). Experimental results show that the FTRFL method outperforms the standard FM with SGD, and has a much faster rate of convergence” (Ta et al. pg. 2891 Section IV).

Response to Arguments
Applicant’s arguments filed on 03/07/2022 with respect to the 35 U.S.C. 112(b) rejection of claims 3-4 and 13-14 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made to claims 3-4 and 13-14 necessitated by amendments submitted on 03/07/2022. Please see the current Office Action for more information.

Applicant's arguments filed on 03/07/2022 with respect to the prior art rejection of amended independent claims 1, 19, and 20 have been fully considered but they are not persuasive. 
Applicant asserts that “at least the limitations shown above in bold are not taught or disclosed by the cited references, either individually or in combination. As highlighted above, claim 1 currently recites training a prediction model, "wherein training the prediction model includes generating, for the prediction model, (a) a linear prediction component modelling linear relationships, learned from the training data, and (b) a nonlinear prediction component modelling nonlinear interactions, learned from the training data, between different features in predicting the set of actions." These features were somewhat similar, although not exact, to features previously recited in claim 7” and that specifically, the previously cited Vijay and Oentaryo references do not teach the bolded limitations of amended claim 1 as indicated in pg. 17-18 of the Remarks (Remarks, pg. 17-19).
Examiner’s Response:
The Examiner respectfully disagrees. First, the Examiner notes that Vijay and Oentaryo have not been relied upon to teach the bolded limitations of amended claim 1 as indicated in pg. 17-18 of the Remarks. As necessitated by the amendments, a new ground of rejection, Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1), has been applied to independent claims 1, 19, and 20. Applicant asserts that “at least the limitations shown above in bold are not taught or disclosed by the cited references, either individually or in combination” (Remarks, pg. 18), but has not provided any arguments regarding the Yan et al. reference. Therefore, Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Applicant asserts that “[t]he cited references further fail to disclose or suggest determining a respective probabilistic score of the user performing a target action for each item of a plurality of items, wherein the respective probabilistic score for a respective item is computed as a function of (a) the linear prediction component, (b) the nonlinear prediction component, (c) the first feature vector, and (d) the respective feature vector for the respective item. As noted by the Office Action, Vijay does not explicitly disclose a linear and nonlinear component. As such, Vijay also fails to teach or suggest generating a prediction as a function of both the linear and nonlinear prediction components. The other cited references fail to cure the deficiencies of Vijay with respect to claim 1. The other references do not generate a prediction that a user will perform a target action for each item based on two separate feature vectors and two separate components, including a linear and nonlinear component. Therefore, this aspect of claim 1 is also different than the cited references” (Remarks, pg. 19-20).
Examiner’s Response:
Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Please see the current Office Action regarding how Yan et al. (“E-Commerce Item Recommendation Based on Field-aware Factorization Machine”) in view of Vijay et al. (US 2015/0381552 A1) teaches amended independent claims 1, 19, and 20.
Applicant relies on the above arguments for dependent claims 2-18; therefore, the above responses also apply to dependent claims 2-18.
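For illustration only, the scoring step recited in the quoted limitation of claim 1 (a probabilistic score per item computed from a linear component, a nonlinear component, a user feature vector, and an item feature vector) can be sketched as a logistic factorization-machine score. The sketch assumes the nonlinear component is a standard second-order factorized interaction term over the concatenated user and item vectors. It is not presented as the applicant's implementation, and it does not model the field-dependent latent vectors of the field-aware factorization machine in Yan et al. The function name fm_probability, the weights, and the two-item catalog are hypothetical.

```python
import math
from typing import Sequence


def fm_probability(w0: float, w: Sequence[float], V: Sequence[Sequence[float]],
                   user_x: Sequence[float], item_x: Sequence[float]) -> float:
    """Sigmoid of (linear component + factorized pairwise component) over [user_x, item_x]."""
    x = list(user_x) + list(item_x)  # joint feature vector for this user-item pair
    n = len(x)
    # Linear component: global bias plus one weight per feature.
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Nonlinear component: sum_{i<j} <V[i], V[j]> x_i x_j, computed with the identity
    # 0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2].
    nonlinear = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        nonlinear += 0.5 * (s * s - s_sq)
    return 1.0 / (1.0 + math.exp(-(linear + nonlinear)))


# Hypothetical example: 2 user features + 2 item features, k = 2 latent factors.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w = [0.0, 0.0, 0.0, 0.0]
user = [1.0, 0.0]
catalog = {"item_a": [1.0, 0.0], "item_b": [0.0, 1.0]}
scores = {name: fm_probability(0.0, w, V, user, x) for name, x in catalog.items()}
print(scores)  # item_a is about 0.731 (interaction term 1.0); item_b is 0.5 (no interaction)
```

Evaluating the function once per item yields one probabilistic score per item; the identity used for the nonlinear term avoids enumerating every feature pair explicitly.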

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Rossi et al. (US 10,235,403 B2) teaches performing matrix factorization to obtain lower-dimension matrices defining a latent feature model, which is relevant to Fig. 2 of the present application.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/               Examiner, Art Unit 2125