DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on January 21, 2020. 
This office action is in response to Amendments and/or remarks filed on December 15, 2021. In the current amendment, claims 1, 4, 7, 8, 11, 14, and 20 are amended. Claims 5-6, 12-13, and 18-19 are cancelled. Claims 1-4, 7-11, 14-17, and 20 are pending.
In response to amendments and/or remarks filed on December 15, 2021, the objections to the drawings made in the previous office action have been withdrawn.
In response to amendments and/or remarks filed on December 15, 2021, the double patenting rejection applied to claims 1 and 14 made in the previous office action has been withdrawn.
In response to amendments and/or remarks filed on December 15, 2021, the 35 USC 112(f) interpretation of claims 1, 4, 5, 7, 11, and 12 made in the previous office action has been withdrawn.
In response to amendments and/or remarks filed on December 15, 2021, the 35 USC 112(a) rejection applied to claims 1-7 and 11-13 made in the previous office action has been withdrawn.
In response to amendments and/or remarks filed on December 15, 2021, the 35 USC 112(b) rejection applied to claims 1-7 and 11-13 made in the previous office action has been withdrawn.
In response to amendments and/or remarks filed on December 15, 2021, the 35 USC 102 rejection applied to claims 1-3, 8, and 14-16 made in the previous office action has been withdrawn.

Information Disclosure Statement
The information disclosure statement(s) (IDS) was/were submitted on December 22, 2021. The submission(s) are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement(s) are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-11, 14-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0121964 A1) in view of Du et al. (US 2020/0033144 A1), further in view of Jiang et al. (US 2018/0174038 A1). 

Regarding Claim 1, 
Zhang teaches: 
A system for generating item recommendations, comprising a memory having instructions stored thereon, and a processor that reads instructions to: (Para [0006]: “An online system receives multiple candidate content item components ("candidate components") of at least one type (e.g., title, image, body text, call to action, video, etc.) from a content-providing user of the online system (e.g., an advertiser) for including in a content item to be presented to viewing users of the online system.” teaches a system for content selection and presentation; Para [0118]: “Embodiments also may relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer.” teaches a computing device that contains a memory and processor)
receive a plurality of content elements for presentation in at least one content container; (Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user. For example, the online system includes an optimal advertisement in an advertisement auction that ranks the optimal advertisement among one or more additional advertisements based on a bid amount associated with each advertisement and selects a highest ranked advertisement for presentation to the subject user.” teaches receiving a plurality of content elements that can be selected for presentation as an optimal content item (content container); Fig. 5 and Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item ( e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches that the optimal content item is a content container because it contains the selected content items)
select one of the plurality of content elements for presentation in the at least one content container, (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user) 
(Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item (e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users).
For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model uses Thompson sampling for selecting content items)
generate an interface including the selected one of the plurality of content elements. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))
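For illustration only, the affinity-based selection quoted from Zhang, Para [0009] (ranking candidate components by affinity score and taking the highest ranked candidate of each type) can be sketched as follows. The tuple layout, the function name `select_components`, and the sample scores are assumptions made for this sketch and do not appear in the reference.

```python
def select_components(candidates):
    """Pick the highest-affinity candidate of each component type.

    `candidates` is a list of (component_type, component_id, affinity_score)
    tuples. This layout is hypothetical and chosen only for the sketch.
    """
    best = {}
    for ctype, cid, score in candidates:
        # Keep the candidate with the highest score seen so far per type.
        if ctype not in best or score > best[ctype][1]:
            best[ctype] = (cid, score)
    return {ctype: cid for ctype, (cid, _) in best.items()}


# Hypothetical affinity scores for two titles and two images.
candidates = [
    ("title", "t1", 0.42),
    ("title", "t2", 0.57),
    ("image", "i1", 0.31),
    ("image", "i2", 0.12),
]
optimal_item = select_components(candidates)
```

In this sketch the resulting dictionary plays the role of the "optimal content item" assembled from one component of each type.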

Zhang does not appear to explicitly teach: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; and

However, Du teaches: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters… (Para [0062]: “The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes E[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” teaches calculating and sampling a parameter θ* from the posterior distribution P(θ|D) by using Thompson Sampling, that the parameter is used to calculate an expected reward for choosing an action, and that the calculated expected reward is maximized by choosing an appropriate action given the parameter and current context)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).
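For illustration only, the posterior-sampling step quoted from Du, Para [0062] (sample a parameter from the posterior, then choose the action with the highest expected reward under that sample) can be sketched with a simple Beta-Bernoulli model. The choice of a Beta posterior, the binary reward, and the names `thompson_select` and `update_posterior` are assumptions made for this sketch; the reference does not specify them.

```python
import random


def thompson_select(posteriors, rng):
    """Sample one value per action from its Beta posterior and pick the max.

    `posteriors` maps an action name to (alpha, beta) parameters. The Beta
    form is an assumption of this sketch, not something stated in Du.
    """
    samples = {
        action: rng.betavariate(alpha, beta)
        for action, (alpha, beta) in posteriors.items()
    }
    return max(samples, key=samples.get)


def update_posterior(posteriors, action, reward):
    """Fold a binary reward (0 or 1) into the chosen action's posterior."""
    alpha, beta = posteriors[action]
    posteriors[action] = (alpha + reward, beta + (1 - reward))


rng = random.Random(0)
posteriors = {"email": (1, 1), "social_post": (1, 1)}  # uniform priors
chosen = thompson_select(posteriors, rng)
update_posterior(posteriors, chosen, reward=1)
```

Actions with little data keep wide posteriors and are therefore still sampled occasionally, which is how this scheme balances exploration against exploitation.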

The combination of Zhang and Du does not appear to explicitly teach: 
[wherein the one or more posterior distribution parameters are] …calculated using a short-term reward value, r, and a long term reward value, R.

However, Jiang teaches: 
[wherein the one or more posterior distribution parameters are] …calculated using a short-term reward value, r, and a long term reward value, R. (Para [0068]: “Target yj is a computed target Q-value after taking an optimal action at time stamp j. It is computed as the current reward plus an estimated optimal Q-value after observing the new sensing frame Xj+1 determined by the current Q-network Nk-I with parameters θk. The parameter n is the forgetting factor valued between 0 and 1 and determines how important the system weights long-term rewards against short-term ones. The smaller the forgetting factor, the robotic device weights less on long-term rewards but cares only for the short-term rewards. If the forgetting factor is closer to 1, the robotic device tends to treat long-term rewards similarly with the short-term rewards.” teaches that θk, the posterior distribution parameter, is calculated based on the forgetting factor, a parameter that is calculated based on the short and long-term rewards. θk is a posterior distribution parameter because the parameter is calculated after observing the new sensing frame (parameter is assigned after the sensing frame (event) has occurred))
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jiang’s short-term and long-term rewards into Zhang’s system for presenting optimal content items as modified by Du with a motivation to “…finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable” (Jiang, Para [0039]).
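For illustration only, the target computation quoted from Jiang, Para [0068] (current reward plus an estimated optimal Q-value for the next observation, weighted by a forgetting factor between 0 and 1) can be sketched as follows. The name `q_target`, the symbol `gamma` for the forgetting factor, and the sample values are assumptions made for this sketch.

```python
def q_target(reward, next_q_values, gamma):
    """Return reward + gamma * max(next_q_values).

    `gamma` stands in for the forgetting factor described in the quoted
    passage; the symbol itself is an assumption of this sketch.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("forgetting factor must lie in [0, 1]")
    return reward + gamma * max(next_q_values)


# Hypothetical Q-value estimates for the actions available in the next state.
next_q = [0.5, 2.0]
short_sighted = q_target(1.0, next_q, gamma=0.0)  # only the current reward
far_sighted = q_target(1.0, next_q, gamma=1.0)    # long-term weighted equally
```

With a forgetting factor of 0 the target reduces to the short-term reward alone, and with a factor of 1 the estimated long-term value is counted in full, matching the behavior the quoted passage describes.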

Regarding Claim 2, 
The combination of Zhang, Du, and Jiang teaches The system of claim 1,
Zhang further teaches: 
wherein the plurality of content elements are selected based on a received persona. (Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches selecting potential content items to present to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))

Regarding Claim 3, 
The combination of Zhang, Du, and Jiang teaches The system of claim 1,
Zhang further teaches:
wherein the trained selection model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)
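For illustration only, the training-data collection quoted from Zhang, Para [0016] (randomly selecting a candidate component and repeating until each candidate reaches a threshold number of impressions) can be sketched as follows. The function name `explore_until_threshold`, the candidate identifiers, and the small threshold are assumptions made for this sketch.

```python
import random


def explore_until_threshold(candidate_ids, threshold, rng):
    """Randomly present candidates until each has `threshold` impressions.

    Returns the per-candidate impression counts and the presentation log.
    """
    impressions = {cid: 0 for cid in candidate_ids}
    log = []
    while min(impressions.values()) < threshold:
        cid = rng.choice(candidate_ids)  # uniform random selection
        impressions[cid] += 1
        log.append(cid)
    return impressions, log


# Hypothetical candidates; the quoted passage uses 1,000 impressions as an
# example threshold, and a small value is used here to keep the run short.
impressions, log = explore_until_threshold(
    ["img_a", "img_b", "img_c"], threshold=5, rng=random.Random(0)
)
```

The log of presentations, paired with performance information for each impression, would then serve as training data in the manner the quoted passage describes.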

Regarding Claim 4, 
The combination of Zhang, Du, and Jiang teaches The system of claim 1,
Du further teaches: 
wherein the trained selection model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below:

[Equation image: media_image1.png]
In the equation above, X represents the current context and D = {(X; a; r)} represents past observations of contexts, actions, and rewards.” teaches a selection model that implements a state-action-reward-state-action process modified to use Thompson sampling)
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models.  
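Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items as modified by Jiang with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).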
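For illustration only, one possible reading of a state-action-reward-state-action (SARSA) update combined with sampling-based action selection can be sketched as follows. Neither the claim nor the cited references specify this form: the Gaussian posterior, the count-based variance, the learning rate `alpha`, the discount `gamma`, and the function names are all assumptions made for this sketch.

```python
import random


def sample_action(q_mean, counts, state, actions, rng):
    """Choose an action by drawing one value per action from an assumed
    Gaussian posterior whose spread shrinks as the pair is visited more."""
    draws = {
        a: rng.gauss(
            q_mean.get((state, a), 0.0),
            1.0 / (1 + counts.get((state, a), 0)) ** 0.5,
        )
        for a in actions
    }
    return max(draws, key=draws.get)


def sarsa_update(q_mean, counts, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Standard on-policy SARSA update applied to the posterior mean."""
    current = q_mean.get((s, a), 0.0)
    target = r + gamma * q_mean.get((s_next, a_next), 0.0)
    q_mean[(s, a)] = current + alpha * (target - current)
    counts[(s, a)] = counts.get((s, a), 0) + 1


rng = random.Random(0)
q_mean, counts = {}, {}
actions = ["show_item_1", "show_item_2"]  # hypothetical actions
a0 = sample_action(q_mean, counts, "s0", actions, rng)
a1 = sample_action(q_mean, counts, "s1", actions, rng)
sarsa_update(q_mean, counts, "s0", a0, 1.0, "s1", a1)
```

The update uses the action actually sampled for the next state, which is what distinguishes an on-policy SARSA update from an off-policy update that takes the maximum over next actions.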


Regarding Claim 7, 
The combination of Zhang, Du, and Jiang teaches The system of claim 1,
Du further teaches: 
wherein the processor further reads the instructions to: identify a state and an action taken through the interface including the selected one of the plurality of content elements; (Para [0060]: “Using policy iteration, the event sequence recommender system 106 can determine the optimal policies and value function Vθ *(x) corresponding to each of the Markov Decision Process models 310. In particular, a policy includes a function that specifies the action a user will take when in a particular state of the model.” teaches identifying a state and an action of the model; Para [0104]: “Subsequently, the event sequence recommender system 106 can provide, for display via a client device, a user interface that displays the recommended sequence of digital content transmissions, the plurality of historical digital content transmissions, and a plurality of interactive elements for entry of user preferences. In one or more embodiments, a user preference can include a distribution channel through which to transmit the digital content (e.g., email, multimedia messaging, social media post, etc.), a preferred digital content category ( e.g., video advertisement, digital image, informative literature etc.), or a preferred digital content item to transmit ( e.g., a particular advertisement or piece of informative literature).” teaches that an action of the event sequence recommender can include providing display of preferred (selected) content elements through a user interface)
receive an updated trained selection model having a reward function updated based on the state and the action. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below: 

[Equation image: media_image1.png]

…The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes E[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” teaches that the event sequence recommender uses Thompson sampling with a reward function that updates the rewards based on a chosen action that maximizes the expected reward, given the action and current context (state))

Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items as modified by Jiang with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 8, 
Zhang teaches:
(Para [0117]: “In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.” teaches a computer readable medium containing instructions that can be executed by the processor)
receiving a request for an interface, wherein the request includes a user persona; (Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item (e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches receiving an opportunity to present content to a user (request for an interface) and generating an optimal content item to present to the user; Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches presenting content items to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. 
For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))
selecting at least one of a plurality of content elements for inclusion in the interface, wherein the at least one of the plurality of content elements is selected using Thompson sampling; and (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches using Thompson sampling to select content elements)
generating an interface including the selected at least one of the plurality of content elements. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))

Zhang does not appear to explicitly teach: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; and

However, Du teaches: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters… (Para [0062]: “The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes E[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” teaches calculating and sampling a parameter θ* from the posterior distribution P(θ|D) by using Thompson Sampling, that the parameter is used to calculate an expected reward for choosing an action, and that the calculated expected reward is maximized by choosing an appropriate action given the parameter and current context)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

The combination of Zhang and Du does not appear to explicitly teach: 
[wherein the one or more posterior distribution parameters are] …calculated using a short-term reward value, r, and a long term reward value, R.

However, Jiang teaches: 
[wherein the one or more posterior distribution parameters are] …calculated using a short-term reward value, r, and a long term reward value, R. (Para [0068]: “Target yj is a computed target Q-value after taking an optimal action at time stamp j. It is computed as the current reward plus an estimated optimal Q-value after observing the new sensing frame Xj+1 determined by the current Q-network Nk-I with parameters θk. The parameter n is the forgetting factor valued between 0 and 1 and determines how important the system weights long-term rewards against short-term ones. The smaller the forgetting factor, the robotic device weights less on long-term rewards but cares only for the short-term rewards. If the forgetting factor is closer to 1, the robotic device tends to treat long-term rewards similarly with the short-term rewards.” teaches that θk, the posterior distribution parameter, is calculated based on the forgetting factor, a parameter that is calculated based on the short and long-term rewards. θk is a posterior distribution parameter because the parameter is calculated after observing the new sensing frame (parameter is assigned after the sensing frame (event) has occurred))
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models. 
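Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jiang’s short-term and long-term rewards into Zhang’s system for presenting optimal content items as modified by Du with a motivation to “…finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable” (Jiang, Para [0039]).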


Regarding Claim 9, 
The combination of Zhang, Du, and Jiang teaches The non-transitory computer readable medium of claim 8,
Du further teaches: 
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning. (Para [0061]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state.” teaches that the Markov Decision Process models (machine learning models trained using reinforcement learning) uses Thompson sampling)
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models. 
 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items as modified by Jiang with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 10, 
The combination of Zhang, Du, and Jiang teaches The non-transitory computer readable medium of claim 9,
Zhang further teaches:
wherein the machine learning model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)

Regarding Claim 11, 
The combination of Zhang, Du, and Jiang teaches The non-transitory computer readable medium of claim 9,
Du further teaches: 
wherein the machine learning model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below:

[Equation image: media_image1.png]
In the equation above, X represents the current context and D = {(X; a; r)} represents past observations of contexts, actions, and rewards.” teaches a selection model that implements a state-action-reward-state-action process modified to use Thompson sampling)
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface with Zhang’s system for presenting optimal content items, as modified by Jiang, with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).
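For illustration only, the general idea of Thompson sampling referred to in the quoted passage (choosing each action according to its probability of maximizing the expected reward) can be sketched as follows. This is a minimal sketch under stated assumptions, not Du's implementation: it assumes binary rewards with Beta posteriors, ignores the context X, and does not show the SARSA portion of the claim. The class name `BetaBernoulliThompson` and its methods are hypothetical.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling over a fixed set of actions with 0/1 rewards."""

    def __init__(self, actions, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per action, stored as [alpha, beta].
        self.posterior = {a: [1.0, 1.0] for a in actions}

    def select(self):
        # Draw one plausible reward rate per action from its posterior
        # and pick the action whose draw is largest.
        draws = {
            a: self.rng.betavariate(alpha, beta)
            for a, (alpha, beta) in self.posterior.items()
        }
        return max(draws, key=draws.get)

    def update(self, action, reward):
        # A reward of 1 increments alpha; a reward of 0 increments beta.
        if reward:
            self.posterior[action][0] += 1.0
        else:
            self.posterior[action][1] += 1.0


def simulate(true_rates, rounds, seed=0):
    """Run the sampler against hypothetical true reward rates."""
    env = random.Random(seed + 1)
    sampler = BetaBernoulliThompson(list(true_rates), seed=seed)
    picks = {a: 0 for a in true_rates}
    for _ in range(rounds):
        action = sampler.select()
        reward = 1 if env.random() < true_rates[action] else 0
        sampler.update(action, reward)
        picks[action] += 1
    return picks
```

Because each action is chosen with the posterior probability that it is the best one, selections tend to concentrate on the higher-reward action as observations accumulate, which corresponds to the past observations D in the quoted passage.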

Regarding Claim 14, 
This claim recites A computer-implemented method, which performs a plurality of operations as recited by the system of claim 1, and has limitations that are similar to those of claim 1, thus is rejected with the same rationale applied against claim 1. 
Regarding Claim 15, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 2, and has limitations that are similar to those of claim 2, thus is rejected with the same rationale applied against claim 2.

Regarding Claim 16, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 3, and has limitations that are similar to those of claim 3, thus is rejected with the same rationale applied against claim 3.

Regarding Claim 17, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 4, and has limitations that are similar to those of claim 4, thus is rejected with the same rationale applied against claim 4.
Regarding Claim 20, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 7, and has limitations that are similar to those of claim 7, thus is rejected with the same rationale applied against claim 7.


Response to Arguments
Regarding Objections to the Drawings: 
Applicant’s argument: 
“Applicant respectfully traverses the objection to the drawings under 37 C.F.R. § 1.83(a). Applicant notes that, as outlined in MPEP 608.02 Drawing, the statutory requirement for showing the claimed invention only requires that the “applicant shall furnish a drawing where necessary for the 

Response: 
Applicant’s arguments have been fully considered and are persuasive. The objection to the Drawings has been withdrawn.

Regarding Double Patenting Rejections: 
Applicant’s argument: 
“The Action includes a non-statutory double patenting rejection of Claims 1 and 14 in view of Claims 1 and 15 of copending Application No. 16/748,452 (“the ’452 Appl.”) in view of U.S. Pat. Appl. Pub. No. 2018/0121964). The ’452 Appl. is currently pending and thus any double patenting rejection may only be provisional at this stage. See MPEP 804(1)(B). Applicant requests that the provisional non-statutory double patenting rejection be held in abeyance unless and until the 452 Appl. issues as a patent prior to issuance of the present application.”

Response: 
The double patenting rejection of claims 1 and 14 has been withdrawn due to amendments to claims 1 and 14. 

Regarding 35 USC 112(f) Claim Interpretation: 
Applicant’s argument: 
“The Action alleges that the terms “a computing device configured to,” “content elements configured for,” “a trained selection model configured to,” “the trained selection model is configured to,” and “the machine learning model is configured to” are subject to construction under 35 U.S.C. § 112(f) as means-plus-function claims. Although Applicant respectfully disagrees that the claims as previously presented were subject to interpretation under 35 U.S.C. § 112(f), Applicant has amended the claims herein to make clear that the claims are not subject to such interpretation. As presented herein, the claims do not meet the three-prong test for interpretation under § 112(f), as the claims do not use a “means,” “step,” or a generic placeholder, does not include modification of a generic term by functional language with a linking word or phrase, and the claims recite sufficient structure for performing any claimed functions directly in the claims.”

Response: 
Applicant’s arguments have been fully considered and are persuasive. The claim interpretation of claims 1, 4, 5, 7, 11, and 12 under 35 USC 112(f) has been withdrawn.

Regarding 35 USC 112 Claim Rejections: 
Applicant’s argument: 
“Claims 1-7 and 11-13 stand rejected under 35 U.S.C. § 112(a) as allegedly failing to meet the written description requirement. Specifically, the Action alleges that certain claim elements invoke 35 

Response: 
Applicant’s arguments have been fully considered and are persuasive. Because the claim interpretation of claims 1, 4, 5, 7, 11, and 12 under 35 USC 112(f) has been withdrawn, the 35 USC 112(a) rejection applied to claims 1-7 and 11-13 and the 35 USC 112(b) rejection applied to claims 1-7 and 11-13 have also been withdrawn. 

Regarding 35 USC 102 Claim Rejections: 
Applicant’s argument: 
“Applicant respectfully disagrees that claims 1-3, 8, and 14-16 as originally filed are anticipated by Zhang. Nevertheless, and solely in the interest of advancing the case to allowance, claims 1 and 14 have been amended herein to include the subject matter of claims 5-6 and 18-19, respectively. Claims 5, 6, 18, and 19 were not rejected as anticipated by Zhang. Thus, claims 1 and 14 as presented herein are not anticipated by Zhang.”

Response: 
The 35 USC 102 rejection of claims 1-3, 8, and 14-16 has been withdrawn because independent claims 1, 8, and 14 have been amended to include the subject matter of claims 5-6 (and analogous claims). Claims 1-4, 7-11, 14-17, and 20 are now rejected under 35 USC 103 because the combination of Zhang, Du, and Jiang teaches the limitations of claims 1-4, 7-11, 14-17, and 20, as necessitated by the amendments. 

Regarding 35 USC 103 Claim Rejections: 
Applicant’s argument: 
“In the Office Action, claims 4, 5, 7, 9-12, 17, 18, and 20 stand rejected under 35 U.S.C. § 103 as allegedly being obvious over Zhang in view of U.S. Pat. Appl. Pub. No. 2020/0033144 to Du, et al. (hereinafter “Du”). Claims 4, 5, 7, and 9-11 depend from claim 1. Claims 17, and 20 depend from claim 14. As discussed above, claims 1 and 14 have been amended herein to include the subject matter of claims 5-6 and 18-19, respectively. Neither claim 6 nor claim 19 was rejected over the combination of Zhang and Du. Thus, as presented herein, claims 1 and 14 are patentable over the proposed combination of Zhang and Du. Claims 4, 5, 7, 9-11, 17, and 20 are patentable over the proposed combination of Zhang and Du by virtue of their respective dependencies and for the features recited therein.”

Response: 
Examiner respectfully disagrees. In the previous office action, claims 6 and 19 were rejected over the combination of Zhang, Du, and Jiang. Because the subject matter of claims 5-6 (and related analogous claims) has been incorporated into independent claims 1, 8, and 14, claims 1, 8, and 14 (and associated dependent claims) are rejected under 35 USC 103 because the combination of Zhang, Du, and Jiang teaches all the limitations of the independent claims. This rejection was necessitated by applicant’s amendment to independent claims 1, 8, and 14. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to the applicant’s disclosure: 
Broden et al. (“Ensemble Recommendations via Thompson Sampling: an Experimental Study within e-Commerce”) teaches using Thompson sampling as a recommendation algorithm to recommend items on an e-commerce platform. 

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144.  The examiner can normally be reached on Mon - Fri 08:00-16:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/S.J.A./Examiner, Art Unit 2125                

/BRIAN M SMITH/Primary Examiner, Art Unit 2122