DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to communications filed on January 21, 2020.
Claims 1 – 20 are presented for examination and are pending. 
Oath/Declaration
For the record, the Examiner acknowledges that the Oath/Declaration filed on January 21, 2020, has been received. 
Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 07/23/2020, 12/02/2020, 04/01/2021, and 04/26/2021.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the trained selection model configured to use Thompson sampling, as recited in claim 1, the plurality of content elements selected based on a received persona, as recited in claim 2, the trained selection model trained using a plurality of prior impressions, as recited in claim 3, and the posterior distribution parameters calculated using a short-term reward value and long-term reward value, as recited in claim 6 must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is 

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 and 14 are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 1 and 15 of copending Application No. 16/748,452 (reference application) in view of Zhang et al. (US 2018/0121964 A1). Underlined limitations in the table below indicate limitations not disclosed by the claims of the reference application. 
Instant Application, Claim 1:
1. A system for content selection and presentation, comprising:

a computing device configured to:

receive a plurality of content elements configured for presentation in at least one content container;

select one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling; and

generate an interface including the selected one of the plurality of content elements.

Reference Application No. 16/748,452, Claim 1:
1. A system for content selection and presentation, comprising:

a memory having instructions stored thereon, and a processor configured to read the instructions to:

receive a plurality of content elements configured for presentation in at least one content container;

select one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model based on an optimal impression allocation, wherein the optimal impression allocation is configured to balance a short-term reward value and a long-term reward value of each of the plurality of content elements, wherein the short-term reward value indicates immediate rewards, and wherein the long-term reward value indicates a user return rate and is calculated as a sum of discounted short term rewards; and

generate an interface including the selected one of the plurality of content elements.

Instant Application, Claim 14:
14. A computer-implemented method, comprising:

receiving a plurality of content elements configured for presentation in at least one content container;

selecting one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling; and

generating an interface including the selected one of the plurality of content elements.

Reference Application No. 16/748,452, Claim 15:
15. A computer-implemented method, comprising:

receiving a plurality of content elements configured for presentation in at least one content container;

selecting one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model based on an optimal impression allocation, wherein the optimal impression allocation is configured to balance a short-term reward value and a long-term reward value of each of the plurality of content elements, wherein the short-term reward value indicates immediate rewards, and wherein the long-term reward value indicates a user return rate and is calculated as a sum of discounted short term rewards; and

generating an interface including the selected one of the plurality of content elements.

Regarding Claim 1,
The reference application, in claim 1, does not teach “configured to use Thompson sampling”; however, Zhang et al. (US 2018/0121964 A1) teaches this limitation (Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item (e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users). For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model can be configured to use Thompson sampling for selecting content items). One of ordinary skill in the art would have been motivated to make this modification in order “to maximize the likelihood that the subject user will click on the optimal advertisement when it is presented to the subject user.” (Zhang, Para [0110]).
Claim 1 of the instant application differs from Claim 1 of the reference application in that claim 1 (instant) recites “a computing device configured to” whereas claim 1 (reference) recites “a memory having instructions stored thereon, and a processor configured to read the instructions to”. It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to implement the memory and processor of claim 1 of the reference application as a computing device. 

Regarding Claim 14, 
The reference application, in claim 15, does not teach “configured to use Thompson sampling”; however, Zhang et al. (US 2018/0121964 A1) teaches this limitation (Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item (e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users). For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model can be configured to use Thompson sampling for selecting content items). One of ordinary skill in the art would have been motivated to make this modification in order “to maximize the likelihood that the subject user will click on the optimal advertisement when it is presented to the subject user.” (Zhang, Para [0110]).
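
For illustration only, the general technique of Thompson sampling as applied to selecting one of several content elements can be sketched as follows. This is a minimal sketch and is not taken from Zhang, the reference application, or the instant specification. It assumes a Beta-Bernoulli click model, and the names BetaBernoulliSelector, select, and update are hypothetical.

```python
import random


class BetaBernoulliSelector:
    """Hypothetical Thompson-sampling selector over content elements.

    Assumption: each impression yields a binary reward (click or no click),
    so each element's click rate gets a Beta posterior.
    """

    def __init__(self, element_ids, seed=None):
        # One [alpha, beta] pair per element, starting from a uniform Beta(1, 1) prior.
        self.posteriors = {e: [1.0, 1.0] for e in element_ids}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each element's posterior and pick the largest draw.
        draws = {
            e: self.rng.betavariate(a, b)
            for e, (a, b) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, element_id, clicked):
        # A click increments alpha; a non-click increments beta.
        if clicked:
            self.posteriors[element_id][0] += 1.0
        else:
            self.posteriors[element_id][1] += 1.0
```

In this sketch, an element with few impressions has a wide posterior and is therefore still drawn occasionally, while an element with many observed clicks is drawn most of the time. That behavior is the explore/exploit balance commonly associated with Thompson sampling.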

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
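(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.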
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not 

Claim 1:
a computing device configured to…
content elements configured for presentation in at least one content container
a trained selection model configured to use Thompson sampling

Claim 4: 
wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 5: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

Claim 7: 
wherein the computing device is configured to…

Claim 11: 
wherein the machine learning model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 12: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

The present application’s disclosure provides the following description regarding the above generic placeholders:
Specification [0014]: 
FIG. 1 illustrates a computer system configured to implement one or more processes, in accordance with some embodiments. The system 2 is a representative device and may comprise a processor subsystem 4, an input/output subsystem 6, a memory subsystem 8, a communications interface 10, and a system bus 12. In some embodiments, one or more than one of the system 2 components may be combined or omitted such as, for example, not including an input/output subsystem 6. In some embodiments, the system 2 may comprise other components not combined or comprised in those shown in FIG. 1. For example, the system 2 may also include, for example, a power subsystem. In other embodiments, the system 2 may include several instances of the components shown in FIG. 1. For example, the system 2 may include multiple memory subsystems 8. For the sake of conciseness and clarity, and not limitation, one of each of the components is shown in FIG. 1.
Specification [0047]: 
The trained content selection model 160 selects a presentation content element 210 from among the potential content elements 208a-208e and presents the selected presentation content element 210 to the user in the first content container 206a of the e-commerce interface 204. After receiving the e-commerce interface 204, a user may perform one or more actions. In some embodiments, a set of presentation content elements 210 are preselected for users having a first persona such that the e-commerce interface 204 with the selected presentation content elements 210 may be cached and provided to a user without delay. The trained content selection model 160 may be 
Specification [0006]: 
In various embodiments, a computer-implemented method is disclosed. The method includes steps of receiving a plurality of content elements configured for presentation in at least one content container and selecting one of the plurality of content elements for presentation in the at least one content container. The one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling. An interface including the selected one of the plurality of content elements is generated.
Specification [0035]: 
In some embodiments, the content selection system 26 receives a trained content selection model from a model training system 28. As discussed below, the model training system 28 is configured to implement a machine learning process using a reinforcement learning mechanism, such as, for example, an explore/exploit mechanism. In some embodiments, the model training system 28 is configured to iteratively modify one or more machine learning (e.g., artificial intelligence, neural network, etc.) models based on additional training data, modified rewards values, and/or other data received from additional systems, such as the network interface system 24 and/or the content selection system 26. In some embodiments, the model training system 28 implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling, as discussed in greater detail below.
Specification [0039]: 
In some embodiments, a posterior distribution technique, such as a state-action-reward-state-action (SARSA) algorithm modified to use Thompson sampling, is applied to the set of training data 152. At step 104, the value of Q(•,•) is initialized as a normal distribution. The initial parameters of the normalized distribution may be set arbitrarily, based on empirical estimates, and/or based on prior 


Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person 

Claims 1 – 7 and 11 – 13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The following claim limitations invoke 35 U.S.C. 112(f):
Claim 1:
a computing device configured to…
content elements configured for presentation in at least one content container
a trained selection model configured to use Thompson sampling

Claim 4: 
wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 5: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

Claim 7: 
wherein the computing device is configured to…

Claim 11: 
wherein the machine learning model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 12: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
The disclosure of the present application does not provide sufficient description of the corresponding structure for performing the entire claimed function associated with each generic placeholder.
For claim 1, the Specification [0014] provides a general description of a computing device, Specification [0047] provides a general description of presenting content elements, and Specification [0006] provides a general description of a trained selection model that uses Thompson sampling. These descriptions are insufficient because they do not describe the specific algorithm the computing device uses, the specific algorithm the content elements use for presentation, and the specific algorithm that the trained selection model uses for Thompson sampling. 
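For claim 4, the Specification [0035] only provides a general description of a trained selection model that is configured to use a SARSA process modified to use Thompson sampling. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses for a SARSA process that is modified to use Thompson sampling.
For claim 5, the Specification [0039] only provides a general description of a trained selection model that is configured to calculate posterior distribution parameters. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses to calculate the posterior distribution parameters.
For claim 7, the Specification [0014] provides a general description of a computing device. This description is insufficient because it does not describe the specific algorithm the computing device uses.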
For claim 11, the Specification [0035] only provides a general description of a trained selection model that is configured to use a SARSA process modified to use Thompson sampling. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses for a SARSA process that is modified to use Thompson sampling. 
For claim 12, the Specification [0039] only provides a general description of a trained selection model that is configured to calculate posterior distribution parameters. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses to calculate the posterior distribution parameters.
Therefore, claims 1, 4, 5, 7, 11, and 12 are rejected under 35 U.S.C. 112(a) for lack of written description. See MPEP 2181, subsection II ("When a claim containing a computer-implemented 35 U.S.C. 112(f) claim limitation is found to be indefinite under 35 U.S.C. 112(b) for failure to disclose sufficient corresponding structure (e.g., the computer and the algorithm) in the specification that performs the entire claimed function, it will also lack written description under 35 U.S.C. 112(a). See MPEP § 2163.03, subsection VI.").
Each dependent claim of the above claims is rejected for the same rationale as the claim from which it depends.



The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 7 and 11 – 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
The following claim limitations invoke 35 U.S.C. 112(f):
Claim 1:
a computing device configured to…
content elements configured for presentation in at least one content container
a trained selection model configured to use Thompson sampling

Claim 4: 
wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 5: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

Claim 7: 
wherein the computing device is configured to…

Claim 11: 
wherein the machine learning model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.

Claim 12: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q

However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
The disclosure of the present application does not provide sufficient description of the corresponding structure for performing the entire claimed function associated with each generic placeholder.
For claim 1, the Specification [0014] provides a general description of a computing device, Specification [0047] provides a general description of presenting content elements, and Specification [0006] provides a general description of a trained selection model that uses Thompson sampling. These descriptions are insufficient because they do not describe the specific algorithm the computing device uses, the specific algorithm the content elements use for presentation, and the specific algorithm that the trained selection model uses for Thompson sampling. 
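For claim 4, the Specification [0035] only provides a general description of a trained selection model that is configured to use a SARSA process modified to use Thompson sampling. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses for a SARSA process that is modified to use Thompson sampling.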
For claim 5, the Specification [0039] only provides a general description of a trained selection model that is configured to calculate posterior distribution parameters. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses to calculate the posterior distribution parameters.
For claim 7, the Specification [0014] provides a general description of a computing device. This description is insufficient because it does not describe the specific algorithm the computing device uses.
For claim 11, the Specification [0035] only provides a general description of a trained selection model that is configured to use a SARSA process modified to use Thompson sampling. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses for a SARSA process that is modified to use Thompson sampling. 
For claim 12, the Specification [0039] only provides a general description of a trained selection model that is configured to calculate posterior distribution parameters. This description is insufficient because it does not describe the specific algorithm that the trained selection model uses to calculate the posterior distribution parameters.
Therefore, claims 1, 4, 5, 7, 11, and 12 are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. 
For purposes of examination: 
“a computing device configured to…” is interpreted as any computing device 
“content elements configured for presentation in at least one content container” is interpreted as any content that can be presented
“a trained selection model configured to use Thompson sampling” is interpreted as any selection model that can use Thompson sampling 
“the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling” is interpreted as any selection model that implements a state-action-reward-state-action process modified to use Thompson sampling 
“wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q” is interpreted as any selection model that calculates posterior distribution parameters of a total reward. 

Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 


Regarding Claim 12,
Claim 12 recites “wherein the trained selection model…”. There is insufficient antecedent basis for this limitation in the claim. A recommended amendment is “wherein the machine learning model…”
Dependent claims 2 – 7, 12, and 13 are rejected due to being directly or indirectly dependent on rejected claims. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 – 3, 8, and 14 – 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al. (US 2018/0121964 A1).
Regarding Claim 1, 
Zhang teaches: 
A system for content selection and presentation, comprising a computing device configured to: (Para [0006]: “An online system receives multiple candidate content item components ("candidate components") of at least one type (e.g., title, image, body text, call to action, video, etc.) from a content-providing user of the online system (e.g., an advertiser) for including in a content item to be presented to viewing users of the online system.” teaches a system for content selection and presentation; Para [0118]: “Embodiments also may relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer.” teaches a computing device)
receive a plurality of content elements configured for presentation in at least one content container; (Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user. For example, the online system includes an optimal advertisement in an advertisement auction that ranks the optimal advertisement among one or more additional advertisements based on a bid amount associated with each advertisement and selects a highest ranked advertisement for presentation to the subject user.” teaches receiving a plurality of content elements that can be selected for presentation as an optimal content item (content container); Fig. 5 and Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item ( e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches that the optimal content item is a content container because it contains the selected content items)
select one of the plurality of content elements for presentation in the at least one content container, (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user) 
wherein the one of the plurality of content elements is selected by a trained selection model configured to use Thompson sampling; and (Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item ( e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users). 
For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model can be configured to use Thompson sampling for selecting content items)
generate an interface including the selected one of the plurality of content elements. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))
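For illustration only, and without characterizing the disclosure of Zhang or the scope of the claims, the general technique of selecting one of several content elements by Thompson sampling may be sketched as follows. The Beta-Bernoulli click model, the uniform prior, and the names `thompson_select` and `record_impression` are assumptions introduced for this sketch and do not appear in the cited reference or in the application.

```python
import random


def thompson_select(stats, rng):
    """Pick one content element by sampling from each element's posterior.

    stats maps an element id to (clicks, skips) observed in prior impressions.
    Assumption: a Beta(1 + clicks, 1 + skips) posterior over the click rate.
    """
    best_id, best_draw = None, -1.0
    for element_id, (clicks, skips) in stats.items():
        # One random draw per element; elements with little data have wide
        # posteriors, so they are still explored occasionally.
        draw = rng.betavariate(1 + clicks, 1 + skips)
        if draw > best_draw:
            best_id, best_draw = element_id, draw
    return best_id


def record_impression(stats, element_id, clicked):
    """Update the counts for an element after one impression."""
    clicks, skips = stats[element_id]
    stats[element_id] = (clicks + 1, skips) if clicked else (clicks, skips + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    stats = {"banner_a": (0, 0), "banner_b": (0, 0)}  # hypothetical elements
    true_rates = {"banner_a": 0.6, "banner_b": 0.1}   # hypothetical click rates
    for _ in range(2000):
        chosen = thompson_select(stats, rng)
        record_impression(stats, chosen, rng.random() < true_rates[chosen])
    print(stats)
```

In this sketch the selection step and the update step are kept separate, so the prior impressions that train the model are simply the accumulated counts.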

Regarding Claim 2, 
Zhang teaches The system of claim 1,
Zhang further teaches: 
wherein the plurality of content elements are selected based on a received persona. (Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches selecting potential content items to present to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))

Regarding Claim 3, 
Zhang teaches The system of claim 1,
Zhang further teaches:
wherein the trained selection model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)

Regarding Claim 8, 
Zhang teaches:
A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising: (Para [0117]: “In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.” teaches a computer readable medium containing instructions that can be executed by the processor)
receiving a request for an interface, wherein the request includes a user persona; (Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item (e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches receiving an opportunity to present content to a user (request for an interface) and generating an optimal content item to present to the user; Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches presenting content items to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. 
For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))
selecting at least one of a plurality of content elements for inclusion in the interface, wherein the at least one of the plurality of content elements is selected using Thompson sampling; and (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches using Thompson sampling to select content elements)
generating an interface including the selected at least one of the plurality of content elements. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))

Regarding Claim 14, 
This claim recites A computer-implemented method, which performs a plurality of operations as recited by the system of claim 1, and has limitations that are similar to those of claim 1, thus is rejected with the same rationale applied against claim 1. 
Regarding Claim 15, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 2, and has limitations that are similar to those of claim 2, thus is rejected with the same rationale applied against claim 2. 
Regarding Claim 16, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 3, and has limitations that are similar to those of claim 3, thus is rejected with the same rationale applied against claim 3.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4, 5, 7, 9 – 12, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Du et al. (US 2020/0033144 A1).
Regarding Claim 4, 
Zhang teaches The system of claim 1,
Zhang does not appear to explicitly teach:
wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling.
However, Du teaches: 
wherein the trained selection model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below:

[Equation image: media_image1.png]

In the equation above, X represents the current context and D = {(X; a; r)} represents past observations of contexts, actions, and rewards.” teaches a selection model that is configured to implement a state-action-reward-state-action process modified to use Thompson sampling)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]). 
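For illustration only, one way a SARSA update could be combined with Thompson sampling is sketched below. This is not taken from Du, Zhang, or the application. The tabular representation, the Gaussian posterior whose spread shrinks with the visit count, and the names `ThompsonSarsa`, `choose`, and `update` are all assumptions made for this sketch.

```python
import random
from collections import defaultdict


class ThompsonSarsa:
    """Tabular SARSA whose action choice samples from a posterior over Q."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, seed=0):
        self.actions = list(actions)
        self.alpha = alpha              # learning rate (assumed value)
        self.gamma = gamma              # discount factor (assumed value)
        self.rng = random.Random(seed)
        self.mean = defaultdict(float)  # posterior mean of Q(state, action)
        self.count = defaultdict(int)   # visits, used to narrow the posterior

    def choose(self, state):
        """Thompson step: draw one Q sample per action, take the largest."""
        def draw(action):
            std = 1.0 / (1 + self.count[(state, action)]) ** 0.5
            return self.rng.gauss(self.mean[(state, action)], std)
        return max(self.actions, key=draw)

    def update(self, s, a, r, s_next, a_next):
        """SARSA step: on-policy target built from the next chosen action."""
        target = r + self.gamma * self.mean[(s_next, a_next)]
        self.mean[(s, a)] += self.alpha * (target - self.mean[(s, a)])
        self.count[(s, a)] += 1


if __name__ == "__main__":
    agent = ThompsonSarsa(["good", "bad"])
    state, action = "s", agent.choose("s")
    for _ in range(500):
        reward = 1.0 if action == "good" else 0.0  # hypothetical reward rule
        next_action = agent.choose(state)
        agent.update(state, action, reward, state, next_action)
        action = next_action
    print(dict(agent.mean))
```

The posterior parameters here are the mean and visit count kept for each state-action pair, and the sampling step replaces the epsilon-greedy choice commonly used with SARSA.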

Regarding Claim 5, 
Zhang teaches The system of claim 1,
Zhang does not appear to explicitly teach:
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters.
However, Du teaches: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters. (Para [0062]: “The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes 𝔼[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” teaches calculating and sampling a parameter θ*, from the posterior distribution P(θ|D) by using Thompson Sampling, that the parameter is used to calculate an expected reward for choosing an action, and maximizing the calculated expected reward by choosing an appropriate action given the parameter and current context)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 7, 
Zhang teaches The system of claim 1,
Zhang does not appear to explicitly teach:
wherein the computing device is configured to: identify a state and an action taken through the interface including the selected one of the plurality of content elements; 
receive an updated trained selection model having a reward function updated based on the state and the action.
However, Du teaches: 
identify a state and an action taken through the interface including the selected one of the plurality of content elements; (Para [0060]: “Using policy iteration, the event sequence recommender system 106 can determine the optimal policies and value function Vθ *(x) corresponding to each of the Markov Decision Process models 310. In particular, a policy includes a function that specifies the action a user will take when in a particular state of the model.” teaches identifying a state and an action of the model; Para [0104]: “Subsequently, the event sequence recommender system 106 can provide, for display via a client device, a user interface that displays the recommended sequence of digital content transmissions, the plurality of historical digital content transmissions, and a plurality of interactive elements for entry of user preferences. In one or more embodiments, a user preference can include a distribution channel through which to transmit the digital content (e.g., email, multimedia messaging, social media post, etc.), a preferred digital content category ( e.g., video advertisement, digital image, informative literature etc.), or a preferred digital content item to transmit ( e.g., a particular advertisement or piece of informative literature).” teaches that an action of the event sequence recommender can include providing display of preferred (selected) content elements through a user interface)
receive an updated trained selection model having a reward function updated based on the state and the action. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below: 

[Equation image: media_image1.png]

…The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes 𝔼[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” 
teaches that the event sequence recommender uses Thompson sampling with a reward function that updates the rewards based on a chosen action that maximizes the expected reward, given the action and current context (state))
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 9, 
Zhang teaches The non-transitory computer readable medium of claim 8,
Zhang does not appear to explicitly teach:
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning.
However, Du teaches: 
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning. (Para [0061]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state.” teaches that 
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 10, 
The combination of Zhang and Du teaches The non-transitory computer readable medium of claim 9,
Zhang further teaches:
wherein the machine learning model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)

Regarding Claim 11, 
The combination of Zhang and Du teaches The non-transitory computer readable medium of claim 9,
Du further teaches: 
wherein the machine learning model is configured to implement a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0061] and [0062]: “After constructing the Markov Decision Process models 310, the event sequence recommender system 106 uses Thompson Sampling, which chooses actions in real time to maximize the expected experience as calculated by the reward on each state. In particular, through Thompson Sampling, the event sequence recommender system 106 recommends actions based on their probability of maximizing the expected reward as shown below:

[Equation image: media_image1.png]

In the equation above, X represents the current context and D = {(X; a; r)} represents past observations of contexts, actions, and rewards.” teaches a selection model that is configured to implement a state-action-reward-state-action process modified to use Thompson sampling)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 12, 
The combination of Zhang and Du teaches The non-transitory computer readable medium of claim 9,
Du further teaches: 
wherein the trained selection model is configured to calculate one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters. (Para [0062]: “The event sequence recommender system 106 implements Thompson Sampling by sampling, in each round, a parameter θ* from the posterior P(θ|D), and choosing the action a* that maximizes 𝔼[r | X, a*, θ*] (i.e., the expected reward given the parameter, the action, and the current context).” teaches calculating and sampling a parameter θ*, from the posterior distribution P(θ|D) by using Thompson Sampling, that the parameter is used to calculate an expected reward for choosing an action, and maximizing the calculated expected reward by choosing an appropriate action given the parameter and current context)
Zhang and Du are analogous art because they are directed to recommendation systems that use Thompson Sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Du’s dynamic user preference interface into Zhang’s system for presenting optimal content items with a motivation to “…use the previously trained recommendation model in conjunction with the modified reward function to quickly generate modified recommendations” (Du, Para [0032]).

Regarding Claim 17, 

Regarding Claim 18, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 5 and has limitations similar to those of claim 5, and is therefore rejected under the same rationale applied against claim 5.
Regarding Claim 20, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 7 and has limitations similar to those of claim 7, and is therefore rejected under the same rationale applied against claim 7.

Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Du et al., further in view of Jiang et al. (US 2018/0174038 A1). 

Regarding Claim 6, 
The combination of Zhang and Du teaches The system of claim 5,
The combination of Zhang and Du does not appear to explicitly teach: 
wherein the one or more posterior distribution parameters are calculated using a short-term reward value, r, and a long term reward value, R.
However, Jiang teaches: 
wherein the one or more posterior distribution parameters are calculated using a short-term reward value, r, and a long term reward value, R. (Para [0068]: “Target yj is a computed target Q-value after taking an optimal action at time stamp j. It is computed as the current reward plus an estimated optimal Q-value after observing the new sensing frame Xj+1 determined by the current Q-network Nk-I with parameters θk. The parameter n is the forgetting factor valued between 0 and 1 and determines how important the system weights long-term rewards against short-term ones. The smaller the forgetting factor, the robotic device weights less on long-term rewards but cares only for the short-term rewards. If the forgetting factor is closer to 1, the robotic device tends to treat long-term rewards similarly with the short-term rewards.” teaches that θk, the posterior distribution parameter, is calculated based on the forgetting factor, a parameter that is calculated based on the short and long-term rewards. θk is a posterior distribution parameter because the parameter is calculated after observing the new sensing frame (parameter is assigned after the sensing frame (event) has occurred))
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jiang’s short-term and long-term rewards into Zhang’s system for presenting optimal content items as modified by Du with a motivation to “…finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable” (Jiang, Para [0039]).
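For illustration only, and not drawn from Jiang: the target computation described in Jiang, Para [0068] (the current reward plus an estimated optimal Q-value for the next observation, weighted by a forgetting factor between 0 and 1) can be sketched as follows. The sketch assumes the standard one-step Q-learning form, which the quoted passage describes in words but does not write out. The name q_target and its parameters are hypothetical.

```python
def q_target(reward, next_q_values, forgetting_factor, terminal=False):
    """Return the short-term reward plus the discounted best next-step Q-value.

    Assumption: target = reward + forgetting_factor * max(next_q_values),
    i.e. the usual one-step Q-learning target.
    """
    if not 0.0 <= forgetting_factor <= 1.0:
        raise ValueError("forgetting_factor must lie in [0, 1]")
    if terminal or not next_q_values:
        # No future step to account for: only the short-term reward remains.
        return reward
    # The long-term contribution is scaled by the forgetting factor.
    return reward + forgetting_factor * max(next_q_values)
```

With a forgetting factor of 0 the target reduces to the short-term reward alone; with a factor of 1 the estimated long-term value is weighted equally with the short-term reward.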

Regarding Claim 13, 
The combination of Zhang and Du teaches The non-transitory computer readable medium of claim 12,
The combination of Zhang and Du does not appear to explicitly teach: 
wherein the one or more posterior distribution parameters are calculated using a short-term reward value, r, and a long term reward value, R.
However, Jiang teaches: 
wherein the one or more posterior distribution parameters are calculated using a short-term reward value, r, and a long term reward value, R. (Para [0068]: “Target yj is a computed target Q-value after taking an optimal action at time stamp j. It is computed as the current reward plus an estimated optimal Q-value after observing the new sensing frame Xj+1 determined by the current Q-network Nk-I with parameters θk. The parameter n is the forgetting factor valued between 0 and 1 and determines how important the system weights long-term rewards against short-term ones. The smaller the forgetting factor, the robotic device weights less on long-term rewards but cares only for the short-term rewards. If the forgetting factor is closer to 1, the robotic device tends to treat long-term rewards similarly with the short-term rewards.” teaches that θk, the posterior distribution parameter, is calculated based on the forgetting factor, a parameter that is calculated based on the short and long-term rewards. θk is a posterior distribution parameter because the parameter is calculated after observing the new sensing frame (parameter is assigned after the sensing frame (event) has occurred))
Zhang, Du, and Jiang are analogous art because they are directed to systems that use machine learning models. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jiang’s short-term and long-term rewards into Zhang’s system for presenting optimal content items as modified by Du with a motivation to “…finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable” (Jiang, Para [0039]).

Regarding Claim 19, 
This claim recites The computer-implemented method of claim 18, which performs a plurality of operations as recited by the system of claim 6 and has limitations similar to those of claim 6, and is therefore rejected under the same rationale applied against claim 6.

Conclusion
The prior art made of record and not relied upon is considered pertinent to the applicant’s disclosure: 
Lugt et al. (US 20210142118 A1) discloses a system that selects and presents content to a user by using Thompson Sampling and reinforcement learning. 
Kulkarni et al. (US 20200342500 A1) discloses a system that uses Thompson sampling to maximize the expected reward of a recommendation algorithm.
Zapella et al. (US 10242381 B1) discloses a system that selects optimal content to present to a user with Thompson sampling.  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571) 272-8144.  The examiner can normally be reached on Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/S.J.A./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125