DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on January 21, 2020. 
This office action is in response to Amendments and/or remarks filed on June 2, 2022. Claims 1, 4, and 18 have been amended. Claims 5-6, 12-13, and 18-19 were previously cancelled. Claims 21-26 are new. Claims 1-4, 7-11, 14-17, and 20-26 are pending. 

Duplicate Claims
Applicant is advised that should claim 9 be found allowable, claim 23 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).

Drawings
The drawings filed on June 2, 2022 are accepted. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 24 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding Claim 24,
Claim 24 recites “The computer-implemented method of claim 23”. There is insufficient antecedent basis for this limitation in the claim, because claim 23 depends on claim 8, which is directed to a non-transitory computer readable medium rather than a computer-implemented method. For purposes of examination, this limitation will be interpreted as “The non-transitory computer readable medium of claim 23”, which is also the recommended amendment.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-4, 7-11, 14-17, and 20-26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Regarding Claim 1, 
Claim 1 is directed to “A system for generating item recommendations, comprising: a memory having instructions stored thereon, and a processor that reads the instructions to: …”, which is directed to a machine, one of the statutory categories.

Claim 1 recites the following limitations: 
select one of the plurality of content elements for presentation in the at least one content container,
This limitation falls within the mental process grouping of abstract ideas, because it can be performed in the human mind or by a human with pencil and paper. Claim 1 recites additional limitations:
wherein the one of the plurality of content elements is selected by… using Thompson sampling,
…calculates one or more posterior distribution parameters of a total reward value Q,
and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; 
These limitations require using Thompson sampling to select content elements, calculating posterior distribution parameters of a total reward value, and applying Thompson sampling to the posterior distribution parameters calculated using a short-term reward value and a long-term reward value. These steps fall within the mathematical concept grouping of abstract ideas. Thus, Claim 1 recites an abstract idea.
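For illustration only, the mathematical character of the steps identified above can be sketched as follows. The sketch is not taken from the application. It assumes, hypothetically, that the total reward Q is formed as r + gamma * R and that the posterior over Q for each content element is Gaussian with a standard deviation that shrinks with the number of observations; the names Posterior, update, sample, and select are likewise assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class Posterior:
    """Assumed Gaussian posterior over the total reward Q of one content element."""
    mean: float = 0.0
    count: int = 0  # number of observations folded into the posterior so far

    def update(self, r: float, R: float, gamma: float = 0.9) -> None:
        # Assumption for illustration: Q combines the short-term reward r
        # and the long-term reward R as Q = r + gamma * R.
        q = r + gamma * R
        self.count += 1
        self.mean += (q - self.mean) / self.count  # running mean of observed Q

    def sample(self, rng: random.Random) -> float:
        # Assumed posterior spread: shrinks as observations accumulate.
        std = 1.0 / (self.count + 1) ** 0.5
        return rng.gauss(self.mean, std)


def select(posteriors: Dict[str, Posterior], rng: random.Random) -> str:
    # Thompson sampling: draw one value per element from its posterior,
    # then select the element whose draw is largest.
    draws = {name: p.sample(rng) for name, p in posteriors.items()}
    return max(draws, key=draws.get)
```

Under these assumptions, the selection reduces to updating two numbers per content element and comparing random draws, which is consistent with characterizing the steps as mathematical calculations.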
The abstract idea of claim 1 is not integrated into a practical application because the additional elements recited in claim 1 are: 
a memory having instructions stored thereon, and a processor that reads the instructions to:
receive a plurality of content elements for presentation in at least one content container;
a trained selection model
generate an interface including the selected one of the plurality of content elements; and provide the interface for display.

Instructions to apply the abstract idea on generic computer components (“a memory having instructions stored thereon, and a processor that reads the instructions to:”) do not represent a practical application of the abstract idea (see MPEP 2106.05(f)). Further, the limitations:
receive a plurality of content elements for presentation in at least one content container;
generate an interface including the selected one of the plurality of content elements; and provide the interface for display.
amount to recitation of insignificant extra-solution activity. See MPEP 2106.05(g).
Finally, generally linking the abstract idea to a particular technological environment or field of use (“a trained selection model”) cannot integrate the abstract idea into a practical application (see MPEP 2106.05(h)). This additional element merely specifies that the above mental process and mathematical concept steps are performed with a trained selection model. Therefore, Claim 1 is directed to an abstract idea.

Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use or technological environment (“a trained selection model”) does not provide an inventive concept (see MPEP 2106.05(h)), and using generic computer components (“a memory having instructions stored thereon, and a processor that reads the instructions to:”) to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer, which cannot provide an inventive concept.
Further, the limitation “receive a plurality of content elements for presentation in at least one content container;” amounts to insignificant extra-solution activity of data gathering, see MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following, "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);". Accordingly, the additional element does not integrate the abstract idea into a practical application because the recitation of insignificant extra-solution activity is well-understood, routine, and conventional.
According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Sherrick, "An Introduction to Graphical User Interfaces and Their Use by CITIS", July 1992. Sherrick, at pg. 3, states: “The term GUI is commonly used to refer to any of the components of a graphical user interface between a user and his computing environment. For this paper, a GUI is characterized as consisting of at least one, and usually more, of the following components: 1. Display Manager Program is often referred to as a "desktop" display. This interface handles the interaction between the user and computing services. It provides control of how applications are arranged and rearranged on the screen, how the user migrates between applications, and how applications communicate with each other. The desktop display is the overall "look and feel" of the system, or the user's view of the computer environment.” This passage discloses that GUIs usually have an interface that handles interaction between the user and the computing service, including how applications (content) are arranged, and that GUIs are used to display elements to a user, thus rendering “generate an interface including the selected one of the plurality of content elements; and provide the interface for display.” in claim 1 routine and conventional.
As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional. Therefore, Claim 1 is subject-matter ineligible.

Regarding Claim 2, 
Claim 2 is dependent on claim 1, and only includes an additional limitation that further limits the mental process of content selection and remains a mental process (wherein the plurality of content elements are selected based on a received persona). This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 3,
Claim 3 depends on claim 1, and only includes an additional element that amounts to recitation of insignificant extra-solution activity (wherein the trained selection model is trained using a plurality of prior impressions). According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Assem Aly Salama et al. (US 20190050754 A1), February 2019. Assem Aly Salama et al., in Para [0005], state: “A typical offline machine learning process involves two phases: (i) training a model, using training or historical data, and (ii) scoring using the trained model from the previous step on real-life or future data.” This passage discloses that a typical machine learning process involves training a model using training or historical data (prior impressions), thus rendering “wherein the trained selection model is trained using a plurality of prior impressions” in claim 3 routine and conventional. As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional.

Regarding Claim 4, 
Claim 4 is dependent on claim 1, and only includes an additional limitation drawn to mathematical concepts (wherein the trained selection model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling). This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.
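As a hedged illustration of why a SARSA process using Thompson sampling is treated as a mathematical concept, a minimal sketch follows. It is not the applicant's implementation. It uses the standard SARSA update Q(s,a) ← Q(s,a) + alpha * (r + gamma * Q(s',a') − Q(s,a)) and assumes, for this example only, that action selection draws from a Gaussian centered on the current estimate with a spread that shrinks with the visit count. The class name ThompsonSarsa and its parameters are hypothetical.

```python
import random
from collections import defaultdict


class ThompsonSarsa:
    """Tabular SARSA whose action selection samples from an assumed posterior."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, seed=0):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount applied to the next state-action value
        self.q = defaultdict(float)   # mean estimate of Q(s, a)
        self.n = defaultdict(int)     # visit counts, used here as a crude precision
        self.rng = random.Random(seed)

    def act(self, state):
        # Thompson-style selection: one random draw per action, pick the largest.
        def draw(action):
            std = 1.0 / (self.n[(state, action)] + 1) ** 0.5
            return self.rng.gauss(self.q[(state, action)], std)
        return max(self.actions, key=draw)

    def update(self, s, a, r, s_next, a_next):
        # On-policy SARSA target uses the action actually chosen in the next state.
        target = r + self.gamma * self.q[(s_next, a_next)]
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        self.n[(s, a)] += 1
```

In this sketch the entire process consists of arithmetic updates to a table of values and comparisons of random draws.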

Regarding Claim 7, 
Claim 7 is dependent on claim 1, and includes an additional limitation drawn to mental processes (identify a state and an action taken through the interface including the selected one of the plurality of content elements). Claim 7 also includes an additional element (receive an updated trained selection model having a reward function updated based on the state and the action.) which amounts to insignificant extra-solution activity of data gathering, see MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following, "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);". Accordingly, the additional element does not integrate the abstract idea into a practical application because the recitation of insignificant extra-solution activity is well-understood, routine, and conventional. The claim thus remains subject-matter ineligible.

Regarding Claim 8, 
Claim 8 is directed to “A non-transitory computer readable medium…”, which is directed to an article of manufacture, one of the statutory categories.

Claim 8 recites the following limitations: 
selecting at least one of a plurality of content elements for inclusion in the interface,
This limitation falls within the mental process grouping of abstract ideas, because it can be performed in the human mind or by a human with pencil and paper. Claim 8 recites additional limitations:
wherein the at least one of the plurality of content elements is selected using Thompson sampling,
…calculates one or more posterior distribution parameters of a total reward value Q,
and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; and
These limitations require using Thompson sampling to select content elements, calculating posterior distribution parameters of a total reward value, and applying Thompson sampling to the posterior distribution parameters calculated using a short-term reward value and a long-term reward value. These steps fall within the mathematical concept grouping of abstract ideas. Thus, Claim 8 recites an abstract idea.
The abstract idea of claim 8 is not integrated into a practical application because the additional elements recited in claim 8 are: 
A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising
receiving a request for an interface, wherein the request includes a user persona;
the trained selection model
generating an interface including the selected one of the plurality of content elements; and providing the interface for display.

Instructions to apply the abstract idea on generic computer components (“A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising”) do not represent a practical application of the abstract idea (see MPEP 2106.05(f)). Further, the limitations:
receiving a request for an interface, wherein the request includes a user persona;
generating an interface including the selected one of the plurality of content elements; and providing the interface for display.
amount to recitation of insignificant extra-solution activity. See MPEP 2106.05(g).
Finally, generally linking the abstract idea to a particular technological environment or field of use (“the trained selection model”) cannot integrate the abstract idea into a practical application (see MPEP 2106.05(h)). This additional element merely specifies that the above mental process and mathematical concept steps are performed with a trained selection model. Therefore, Claim 8 is directed to an abstract idea.

Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use or technological environment (“the trained selection model”) does not provide an inventive concept (see MPEP 2106.05(h)), and using generic computer components (“A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising”) to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer, which cannot provide an inventive concept.
Further, the limitation “receiving a request for an interface, wherein the request includes a user persona;” amounts to insignificant extra-solution activity of data gathering, see MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following, "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);". Accordingly, the additional element does not integrate the abstract idea into a practical application because the recitation of insignificant extra-solution activity is well-understood, routine, and conventional.
According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Sherrick, "An Introduction to Graphical User Interfaces and Their Use by CITIS", July 1992. Sherrick, at pg. 3, states: “The term GUI is commonly used to refer to any of the components of a graphical user interface between a user and his computing environment. For this paper, a GUI is characterized as consisting of at least one, and usually more, of the following components: 1. Display Manager Program is often referred to as a "desktop" display. This interface handles the interaction between the user and computing services. It provides control of how applications are arranged and rearranged on the screen, how the user migrates between applications, and how applications communicate with each other. The desktop display is the overall "look and feel" of the system, or the user's view of the computer environment.” This passage discloses that GUIs usually have an interface that handles interaction between the user and the computing service, including how applications (content) are arranged, and that GUIs are used to display elements to a user, thus rendering “generating an interface including the selected one of the plurality of content elements; and providing the interface for display.” in claim 8 routine and conventional.
As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional. Therefore, Claim 8 is subject-matter ineligible.

Regarding Claim 9,
Claim 9 depends on claim 8, and only includes an additional element that amounts to recitation of insignificant extra-solution activity (wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning). According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Osband et al. (WO 2017004626 A1), January 2017. Osband et al. state, in Para [0004]: “Another common exploration strategy is inspired by Thompson sampling. In a Thompson sampling strategy there is some notion of uncertainty. However, a distribution of the maintained over the possible values from the dataset and the system is explored by randomly selecting a policy according to the probability that the selected policy is the optimal policy.” and, in Para [0040]: “Perhaps the oldest heuristic for balancing exploration with exploitation is given by Thompson sampling. Thompson sampling is often referred to a bandit algorithm and takes a single sample from the posterior at every time step and chooses the action which is optimal for that time step.
To apply the Thompson sampling principle to reinforcement learning, a system samples a value function from its posterior.” These passages disclose that Thompson sampling is an old and common heuristic and that Thompson sampling can be applied to reinforcement learning, thus rendering “wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning” in claim 9 routine and conventional. As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional.
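The bandit form of Thompson sampling described in the quoted passage (take a single sample from the posterior at every time step and choose the action that is optimal for that sample) can be sketched as below. This is an illustrative sketch only and is not drawn from Osband et al. or from the application. It assumes binary rewards with a Beta(successes + 1, failures + 1) posterior per action, and the function names thompson_step and run are hypothetical.

```python
import random


def thompson_step(successes, failures, rng):
    # Draw one sample from each action's Beta posterior (uniform prior assumed),
    # then choose the action whose sampled success rate is highest.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run(true_rates, steps, seed=0):
    # Simulate repeated selections against hypothetical true success rates.
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

For example, run([0.9, 0.1], 500) returns the per-action success and failure counts after 500 selections; over time the draws concentrate on the action with the higher observed success rate.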

Regarding Claim 10,
Claim 10 depends on claim 9, and only includes an additional element that amounts to recitation of insignificant extra-solution activity (wherein the machine learning model is trained using a plurality of prior impressions). According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Assem Aly Salama et al. (US 20190050754 A1), February 2019. Assem Aly Salama et al., in Para [0005], state: “A typical offline machine learning process involves two phases: (i) training a model, using training or historical data, and (ii) scoring using the trained model from the previous step on real-life or future data.” This passage discloses that a typical machine learning process involves training a model using training or historical data (prior impressions), thus rendering “wherein the machine learning model is trained using a plurality of prior impressions” in claim 10 routine and conventional. As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional.

Regarding Claim 11, 
Claim 11 is dependent on claim 9, and only includes an additional limitation drawn to mathematical concepts (wherein the trained selection model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling). This claim does not recite any additional elements beyond those recited in claim 9, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 14,
Claim 14 is directed to “A computer-implemented method…”, which is directed to a process, one of the statutory categories. Claim 14 recites “A computer-implemented method, comprising” steps similar to the operations performed by the system of claim 1. As performing an abstract idea on a generic computer component cannot integrate the abstract idea into a practical application and cannot provide an inventive concept, Claim 14 remains subject-matter ineligible and is rejected with the same rationale applied against claim 1.

Regarding Claim 15, 
Claim 15 is dependent on claim 14 and recites limitations similar to the limitations recited in claim 2, and is therefore rejected with the same rationale applied to claim 2. This claim does not recite any additional elements beyond those recited in independent claim 14 or claim 2, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 16, 
Claim 16 is dependent on claim 14 and recites limitations similar to the limitations recited in claim 3, and is therefore rejected with the same rationale applied to claim 3. This claim does not recite any additional elements beyond those recited in independent claim 14 or claim 3, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 17, 
Claim 17 is dependent on claim 14 and recites limitations similar to the limitations recited in claim 4, and is therefore rejected with the same rationale applied to claim 4. This claim does not recite any additional elements beyond those recited in independent claim 14 or claim 4, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 20, 
Claim 20 is dependent on claim 14 and recites limitations similar to the limitations recited in claim 7, and is therefore rejected with the same rationale applied to claim 7. This claim does not recite any additional elements beyond those recited in independent claim 14 or claim 7, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 21,
Claim 21 depends on claim 1, and only includes an additional element that amounts to recitation of insignificant extra-solution activity (wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning). According to MPEP 2106.05(d)(I), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Osband et al. (WO 2017004626 A1), January 2017. Osband et al. state, in Para [0004]: “Another common exploration strategy is inspired by Thompson sampling. In a Thompson sampling strategy there is some notion of uncertainty. However, a distribution of the maintained over the possible values from the dataset and the system is explored by randomly selecting a policy according to the probability that the selected policy is the optimal policy.” and, in Para [0040]: “Perhaps the oldest heuristic for balancing exploration with exploitation is given by Thompson sampling. Thompson sampling is often referred to a bandit algorithm and takes a single sample from the posterior at every time step and chooses the action which is optimal for that time step.
To apply the Thompson sampling principle to reinforcement learning, a system samples a value function from its posterior.” These passages disclose that Thompson sampling is an old and common heuristic and that Thompson sampling can be applied to reinforcement learning, thus rendering “wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning” in claim 21 routine and conventional. As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional.

Regarding Claim 22, 
Claim 22 is dependent on claim 21, and only includes an additional element that is directed to generally linking the use of a judicial exception to a particular technological environment or field of use (wherein the machine learning model is a neural network). Generally linking the use of a judicial exception to a particular technological environment cannot integrate the abstract idea into a practical application and does not amount to significantly more, see MPEP 2106.05(h). The claim thus remains subject-matter ineligible.

Regarding Claim 23,
Claim 23 depends on claim 8, and only includes an additional element that amounts to recitation of insignificant extra-solution activity (wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning). According to MPEP 2106.05(d)(1), "A factual determination is required to support a conclusion that an additional element (or combination of additional elements) is well-understood, routine, conventional activity. Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018)...The required factual determination must be expressly supported in writing, as discussed in MPEP § 2106.07(a). Appropriate forms of support include one or more of the following: ...(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s)." In accordance with the MPEP, the following factual determination is based on the technical publication: Osband et al., (WO 2017004626 A1), January 2017. Osband et al., in Para [0004]: “Another common exploration strategy is inspired by Thompson sampling. In a Thompson sampling strategy there is some notion of uncertainty. However, a distribution of the maintained over the possible values from the dataset and the system is explored by randomly selecting a policy according to the probability that the selected policy is the optimal policy.” and Para [0040]: “Perhaps the oldest heuristic for balancing exploration with exploitation is given by Thompson sampling. Thompson sampling is often referred to a bandit algorithm and takes a single sample from the posterior at every time step and chooses the action which is optimal for that time step. To apply the Thompson sampling principle to reinforcement learning, a system samples a value function from its posterior.” discloses that Thompson sampling is an old and common heuristic and that Thompson sampling can be applied to reinforcement learning, thus rendering “wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning” in claim 23 routine and conventional. As such, the insignificant extra-solution activity is considered well-understood, routine, and conventional.
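For illustration only, the bandit form of Thompson sampling described in the quoted passage of Osband (a single sample is taken from the posterior at every time step, and the action that is optimal for that sample is chosen) may be sketched as follows. The sketch assumes binary rewards and a Beta posterior per action; the function names, the uniform Beta(1, 1) prior, and the simulated reward rates are hypothetical and are not drawn from the application or from any cited reference.

```python
import random

def thompson_select(successes, failures, rng):
    """Draw one sample per action from its Beta posterior; return the argmax index."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_rates, steps, seed=0):
    """Simulate a Bernoulli bandit; true_rates are hypothetical reward probabilities."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(steps):
        action = thompson_select(successes, failures, rng)
        # Observe a simulated binary reward and update that action's posterior counts.
        if rng.random() < true_rates[action]:
            successes[action] += 1
        else:
            failures[action] += 1
    return successes, failures
```

Under these assumptions, the per-action counts serve as the posterior distribution parameters, and repeated selection tends to concentrate on the action with the highest simulated reward rate.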

Regarding Claim 24, 
Claim 24 is dependent on claim 23, and only includes an additional element that is directed to generally linking the use of a judicial exception to a particular technological environment or field of use (wherein the machine learning model is a neural network). Generally linking the use of a judicial exception to a particular technological environment cannot integrate the abstract idea into a practical application and does not amount to significantly more, see MPEP 2106.05(h). The claim thus remains subject-matter ineligible.

Regarding Claim 25, 
Claim 25 is dependent on claim 14 and recites limitations similar to the limitations recited in claim 21, and therefore is rejected with the same rationale applied to claim 21. This claim does not recite any additional elements beyond those recited in claim 21, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 26, 
Claim 26 is dependent on claim 25 and recites limitations similar to the limitations recited in claim 22, and therefore is rejected with the same rationale applied to claim 22. This claim does not recite any additional elements beyond those recited in claim 22, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-11, 14-17, and 20-26 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0121964 A1) in view of Montgomery et al. (US 2021/0004868 A1), further in view of Chapelle et al. (“An Empirical Evaluation of Thompson Sampling”).

Regarding Claim 1, 
Zhang teaches: 
A system for generating item recommendations, comprising a memory having instructions stored thereon, and a processor that reads instructions to: (Para [0006]: “An online system receives multiple candidate content item components ("candidate components") of at least one type (e.g., title, image, body text, call to action, video, etc.) from a content-providing user of the online system (e.g., an advertiser) for including in a content item to be presented to viewing users of the online system.” teaches a system for content selection and presentation; Para [0118]: “Embodiments also may relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer.” teaches a computing device that contains a memory and processor)
receive a plurality of content elements for presentation in at least one content container; (Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user. For example, the online system includes an optimal advertisement in an advertisement auction that ranks the optimal advertisement among one or more additional advertisements based on a bid amount associated with each advertisement and selects a highest ranked advertisement for presentation to the subject user.” teaches receiving a plurality of content elements that can be selected for presentation as an optimal content item (content container); Fig. 5 and Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item ( e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches that the optimal content item is a content container because it contains the selected content items)
select one of the plurality of content elements for presentation in the at least one content container, (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user) 
wherein the one of the plurality of content elements is selected by a trained selection model using Thompson sampling, (Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item ( e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users). 
For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model uses Thompson sampling for selecting content items)
generate an interface including the selected one of the plurality of content elements; and provide the interface for display. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))

Zhang does not appear to explicitly teach: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; and

However, Montgomery teaches: 
wherein the trained selection model calculates one or more [policy] parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more [policy] parameters calculated using a short-term reward value, r, and a long term reward value, R; and (Para [0044-0045]: “In another aspect of the disclosure, the selection algorithm may be a machine learning model, such as an analytical model, a neural network, a reinforcement learning model, or, generally, a model that takes inputs ( e.g., a feature set) and outputs a target (e.g., a target position) based on a trained function. The function may be trained using a training set of labeled data, while deployed in an environment (simulated or real), or while deployed in parallel to a different model to observe how the function would have performed if it was deployed. Specifically, in this aspect of the disclosure, the selection algorithm may be a Thompson sampling reinforcement learning model. The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches that the trained selection algorithm is a Thompson sampling reinforcement algorithm that can include an agent that selects actions according to a policy in order to maximize an expected reward (total reward value Q). 
The policy is updated according to a history of received rewards; therefore, the policy is determined based on a most recent reward (short-term reward value) and any reward received in the past (long term reward value))
Zhang and Montgomery are analogous art because they are directed to systems using Thompson sampling to present content to users.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace Zhang’s Thompson sampling algorithm with Montgomery’s Thompson sampling reinforcement algorithm using a history of states and rewards with a motivation to learn more about [content] space, while exploiting known profitable regions, as well as adapting to changing [content] environments (Montgomery, Para [0020]).

The combination of Zhang and Montgomery does not appear to explicitly teach: 
[that the policy parameters include] posterior distribution parameters

However, Chapelle teaches: 
[that the policy parameters include] posterior distribution parameters (Pages 1-2, Section 2:
[Image: media_image1.png]
[Image: media_image2.png]
teaches that randomly selecting an action a according to its probability of being optimal is a policy that selects actions to maximize the expected/cumulative reward and uses the posterior distribution parameters in this calculation, therefore the policy parameters include posterior distribution parameters)
Zhang, Montgomery, and Chapelle are analogous art because they are directed to using Thompson sampling.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the posterior distribution parameters of Chapelle with the policy parameters of Zhang/Montgomery with a motivation to implement the Thompson sampling algorithm efficiently (Chapelle, Page 2).

Regarding Claim 2, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 1,
Zhang further teaches: 
wherein the plurality of content elements are selected based on a received persona. (Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches selecting potential content items to present to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))

Regarding Claim 3, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 1,
Zhang further teaches:
wherein the trained selection model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)

Regarding Claim 4, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 1,
Montgomery further teaches: 
wherein the trained selection model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches an agent that performs an action to maximize a reward based on the modeled state; the agent then may receive a reward and new state and make another action, this teaches a state-action-reward-state-action process used by an agent of a Thompson sampling reinforcement model)
The combination of claim 1 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of the SARSA process required by claim 4. 
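For illustration only, a state-action-reward-state-action loop in which each action is chosen by sampling, rather than greedily, may be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed method or the implementation disclosed in Montgomery: the Gaussian sampling around a running Q estimate, the visit-count-based spread, the parameters `alpha` and `gamma`, and the caller-supplied environment function `step` are all hypothetical.

```python
import random
from collections import defaultdict

def sample_action(q_mean, q_count, state, actions, rng):
    """Thompson-style choice: sample a Q value per action and take the argmax."""
    def draw(action):
        # Assumption: uncertainty shrinks as the (state, action) pair is visited more.
        std = 1.0 / (q_count[(state, action)] + 1) ** 0.5
        return rng.gauss(q_mean[(state, action)], std)
    return max(actions, key=draw)

def sarsa_thompson(step, start_state, actions, episodes, horizon,
                   alpha=0.1, gamma=0.9, seed=0):
    """Run SARSA updates where both a and a' come from sampled Q values.

    `step(state, action, rng)` is a hypothetical environment callback that
    returns (reward, next_state).
    """
    rng = random.Random(seed)
    q_mean = defaultdict(float)   # running estimate of Q(s, a)
    q_count = defaultdict(int)    # number of updates applied to Q(s, a)
    for _ in range(episodes):
        s = start_state
        a = sample_action(q_mean, q_count, s, actions, rng)
        for _ in range(horizon):
            r, s_next = step(s, a, rng)                      # reward, next state
            a_next = sample_action(q_mean, q_count, s_next, actions, rng)
            # SARSA target uses the action actually selected in the next state.
            target = r + gamma * q_mean[(s_next, a_next)]
            q_mean[(s, a)] += alpha * (target - q_mean[(s, a)])
            q_count[(s, a)] += 1
            s, a = s_next, a_next
    return q_mean
```

In this sketch the immediate reward `r` and the discounted estimate for the next state-action pair are combined into a single update target, and the sampling step replaces the greedy or epsilon-greedy selection of a conventional SARSA loop.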

Regarding Claim 7, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 1,
Zhang further teaches: 
wherein the processor further reads the instructions to: [perform Thompson sampling] through the interface including the selected one of the plurality of content elements (Para [0009]: “The online system selects components to include in the optimal content item to be presented to the subject user based on an affinity score of the subject user predicted for each candidate component, in which an affinity score for a candidate component indicates the subject user's predicted affinity for the candidate component. For example, the online system predicts affinity scores of the subject user for candidate components and selects the candidate components that are associated with the highest affinity scores for inclusion in the optimal content item ( e.g., by ranking multiple candidate components of various types based on their affinity scores and selecting the highest ranked candidate component of each type).” teaches that content items are selected as the optimal content item based on an affinity score of the user with the respective content item (candidate component); Para [0013]: “In some embodiments, the affinity score of the subject user for a candidate component may be predicted using a machine-learned model. The online system may train the machine-learned model to predict an affinity score of the subject user for a candidate component using affinity scores of viewing users of the online system for the candidate component, in which the viewing users have at least a threshold measure of similarity to the subject user (e.g., based on attributes shared by the subject user and the viewing users). 
For example, the online system trains the machine-learned model using a set of affinity scores of viewing users of the online system for each candidate component included in "training content items" presented to the viewing users and information describing the ages and genders of the viewing users.” teaches using a trained machine learning model to predict affinity scores of content items to be potentially used for selection and presentation to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches that the machine learning model uses Thompson sampling for selecting content items)

Montgomery further teaches: 
[perform Thompson sampling to] identify a state and an action taken (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches using a Thompson sampling reinforcement model in which an agent identifies a modeled state and takes actions)

receive an updated trained selection model having a reward function updated based on the state and the action. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches that the Thompson sampling model’s expected reward is based on a reward function and the policy for the Thompson sampling model can be updated according to the history of actions, states, and rewards)
The combination of claim 1 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of the states, actions, and reward function required by claim 7.

Regarding Claim 8, 
Zhang teaches:
A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor cause a device to perform operations comprising: (Para [0117]: “In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.” teaches a computer readable medium containing instructions that can be executed by the processor)
receiving a request for an interface, wherein the request includes a user persona; (Para [0007]: “Upon identifying an opportunity to present a content item to a subject user of the online system (i.e., an "impression" opportunity), the online system dynamically generates an optimal content item (e.g., an optimal advertisement) for presentation to the subject user using one or more of the candidate components.” teaches receiving an opportunity to present content to a user (request for an interface) and generating an optimal content item to present to the user; Para [0069]: “The content selection module 255 selects (e.g., as shown in step 365 of FIG. 3) one or more content items for communication to a client device 110 to be presented to a user. Content items eligible for presentation to the user are retrieved from the content store 210, from the ad request store 230, or from another source by the content selection module 255, which selects one or more of the content items for presentation to the user. A content item eligible for presentation to the user is a content item associated with at least a threshold number of targeting criteria satisfied by characteristics of the user or is a content item that is not associated with targeting criteria.” teaches presenting content items to the user based on targeting criteria; Para [0004]: “For example, targeting criteria are used to identify users associated with specific user profile information satisfying at least one of the targeting criteria. Attributes specified by targeting criteria are usually associated with online system users who are likely to have an interest in content items associated with the targeting criteria or who are likely to find such content items relevant. 
For example, content items associated with the board game chess may be associated with targeting criteria describing online system users who have expressed an interest in board games (e.g., users who have included playing board games as a hobby in their profile information, users who have downloaded game applications for board games in the online system, etc).” teaches that the targeting criteria is based on a user’s characteristics and actions (persona))
selecting at least one of a plurality of content elements for inclusion in the interface, wherein the at least one of the plurality of content elements is selected using Thompson sampling; and (Fig. 5 and Para [0008]: “The optimal content item is included in a content selection process (e.g., an auction) that selects one or more content items for presentation to the subject user.” teaches selecting one of a plurality of content items to be used in an optimal content item (content container) which is presented to the user; Para [0017]: “In addition to random selection, the historical performance information also may be associated with training content items generated from candidate components that are selected using a heuristic (e.g., Thompson sampling). For example, once the training content items that include randomly selected candidate components have achieved at least 1,000 impressions, the online system generates training content items that include candidate components that are selected using Thompson sampling. The online system may use Thompson sampling to select each candidate component to include in a training content item based on a distribution of affinity scores for each candidate component, in which the distribution of affinity scores for a candidate component is inversely proportional to the amount of data for the component (i.e., the number of impressions achieved by training content items including the component).” teaches using Thompson sampling to select content elements)
generating an interface including the selected at least one of the plurality of content elements; and providing the interface for display. (Para [0008]: “The online system may then present the selected content item to the subject user (e.g., in a display area of a client device associated with the subject user).” teaches presenting the selected content item to a user through a display (generating an interface on the display for the user to view the selected content item))

Zhang does not appear to explicitly teach: 
wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; and

However, Montgomery teaches: 
wherein the trained selection model calculates one or more [policy] parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more [policy] parameters calculated using a short-term reward value, r, and a long term reward value, R; and (Para [0044-0045]: “In another aspect of the disclosure, the selection algorithm may be a machine learning model, such as an analytical model, a neural network, a reinforcement learning model, or, generally, a model that takes inputs ( e.g., a feature set) and outputs a target (e.g., a target position) based on a trained function. The function may be trained using a training set of labeled data, while deployed in an environment (simulated or real), or while deployed in parallel to a different model to observe how the function would have performed if it was deployed. Specifically, in this aspect of the disclosure, the selection algorithm may be a Thompson sampling reinforcement learning model. The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches that the trained selection algorithm is a Thompson sampling reinforcement algorithm that can include an agent that selects actions according to a policy in order to maximize an expected reward (total reward value Q). 
The policy is updated according to received rewards, therefore the policy is determined based on a most recent reward (short-term reward value) and any reward received in the past (long term reward value))
Zhang and Montgomery are analogous art because they are directed to systems using Thompson sampling to present content to users.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to replace Zhang’s Thompson sampling algorithm with Montgomery’s Thompson sampling reinforcement algorithm using a history of states and rewards with a motivation to learn more about [content] space, while exploiting known profitable regions, as well as adapting to changing [content] environments (Montgomery, Para [0020]).

The combination of Zhang and Montgomery does not appear to explicitly teach: 
[that the policy parameters include] posterior distribution parameters

However, Chapelle teaches: 
[that the policy parameters include] posterior distribution parameters (Pages 1-2, Section 2:
[Image: media_image1.png]
[Image: media_image2.png]
teaches that randomly selecting an action a according to its probability of being optimal is a policy that selects actions to maximize the expected/cumulative reward and uses the posterior distribution parameters in this calculation, therefore the policy parameters include posterior distribution parameters)
Zhang, Montgomery, and Chapelle are analogous art because they are directed to using Thompson sampling. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the posterior distribution parameters of Chapelle with the policy parameters of Zhang/Montgomery with a motivation to implement the Thompson sampling algorithm efficiently (Chapelle, Page 2).
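For illustration only, the role of posterior distribution parameters in Thompson sampling can be sketched as follows. This is a minimal Beta-Bernoulli sketch and is not taken from Zhang, Montgomery, Chapelle, or the instant application; the function names, the uniform Beta(1, 1) prior, and the binary reward are all assumptions made for the example.

```python
import random

def thompson_select(alpha, beta, rng):
    """Draw one sample per action from its Beta posterior and pick the largest draw."""
    draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(draws)), key=draws.__getitem__)

def update(alpha, beta, arm, reward):
    """Update the posterior parameters in place for a binary reward (1 or 0)."""
    if reward:
        alpha[arm] += 1   # one more observed success for this action
    else:
        beta[arm] += 1    # one more observed failure for this action

def run(true_rates, steps, seed=0):
    """Simulate `steps` selections against hypothetical success rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1.0] * k     # Beta(1, 1) uniform prior (assumption)
    beta = [1.0] * k
    for _ in range(steps):
        arm = thompson_select(alpha, beta, rng)
        reward = 1 if rng.random() < true_rates[arm] else 0
        update(alpha, beta, arm, reward)
    return alpha, beta
```

In this sketch the pair (alpha, beta) for each action plays the role of the posterior distribution parameters: each selection samples from the current posterior, so an action is chosen in proportion to its probability of being optimal.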


Regarding Claim 9, 
The combination of Zhang, Montgomery, and Chapelle teaches The non-transitory computer readable medium of claim 8,
Montgomery further teaches: 
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment).” teaches that Thompson sampling is implemented using a reinforcement learning model)

The combination of claim 8 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of reinforcement learning required by claim 9.
 

Regarding Claim 10, 
The combination of Zhang, Montgomery, and Chapelle teaches The non-transitory computer readable medium of claim 9,
Zhang further teaches:
wherein the machine learning model is trained using a plurality of prior impressions. (Para [0016]: “In some embodiments, the historical performance information used to train the machine-learned model is associated with training content items generated from randomly selected candidate components, in which the training content items have achieved at least a threshold number of impressions (e.g., 1,000 impressions). For example, if the content-providing user provides 13 different candidate image components to the online system, the online system randomly selects one of the candidate image components to include in a training content item that is presented to a viewing user of the online system and repeats this process until at least a threshold number of impressions have been achieved for each candidate image component. In this example, performance information associated with each impression of the training content items is used to train the machine-learned model.” teaches selecting a candidate component for training the model based on the plurality of prior impressions and whether or not the training content item generates a threshold number of impressions)

Regarding Claim 11, 
The combination of Zhang, Montgomery, and Chapelle teaches The non-transitory computer readable medium of claim 9,
Montgomery further teaches: 
wherein the machine learning model implements a state-action-reward-state-action (SARSA) process modified to use Thompson sampling. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches an agent that performs an action to maximize a reward based on the modeled state; the agent then may receive a reward and new state and make another action, this teaches a state-action-reward-state-action process used by an agent of a Thompson sampling reinforcement model)
The combination of claim 8 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of the SARSA process required by claim 11.
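For illustration only, one hypothetical way a state-action-reward-state-action update could be paired with Thompson-style action selection is sketched below. It is not drawn from Montgomery or from the instant application; the class name, the learning rate, the discount factor, and the heuristic Gaussian belief over each action value (used here in place of a derived Bayesian posterior) are all assumptions.

```python
import random

class ThompsonSarsa:
    """Tabular SARSA whose actions are chosen by sampling from a per-action belief."""

    def __init__(self, n_states, n_actions, lr=0.5, gamma=0.9, seed=0):
        self.rng = random.Random(seed)
        self.lr, self.gamma = lr, gamma
        # Belief over each Q(s, a): a mean plus a visit count that narrows the spread.
        self.mean = [[0.0] * n_actions for _ in range(n_states)]
        self.count = [[0] * n_actions for _ in range(n_states)]

    def select(self, state):
        # One Gaussian draw per action; the spread shrinks as the count grows.
        draws = [self.rng.gauss(m, 1.0 / (c + 1) ** 0.5)
                 for m, c in zip(self.mean[state], self.count[state])]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, s, a, r, s_next, a_next):
        # SARSA target: immediate reward r plus the discounted value of the
        # next state-action pair actually chosen.
        target = r + self.gamma * self.mean[s_next][a_next]
        self.mean[s][a] += self.lr * (target - self.mean[s][a])
        self.count[s][a] += 1
```

A loop using this sketch would call `select` for the current state, observe a reward and next state, call `select` again for the next state, and then call `update` with the resulting state, action, reward, next state, and next action.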

Regarding Claim 14, 
This claim recites A computer-implemented method, which performs a plurality of operations as recited by the system of claim 1, and has limitations that are similar to those of claim 1, thus is rejected with the same rationale applied against claim 1. 
Regarding Claim 15, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 2, and has limitations that are similar to those of claim 2, thus is rejected with the same rationale applied against claim 2. 
Regarding Claim 16, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 3, and has limitations that are similar to those of claim 3, thus is rejected with the same rationale applied against claim 3.

Regarding Claim 17, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 4, and has limitations that are similar to those of claim 4, thus is rejected with the same rationale applied against claim 4.
Regarding Claim 20, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 7, and has limitations that are similar to those of claim 7, thus is rejected with the same rationale applied against claim 7.

Regarding Claim 21, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 1,
Montgomery further teaches: 
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment).” teaches that Thompson sampling is implemented using a reinforcement learning model)

The combination of claim 1 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of reinforcement learning required by claim 21.

Regarding Claim 22, 
The combination of Zhang, Montgomery, and Chapelle teaches The system of claim 21,
Montgomery further teaches: 
wherein the machine learning model is a neural network. (Para [0044]: “In another aspect of the disclosure, the selection algorithm may be a machine learning model, such as an analytical model, a neural network, a reinforcement learning model, or, generally, a model that takes inputs ( e.g., a feature set) and outputs a target (e.g., a target position) based on a trained function.” teaches that the machine learning model can be a neural network)
The combination of claim 1 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of the machine learning model required by claim 22.

Regarding Claim 23, 
The combination of Zhang, Montgomery, and Chapelle teaches The non-transitory computer readable medium of claim 8,
Montgomery further teaches: 
wherein the Thompson sampling is implemented by a machine learning model trained using reinforcement learning. (Para [0045]: “The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment).” teaches that Thompson sampling is implemented using a reinforcement learning model)

The combination of claim 8 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of reinforcement learning required by claim 23.

Regarding Claim 24, 
The combination of Zhang, Montgomery, and Chapelle teaches The computer-implemented method of claim 23,
Montgomery further teaches: 
wherein the machine learning model is a neural network. (Para [0044]: “In another aspect of the disclosure, the selection algorithm may be a machine learning model, such as an analytical model, a neural network, a reinforcement learning model, or, generally, a model that takes inputs ( e.g., a feature set) and outputs a target (e.g., a target position) based on a trained function.” teaches that the machine learning model can be a neural network)
The combination of claim 8 has already incorporated the Thompson sampling reinforcement algorithm using a history of states and rewards, therefore already incorporating the details of the machine learning model required by claim 24.

Regarding Claim 25, 
This claim recites The computer-implemented method of claim 14, which performs a plurality of operations as recited by the system of claim 21, and has limitations that are similar to those of claim 21, thus is rejected with the same rationale applied against claim 21.
Regarding Claim 26, 
This claim recites The computer-implemented method of claim 25, which performs a plurality of operations as recited by the system of claim 22, and has limitations that are similar to those of claim 22, thus is rejected with the same rationale applied against claim 22.


Response to Arguments

Regarding 35 U.S.C. 112 Claim Rejections: 
Applicant’s argument: 
“The Office Action rejects claims 8-11 for alleged indefiniteness. Applicant has amended the independent claim, claim 8, to expedite prosecution. 
The dependent claims are allowable at least for depending from an allowable claim, and for further reasons recited therein. As such, Applicant requests withdrawal of these rejections. ”
Response: 
	The previous grounds of 112(b) rejection have been withdrawn due to the amendments; however, claim 24 remains rejected under 35 U.S.C. 112(b), as necessitated by the amendments. Please see pages 2-3 of this office action for more information. 

Regarding 35 U.S.C. 101 Claim Rejections: 
Applicant’s argument: 
“Here, likewise, at least the claimed features that recite "select one of the plurality of content elements for presentation in the at least one content container, wherein the one of the plurality of content elements is selected by a trained selection model using Thompson sampling, wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R" require action that cannot be practically applied in the human mind. For example, the use of the "trained selection model using Thompson sampling" recited claim 1 is too complex to be "...performed in the human mind, or by a human with pencil and paper".”
Response: 
	Applicant’s arguments have been fully considered but are not persuasive. Pages 3-4 of this office action have been copied here below: 

Claim 1 recites the following limitations: 
select one of the plurality of content elements for presentation in the at least one content container,
This/These limitation(s) falls within the mental process grouping of abstract ideas that can be performed in the human mind, or by a human with pencil and paper. Claim 1 recites additional limitations: 
wherein the one of the plurality of content elements is selected by… using Thompson sampling,
…calculates one or more posterior distribution parameters of a total reward value Q,
and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R; 
This/These limitation(s) require using Thompson sampling to select content elements, calculating posterior distribution parameters of a total reward value, and applying Thompson sampling to the posterior distribution parameters calculated using a short term reward value and a long term reward value. These steps fall within the mathematical concept grouping of abstract ideas. Thus, Claim 1 recites an abstract idea.

As the office action states, the only limitation directed toward a mental process is “select one of the plurality of content elements for presentation in the at least one content container”. The other limitations of claim 1 are not directed toward mental processes; they are directed toward mathematical concepts. Selecting content elements for presentation is a mental process that can be performed practically in the mind, or with the assistance of pen and paper. 

Applicant’s argument: 
“For example, claim 1 recites, among other things"...wherein the one of the plurality of content elements is selected by a trained selection model using Thompson sampling calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R"; which, at a minimum, improves the accuracy and effectiveness of the "...memory having instructions stored thereon, and a processor that reads the instructions" to " select one of the plurality of content elements for presentation in the at least one content container", "generate an interface including the selected one of the plurality of content elements", and "provide the interface for display" by reducing randomness and noise. See claim 1 of the instant application. Proof of the improvements can be found throughout the instant application.”

Response: 
	Applicant’s argument has been fully considered but is not persuasive. MPEP 2106.05(a) states the following – “It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements.” Therefore, the judicial exception cannot provide any alleged improvement that Applicant’s representative mentions. The additional elements of claim 1 are either generic computer components, recitation of insignificant extra-solution activity, or recitation of generally linking the judicial exception to a particular technological environment or field of use. Therefore, there are no additional elements in claim 1 that provide a technological improvement and integrate the abstract idea into a practical application. 

Applicant’s argument: 
“As such, the claim, as a whole, is not simply directed to a judicial exception and includes meaningful limitations. The elements of claim 1 cannot be performed in the human mind and is not "directed" to a mathematical concept, and instead require a computing device, which is "integral to the claim," and thus "indicates that [the] additional element[s] have integrated the [alleged] exception into a practical application."”

Response: 
	Applicant’s arguments have been fully considered but are not persuasive. As mentioned above, the only limitation directed toward a mental process is “select one of the plurality of content elements for presentation in the at least one content container”, which can be practically performed with assistance of pen and paper. The other limitations of claim 1 are directed toward mathematical concepts. Instructions to apply the abstract idea on generic computer components (a memory having instructions stored thereon, and a processor that reads the instructions to:) do not represent a practical application of the abstract idea (see MPEP 2106.05(f)). The other additional elements of claim 1 are directed to recitation of insignificant extra-solution activity and recitation of generally linking the judicial exception to a particular technological environment or field of use, which cannot integrate the abstract idea into a practical application. 

Applicant’s argument: 
“Claim 1 (and similarly claims 8 and 14) add elements that are indicative of an inventive concept. For example, the independent claims recite the use of "short-term reward value, r, and a long term reward value, R". In addition, the dependent claims recite further subject matter that are not well-understood, routine or conventional for at least the reasons recited below in the discussion of the allowability of the claims over the references cited in the obviousness rejections. 
As such, at least for these reasons, the subject matter of claim 1 is patent eligible. Further analysis of limitations containing alleged "insignificant extra-solution activity" is not necessary. Claims 8 and 14, although different in scope, recite similar subject matter, and appear to have been rejected for similar reasons. Therefore, at least for one or more relevant reasons as set forth above, independent claims 8 and 14 are also patent eligible.”

Response: 
	Applicant’s arguments have been fully considered but are not persuasive. Firstly, the well-understood, routine, and conventional analysis is applied in response to recitations of insignificant extra-solution activity; this analysis does not apply to the judicial exception or to limitations that were identified as being directed toward mental processes and mathematical concepts. Further, in the 101 analysis, the following limitation: 
and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R;
was analyzed, under the broadest reasonable interpretation, to fall under the mathematical concepts grouping; this limitation is not analyzed as an additional element. 

Regarding 35 USC 103 Claim Rejections: 
Applicant’s argument: 
“This cited portion of Montgomery does not teach, "wherein the trained selection model calculates one or more posterior distribution parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more posterior distribution parameters calculated using a short-term reward value, r, and a long term reward value, R" as the Office Action states.
The cited portions of Montgomery, at best, appear to discuss a Thompson sampling reinforcement learning model that includes an agent that takes actions to maximize an expected reward based on a policy, where the policy may be updated based on the history of actions, states, and rewards. Montgomery, at a minimum, is silent on the claim limitation of "long term reward value, R" and does not attempt to discuss that the long term rewards are "...the expected discounted rewards from the user in future sessions (e.g., long-term rewards)." See paragraph [0037] of the instant application.”

Response: 
Applicant’s arguments have been considered but are not persuasive. Firstly, claim limitations are read in light of the specification; however, limitations from the specification cannot be imported into the claims. Because Para [0037] of the specification does not set forth a special definition, the broadest reasonable interpretation of “long term rewards” is used to analyze the claim limitation. Therefore, applicant’s arguments are not commensurate with the claims of the instant application because "...the expected discounted rewards from the user in future sessions (e.g., long-term rewards)." is not a special definition (in the specification) and is not a limitation in independent claim 1. 
Finally, Applicant’s arguments have misconstrued the cited portion of the office action. The office action states the following: 

However, Montgomery teaches: 
wherein the trained selection model calculates one or more [policy] parameters of a total reward value Q, and wherein the Thompson sampling is applied to the one or more [policy] parameters calculated using a short-term reward value, r, and a long term reward value, R; and (Para [0044-0045]: “In another aspect of the disclosure, the selection algorithm may be a machine learning model, such as an analytical model, a neural network, a reinforcement learning model, or, generally, a model that takes inputs ( e.g., a feature set) and outputs a target (e.g., a target position) based on a trained function. The function may be trained using a training set of labeled data, while deployed in an environment (simulated or real), or while deployed in parallel to a different model to observe how the function would have performed if it was deployed. Specifically, in this aspect of the disclosure, the selection algorithm may be a Thompson sampling reinforcement learning model. The Thompson sampling reinforcement learning model may include an agent that takes one of one or more action(s) (e.g., from an action function) in an environment to maximize an expected reward (based on a reward function) based on the modeled state of the environment (which represents the environment and the agent in that environment, as updated based on the agent's action and other changes in the environment). The agent then may receive the actual reward and the new state in response to the chosen action, and makes another action. Generally, the agent selects actions according to a policy. The policy may be updated according to the history of actions, states, and rewards.” teaches that the trained selection algorithm is a Thompson sampling reinforcement algorithm that can include an agent that selects actions according to a policy in order to maximize an expected reward (total reward value Q). 
The policy is updated according to a history of received rewards, therefore the policy is determined based on a most recent reward (short-term reward value) and any reward received in the past (long term reward value))
Chapelle is relied upon to further teach that policy parameters are posterior distribution parameters. 
Montgomery teaches using a Thompson sampling reinforcement learning algorithm that includes a policy to maximize an expected reward (total reward Q). Montgomery also teaches that the policy is updated according to a history of rewards; therefore, the policy is updated based on a most recent reward (short term reward) and any reward received in the past (long term reward). This falls within the broadest reasonable interpretation of “long term reward”. Therefore, Montgomery is not silent with regard to this feature of claim 1. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144.  The examiner can normally be reached on Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.J.A./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125