DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
	This Office Action is in response to the communication filed on 9/12/2019.
	Claims 1-20 are being considered on the merits.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 9/12/2019 has been considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS forms 1449 filed 9/12/2019 are attached to the instant Office action. 

Drawings
	The drawings filed on 9/12/2019 are accepted. 

Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “SYSTEM AND METHOD FOR A RECOMMENDER USING DEEP REINFORCEMENT LEARNING AND Q-LEARNING BASED ON USER ATTRIBUTES”.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 USC § 101
Claim 1 is rejected under 35 USC § 101 
	Step 2a Prong 1: 
Predicting at least one score for at least one item (Mental process: Predicting a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
In at least one iteration of a plurality of iterations (Insignificant extra-solution activities)
receiving a user profile having a plurality of user attribute values (Insignificant extra-solution activities)
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by: (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “a prediction model”, nothing in this claim element precludes the step from practically being performed in the mind.)
in each of a plurality of training iterations (Insignificant extra-solution activities)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Insignificant extra-solution activities)
computing by the prediction model a plurality of predicted scores, each for one of a plurality of training items, in response to the training user profile and the plurality of training items, where each of the plurality of training items has a plurality of training item properties (Mental process: Computing a plurality of predicted scores is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “the prediction model”, nothing in this claim element precludes the step from practically being performed in the mind.)
computing for the training user profile a plurality of expected scores, each computed for one of the plurality of training items according to the plurality of training user attribute values and the plurality of training item properties of the training item (Mental process: Computing expected scores is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing “a prediction model”, nothing in this claim element precludes the step from practically being performed in the mind.)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
outputting the at least one score. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application
In at least one iteration of a plurality of iterations (Insignificant extra-solution activities)
receiving a user profile having a plurality of user attribute values (Insignificant extra-solution activities)
in each of a plurality of training iterations (Insignificant extra-solution activities)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Insignificant extra-solution activities)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
outputting the at least one score. (Insignificant extra-solution activities)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
In at least one iteration of a plurality of iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity)
receiving a user profile having a plurality of user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
in each of a plurality of training iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using a generic computer component; modifying model values in a generic prediction model does not add significantly more to the judicial exception)
outputting the at least one score. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
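
For illustration only, the training limitations recited in claim 1 (computing predicted scores, comparing them with expected scores, and modifying model values to maximize a reward score) might be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application: the linear model, the negative squared-error reward, the gradient-ascent update, and every function and parameter name are hypothetical.

```python
def predicted_scores(weights, user, items):
    """Score each item as a weighted dot product of user and item features."""
    return [sum(w * u * p for w, u, p in zip(weights, user, item)) for item in items]


def reward(expected, predicted):
    """Assumed reward: negative squared error, so a higher value means a closer match."""
    return -sum((e - p) ** 2 for e, p in zip(expected, predicted))


def training_iteration(weights, user, items, expected, lr=0.05):
    """Modify the model values with one gradient-ascent step on the reward."""
    predicted = predicted_scores(weights, user, items)
    new_weights = list(weights)
    for k in range(len(weights)):
        # d(reward)/d(w_k) = sum over items of 2 * (expected - predicted) * user_k * item_k
        grad = sum(
            2 * (e - p) * user[k] * item[k]
            for e, p, item in zip(expected, predicted, items)
        )
        new_weights[k] += lr * grad
    return new_weights
```

Calling `training_iteration` repeatedly over a set of training user profiles would correspond to the recited plurality of training iterations; the choice of reward and update rule here is only one of many possibilities.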

Claim 2 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 2. 
The similarity between the user profile and the plurality of other user profiles is computed according to a similarity between the plurality of user attribute values and another plurality of user attribute values of the plurality of other user profiles (Mental process: computing similarities is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
the plurality of user attribute values comprises at least one of: a user demographic value, a user preference value, a user identifier value, and a historical user interaction value (Insignificant extra-solution activities)
the historical user interaction value is indicative of a user interaction selected from a group of user interactions consisting of: a numerical score assigned by a user, a like indication, a purchase, a bookmarked item, and a skipped item. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application 
the plurality of user attribute values comprises at least one of: a user demographic value, a user preference value, a user identifier value, and a historical user interaction value (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the historical user interaction value is indicative of a user interaction selected from a group of user interactions consisting of: a numerical score assigned by a user, a like indication, a purchase, a bookmarked item, and a skipped item. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
the plurality of user attribute values comprises at least one of: a user demographic value, a user preference value, a user identifier value, and a historical user interaction value (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the historical user interaction value is indicative of a user interaction selected from a group of user interactions consisting of: a numerical score assigned by a user, a like indication, a purchase, a bookmarked item, and a skipped item. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
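
For illustration only, a similarity between two pluralities of user attribute values, as recited in claim 2, could be computed as in the sketch below. Cosine similarity is an assumed choice; the claim does not name a specific measure, and the numeric encoding of the attribute values and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_profile(user, other_profiles):
    """Return the index of the other profile most similar to the given user profile."""
    return max(
        range(len(other_profiles)),
        key=lambda i: cosine_similarity(user, other_profiles[i]),
    )
```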

Claim 3 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 3. 
the prediction model comprises at least one deep reinforcement learning (DRL) network. (Mental process: computing scores is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing a “deep reinforcement learning network”, nothing in this claim element precludes the step from practically being performed in the mind.)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.

Claim 4 is rejected under 35 USC § 101  
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 4. 
computing the plurality of expected scores comprises applying a content based filtering method to the plurality of training user attribute values and a plurality of training item properties of the plurality of training items (Insignificant extra-solution activities) 
Step 2A Prong 2: This judicial exception is not integrated into practical application
computing the plurality of expected scores comprises applying a content based filtering method to the plurality of training user attribute values and a plurality of training item properties of the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 

Claim 5 is rejected under 35 USC § 101
	Step 2a Prong 1: See claim 4 above. The same rationale applies to this dependent claim 5. 
applying the content based filtering method comprises providing the plurality of training user attribute values and the plurality of training item properties to at least one neural network (Insignificant extra-solution activities)   
Step 2A Prong 2: This judicial exception is not integrated into practical application
applying the content based filtering method comprises providing the plurality of training user attribute values and the plurality of training item properties to at least one neural network (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)   

Claim 6 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 6. 
training the prediction model comprises using a Q-learning method having a state, a plurality of actions, a reward and an output (Insignificant extra-solution activities)
the state is a vector of state values indicative of a plurality of training user attribute values of the training user profile (Insignificant extra-solution activities) 
the plurality of actions is a plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Insignificant extra-solution activities) 
the reward is the plurality of expected scores (Insignificant extra-solution activities) 
the output is the plurality of predicted scores (Insignificant extra-solution activities) 
Step 2A Prong 2: This judicial exception is not integrated into practical application
training the prediction model comprises using a Q-learning method having a state, a plurality of actions, a reward and an output (Insignificant extra-solution activities)
the state is a vector of state values indicative of a plurality of training user attribute values of the training user profile (Insignificant extra-solution activities) 
the plurality of actions is a plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Insignificant extra-solution activities) 
the reward is the plurality of expected scores (Insignificant extra-solution activities) 
the output is the plurality of predicted scores (Insignificant extra-solution activities)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
training the prediction model comprises using a Q-learning method having a state, a plurality of actions, a reward and an output (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the state is a vector of state values indicative of a plurality of training user attribute values of the training user profile (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the plurality of actions is a plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the reward is the plurality of expected scores (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the output is the plurality of predicted scores (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
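
For illustration only, the Q-learning elements recited in claim 6 (a state, a plurality of actions, a reward and an output) can be related to the standard tabular Q-learning update as sketched below. The sketch assumes the state is a tuple of training user attribute values and each action is a tuple of training item properties; the table representation, the learning rate `alpha`, the discount `gamma`, and the function name are hypothetical and are not drawn from the application.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step and return the updated value for (state, action)."""
    # Best value currently estimated for the successor state over all actions.
    best_next = max((q[(next_state, a)] for a in actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: one user state and two candidate items.
q_table = defaultdict(float)
state = (1, 0)               # assumed encoding of training user attribute values
actions = [(1, 0), (0, 1)]   # assumed encoding of training item properties
first = q_update(q_table, state, actions[0], 1.0, state, actions)
```

In this sketch the stored value for each state and action pair plays the role of the predicted score, and the `reward` argument plays the role of the expected score.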

Claim 7 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 7. 
training the prediction model comprises using a Q-learning method having another state, another plurality of actions, another reward and another output (Insignificant extra-solution activities)
the other state is a vector of state values indicative of another plurality of training user attribute values of the training user profile and another plurality of training item properties of the plurality of training items (Insignificant extra-solution activities)
the plurality of actions is another plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Insignificant extra-solution activities)
the reward is one of the plurality of expected scores (Insignificant extra-solution activities)
the output is a predicted score computed for one of the plurality of training user profiles and one of the plurality of training items in at least one of the plurality of training iterations. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application 
training the prediction model comprises using a Q-learning method having another state, another plurality of actions, another reward and another output (Insignificant extra-solution activities)
the other state is a vector of state values indicative of another plurality of training user attribute values of the training user profile and another plurality of training item properties of the plurality of training items (Insignificant extra-solution activities)
the plurality of actions is another plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Insignificant extra-solution activities)
the reward is one of the plurality of expected scores (Insignificant extra-solution activities)
the output is a predicted score computed for one of the plurality of training user profiles and one of the plurality of training items in at least one of the plurality of training iterations. (Insignificant extra-solution activities)

Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
training the prediction model comprises using a Q-learning method having another state, another plurality of actions, another reward and another output (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
the other state is a vector of state values indicative of another plurality of training user attribute values of the training user profile and another plurality of training item properties of the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
the plurality of actions is another plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
the reward is one of the plurality of expected scores (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
the output is a predicted score computed for one of the plurality of training user profiles and one of the plurality of training items in at least one of the plurality of training iterations. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 8 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 8. 
collecting at least one feedback value from at least one training user associated with at least one of the plurality of training user profiles, where the at least one feedback value is indicative of a level of agreement of the at least one user with at least some of the plurality of predicted scores computed by the prediction model in response to the respective training user profile and the plurality of training items (Insignificant extra-solution activities)
updating at least one training user attribute value of the respective at least one training user profile according to the at least one feedback value (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application
collecting at least one feedback value from at least one training user associated with at least one of the plurality of training user profiles, where the at least one feedback value is indicative of a level of agreement of the at least one user with at least some of the plurality of predicted scores computed by the prediction model in response to the respective training user profile and the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
updating at least one training user attribute value of the respective at least one training user profile according to the at least one feedback value (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
collecting at least one feedback value from at least one training user associated with at least one of the plurality of training user profiles, where the at least one feedback value is indicative of a level of agreement of the at least one user with at least some of the plurality of predicted scores computed by the prediction model in response to the respective training user profile and the plurality of training items (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
updating at least one training user attribute value of the respective at least one training user profile according to the at least one feedback value (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 9 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 9. 
at least one of the plurality of items is selected from a group of items consisting of: a restaurant identifier, a hospitality facility identifier, a movie identifier, a book identifier, a consumer appliance identifier, a retailer identifier, and a venue identifier. (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing a “prediction model” in the independent claim, nothing in this claim element precludes the step from practically being performed in the mind.)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.

Claim 10 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 10. 
The method of claim 1, wherein outputting the at least one score comprises outputting for each of the at least one score a respective item of the at least one item (Insignificant extra-solution activities) 
Step 2a Prong 2: This judicial exception is not integrated into practical application 
The method of claim 1, wherein outputting the at least one score comprises outputting for each of the at least one score a respective item of the at least one item (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The method of claim 1, wherein outputting the at least one score comprises outputting for each of the at least one score a respective item of the at least one item (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 11 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 11. 
inputting the user profile and the plurality of items into a prediction model comprises computing at least one set of state values indicative of the plurality of user attribute values and a plurality of item properties of the plurality of items. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application 
inputting the user profile and the plurality of items into a prediction model comprises computing at least one set of state values indicative of the plurality of user attribute values and a plurality of item properties of the plurality of items. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
inputting the user profile and the plurality of items into a prediction model comprises computing at least one set of state values indicative of the plurality of user attribute values and a plurality of item properties of the plurality of items. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 12 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 12. 
computing the at least one score further comprises: computing at least one other score, each computed for one of the plurality of items according to the plurality of user attribute values and the respective plurality of item properties of the respective item (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing a “prediction model” in the independent claim, nothing in this claim element precludes the step from practically being performed in the mind.)
aggregating the at least one score with the at least one other score (Mathematical Relationship: MPEP 2106.04(a)(2) provides that, “organizing information and manipulating information through mathematical correlation,” is an abstract idea)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.
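
For illustration only, the two limitations of claim 12 (computing at least one other score from the user attribute values and item properties, then aggregating it with the model's score) might look like the sketch below. The dot-product score and the weighted average are assumed choices; the claim does not specify either, and the function names and the `weight` parameter are hypothetical.

```python
def content_based_score(user_attributes, item_properties):
    """Assumed 'other score': dot product of user and item feature vectors."""
    return sum(u * p for u, p in zip(user_attributes, item_properties))


def aggregate(model_scores, other_scores, weight=0.5):
    """Blend two score lists item by item using a convex combination."""
    return [weight * m + (1.0 - weight) * o for m, o in zip(model_scores, other_scores)]
```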

Claim 13 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 12 above. The same rationale applies to this dependent claim 13. 
computing the at least one other score comprises applying a content based filtering method to the plurality of user attribute values and a plurality of item properties of the plurality of items (Mental process: computing scores by filtering out content is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.

Claim 14 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 14. 
computing the at least one score further comprises: identifying at least one highest score of the at least one score (Mental process: computing scores by identifying a high score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
outputting the at least one highest score (Insignificant extra-solution activities) 
Step 2a Prong 2: This judicial exception is not integrated into practical application 
outputting the at least one highest score (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
outputting the at least one highest score (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
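
For illustration only, identifying at least one highest score as recited in claim 14 amounts to a selection over the computed scores, as in the sketch below. The dictionary of item identifiers to scores and the function name are hypothetical.

```python
import heapq


def highest_scores(scores, k=1):
    """Return the k (item, score) pairs with the largest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
```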

Claim 15 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim 15. 
computing the at least one score further comprises: computing at least one filtered score by applying at least one test to the at least one score (Mental process: Computing a score by applying a test is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
outputting the at least one filtered score. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into practical application 
outputting the at least one filtered score. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
outputting the at least one filtered score. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 16 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 2 above. The same rationale applies to this dependent claim 16. 
computing the at least one score further comprises: computing at least one collaborative filtering score, each computed for one of the plurality of items according to another similarity between the plurality of user attribute values and the other plurality of user attribute values of the plurality of other user profiles, by applying at least one matrix factorization method to the plurality of item properties, the plurality of user attribute values and the other plurality of user attribute values (Mental process: computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
aggregating the at least one score with the at least one collaborative filtering score. (Mathematical Relationship: MPEP 2106.04(a)(2) provides that, “organizing information and manipulating information through mathematical correlation,” is an abstract idea)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.

Claim 17 is rejected under 35 USC § 101 
	Step 2a Prong 1: 
predicting at least one score for at least one item, comprising at least one hardware processor adapted to  (Mental process: Predicting a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
in at least one iteration of a plurality of iterations (Insignificant extra-solution activities)
receiving a user profile having a plurality of user attribute values (Insignificant extra-solution activities)
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing a “prediction model” in the independent claim, nothing in this claim element precludes the step from practically being performed in the mind.)
in each of a plurality of training iterations (Insignificant extra-solution activities)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Insignificant extra-solution activities)
computing by the prediction model a plurality of predicted scores, each for one of a plurality of training items, in response to the training user profile and the plurality of training items, where each of the plurality of training items has a plurality of training item properties (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than referencing a “prediction model” in the independent claim, nothing in this claim element precludes the step from practically being performed in the mind.)
computing for the training user profile a plurality of expected scores, each computed for one of the plurality of training items according to the plurality of training user attribute values and the plurality of training item properties of the training item (Mental process: Computing a score is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind.)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
outputting the at least one score. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into a practical application
in at least one iteration of a plurality of iterations (Insignificant extra-solution activities)
receiving a user profile having a plurality of user attribute values (Insignificant extra-solution activities)
in each of a plurality of training iterations (Insignificant extra-solution activities)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Insignificant extra-solution activities)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
outputting the at least one score. (Insignificant extra-solution activities)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
in at least one iteration of a plurality of iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity)
receiving a user profile having a plurality of user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
in each of a plurality of training iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity) 
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
outputting the at least one score. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 18 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 17 above. The same rationale applies to this dependent claim 18. 
the at least one hardware processor is adapted to outputting the at least one score via at least one digital communication network interface connected to the at least one hardware processor. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into a practical application
the at least one hardware processor is adapted to outputting the at least one score via at least one digital communication network interface connected to the at least one hardware processor. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
the at least one hardware processor is adapted to outputting the at least one score via at least one digital communication network interface connected to the at least one hardware processor. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 


Claim 19 is rejected under 35 USC § 101 
	Step 2a Prong 1: See the rejection of claim 17 above. The same rationale applies to this dependent claim 19. 
wherein the at least one hardware processor is adapted to receiving the user profile by at least one of: receiving the user profile via at least one digital communication network interface connected to the at least one hardware processor, and retrieving the user profile from at least one non-volatile digital storage connected to the at least one hardware processor. (Insignificant extra-solution activities)
Step 2a Prong 2: This judicial exception is not integrated into a practical application
wherein the at least one hardware processor is adapted to receiving the user profile by at least one of: receiving the user profile via at least one digital communication network interface connected to the at least one hardware processor, and retrieving the user profile from at least one non-volatile digital storage connected to the at least one hardware processor. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
wherein the at least one hardware processor is adapted to receiving the user profile by at least one of: receiving the user profile via at least one digital communication network interface connected to the at least one hardware processor, and retrieving the user profile from at least one non-volatile digital storage connected to the at least one hardware processor. (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity)

Claim 20 is rejected under 35 USC § 101 
	Step 2a Prong 1: 
comprising at least one hardware processor adapted to (mere instructions to apply the exception using generic computer component)
in each of a plurality of training iterations (Insignificant extra-solution activities)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Insignificant extra-solution activities)
this claim element precludes the step from practically being performed in the mind.)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
Step 2a Prong 2: This judicial exception is not integrated into a practical application
comprising at least one hardware processor adapted to (mere instructions to apply the exception using generic computer component)
in each of a plurality of training iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component)
Step 2b: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
comprising at least one hardware processor adapted to (mere instructions to apply the exception using generic computer component; using a generic “hardware processor” does not add significantly more to the judicial exception)
in each of a plurality of training iterations (Performing Repetitive Calculations: MPEP 2106.05(d)(II) provides that performing repetitive calculations is well-understood, routine, and conventional activity)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Mere Data Gathering: MPEP 2106.05(d)(II) provides that, “receiving or transmitting data over a network, e.g., using the Internet to gather data” is well-understood, routine, and conventional activity) 
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (mere instructions to apply the exception using generic computer component; making a modification of model values in a generic prediction model does not add significantly more to the judicial exception)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“Deep Reinforcement Learning for List-wise Recommendations”, 27 Jun. 2019, arXiv; hereinafter “Zhao”) in view of Wilson et al. (US 2015/0220835 A1; hereinafter “Wilson”).
Regarding Claim 1, Zhao teaches a method for: 
In at least one iteration of a plurality of iterations (Zhao, sec. 2.5: “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”)
in each of a plurality of training iterations (Zhao, sec. 2.5: “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Zhao, sec. 1.3: “which enables the framework to train the parameters offline based on the simulated reward. More specifically, we build the simulator by users’ historical records. The intuition is no matter what algorithms a recommender system adopt, given the same state (or a user’s historical records) and the same action (recommending the same items to the user), the user will make the same feedbacks to the items.” Examiner notes that the broadest reasonable interpretation of “training user attribute values” includes any attributes relating to a user including a user’s historical records.)
computing by the prediction model a plurality of predicted scores, each for one of a plurality of training items, in response to the training user profile and the plurality of training items, where each of the plurality of training items has a plurality of training item properties (Zhao, sec. 1.3: “This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.” Examiner notes that the broadest reasonable interpretation of “scores” means any accounting including quantifying and rating items)
computing for the training user profile a plurality of expected scores, each computed for one of the plurality of training items according to the plurality of training user attribute values and the plurality of training item properties of the training item (Zhao, sec. 1.3: “Thus we cannot get the feedbacks (rewards) of items that are not in users’ historical records. This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.” Examiner notes that the broadest reasonable interpretation of “scores” means any accounting including quantifying and rating items)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (Zhao, sec. 2.1: “Figure 2 illustrates the agent-user interactions in MDP. By interacting with the environment (users), recommender agent takes actions (recommends items) to users in such a way that maximizes the expected return, which includes the delayed rewards. We follow the standard assumption that delayed rewards are discounted by a factor of γ per time-step”. Examiner notes that Zhao provides a discount factor in its model which changes per time-step i.e. modified).
outputting the at least one score (Zhao, sec. 2.2: “In practice, the reward is usually a number, rather than a vector. Thus if the $p_t$ is mapped to $U_x$, we calculate the overall reward $r_t$ of the whole recommended list”. Examiner notes that the broadest reasonable interpretation of “outputting” means to produce, deliver, or supply and “score” to mean an accounting such that Zhao teaches calculating (i.e. producing) an overall reward value (i.e. score))
Zhao does not explicitly disclose:
Predicting at least one score for at least one item 
receiving a user profile having a plurality of user attribute values 
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by
However, Wilson teaches:
Predicting at least one score for at least one item (Wilson, pg. 3, para. 0063: “User attribute weights determined from user affinity data are then applied to the final filter set to determine an overall score for each item of interest”.)
receiving a user profile having a plurality of user attribute values (Wilson, pg. 8, para. 0106: “The system 100 may access a user profile to collect data from the user profile such as other venues liked, gender, profession, or age.”).
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by: (Wilson, pg. 6, para. 0087: “Nodes in the data network represent venues, venue properties, users, user properties, reviewers, reviewer properties, and the like. Links or links represent relations between those nodes. The number of links between two items might therefore grow as data on two items grows. The strength of each link denotes the affinity between the two connected items, such as similarity of star rating (in a review of a venue), number of attributes held in common. Links can be either positive or negative in sign.” Examiner notes that the broadest reasonable interpretation of “computing [a] score” means to calculate and keep a running account such as calculating the strength of a link between two users).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao. Zhao teaches a recommender system using deep reinforcement learning, integrating concepts of Q-learning; Wilson teaches a recommender system as a network of interrelationships between users and venues (items). One of ordinary skill would have been motivated to combine the teachings of Wilson into Zhao in order to execute more holistic searches and to generate more timely and accurate recommendations (Wilson, para. 008).
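For reference, the discounted return referred to in the Zhao, sec. 2.1 passage quoted above for claim 1 can be sketched in the standard MDP form below. Only the discount factor γ appears in the quoted text; the symbols $G_t$ (return from time-step t) and $r_{t+k}$ (reward received k steps later) are conventional reinforcement-learning notation assumed here for illustration and are not taken from Zhao.

```latex
% Standard discounted return (assumed notation, not quoted from Zhao)
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad 0 \le \gamma \le 1
```

In this form, for $0 < \gamma < 1$, the weight $\gamma^{k}$ applied to a reward differs at each time-step, so later rewards contribute less to the return than earlier ones.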

Regarding Claim 2, Zhao and Wilson teach the method of claim 1 (above). Wilson further teaches: 
The similarity between the user profile and the plurality of other user profiles is computed according to a similarity between the plurality of user attribute values and another plurality of user attribute values of the plurality of other user profiles (Wilson, pg. 2, para. 0057: “Each link may incrementally strengthen or weaken the overall interrelationship between two venues, a venue and a reviewer, or two reviewers.”)
the plurality of user attribute values comprises at least one of: a user demographic value, a user preference value, a user identifier value, and a historical user interaction value (Wilson, pg. 2, para. 0054: “The system, in one implementation, also gathers data concerning the attributes of user, such as gender, age, profession, marital status, and affinity (whether positive or negative) for certain venues.”)
the historical user interaction value is indicative of a user interaction selected from a group of user interactions consisting of: a numerical score assigned by a user, a like indication, a purchase, a bookmarked item, and a skipped item (Wilson, pg. 4, para. 0073: “A user's or reviewer's affinity (again, positive or negative) for a venue is derived from both evaluations and assessments of venues, such as reviews or ratings…Ratings may also be published by votes placed via “Like” or “Ding” buttons disposed on various websites…An individual's affinity for certain venues can also be discerned from their spending habits or purchase history…An individual's website navigation bookmarks and browsing history also reflect browsing behavior and may likewise be mined for source data”. Examiner notes that a browsing history may reveal items browsed but not purchased, i.e. skipped items)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.


Regarding Claim 3, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
the prediction model comprises at least one deep reinforcement learning (DRL) network (Zhao, sec. 1: “Thus, we leverage Deep Reinforcement Learning[10] with (adapted) artificial neural networks as the non-linear approximators to estimate the action-value function in RL”)

Regarding Claim 4, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
computing the plurality of expected scores comprises applying a content-based filtering method to the plurality of training user attribute values and a plurality of training item properties of the plurality of training items (Zhao, sec. 4: “Another common approach is content based filtering[15], which tries to recommend items with similar properties to those that a user ordered in the past.” Examiner notes that the broadest reasonable interpretation of a user attribute value includes any values of any attributes relating to the user such as items that a user has ordered in the past)

Regarding Claim 5, Zhao and Wilson teach the method of claim 4 (above). Zhao further teaches: 
applying the content based filtering method comprises providing the plurality of training user attribute values and the plurality of training item properties to at least one neural network. (Zhao, sec. 1 and sec. 4: “Thus, we leverage Deep Reinforcement Learning[10] with (adapted) artificial neural networks as the non-linear approximators to estimate the action-value function in RL” “Another common approach is content based filtering[15], which tries to recommend items with similar properties to those that a user ordered in the past”)

Regarding Claim 6, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
training the prediction model comprises using a Q-learning method having a state, a plurality of actions, a reward and an output (Zhao, sec. 1.1 and 1.2: “DQN[14] can calculate Q-values of all recalled items separately and recommend a list of items with highest Q-values. However, these approaches recommend items based on one same state, and ignore relationship among the recommended items. As a consequence, the recommended items are similar. In practice, a bundling with complementary items may receive higher rewards than recommending all similar items.” “Traditional deep Q-learning adopts the first architecture as shown in Fig.1(a), which inputs only the state space and outputs Q-values of all actions.” Examiner notes that the broadest reasonable interpretation of an “output” includes the result of any computed input such as a calculated Q-value)
the state is a vector of state values indicative of a plurality of training user attribute values of the training user profile (Zhao, sec. 2.2: “$N_x$ is the size of users’ historical browsing history group that $r = U_x$, $\bar{s}_x$ and $\bar{a}_x$ are the average state vector and average action vector for $r = U$”. Examiner notes that the broadest reasonable interpretation of “indicative” means any indication at all of training user attributes, including users’ historical browsing)
the plurality of actions is a plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Zhao, sec. 2.2: “$N_x$ is the size of users’ historical browsing history group that $r = U_x$, $\bar{s}_x$ and $\bar{a}_x$ are the average state vector and average action vector for $r = U$”. Examiner notes that the broadest reasonable interpretation of “indicative” means any indication at all of item values, including an action vector based on item values)
the reward is the plurality of expected scores (Zhao, sec. 2.2: “In practice, the reward is usually a number, rather than a vector. Thus if the $p_t$ is mapped to $U_x$ we calculate the overall reward $r_t$ of the whole recommended list.” Examiner notes that the broadest reasonable interpretation of “scores” means an accounting such as here where the reward is calculated using the plurality of accountings of each item on the list).
the output is the plurality of predicted scores (Zhao, sec. 2.2: “In practice, the reward is usually a number, rather than a vector. Thus if the $p_t$ is mapped to $U_x$ we calculate the overall reward $r_t$ of the whole recommended list…The intuition of Eq.(4) is that reward in the top of recommended list has a higher contribution to the overall rewards, which force RA arranging items that user may order in the top of the recommended list.” Examiner notes that the broadest reasonable interpretation of “scores” means an accounting and the broadest reasonable interpretation of “predicted” means expected such as here where the output is a reward which is calculated using the plurality of accountings of each item on the list).
Regarding Claim 7, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
training the prediction model comprises using a Q-learning method having another state, another plurality of actions, another reward and another output (Zhao, sec. 2.2: “then we can map the current state-action pair $p_t$ to a reward according the above probability…we assume that $r_i$ is a reward list containing user’s feedbacks of the recommended items”. Examiner notes that Zhao’s “current state-action pair” implies non-current state-action pairs, i.e., other states and actions, which are impliedly mapped to other rewards and probability outputs). 
the other state is a vector of state values indicative of another plurality of training user attribute values of the training user profile and another plurality of training item properties of the plurality of training items (Zhao, sec. 2.2: “$N_x$ is the size of users’ historical browsing history group that $r = U_x$ ∙ $\bar{s}_x$ and $\bar{a}_x$ are the average state vector and average action vector for $r = U$”. Examiner notes that the broadest reasonable interpretation of “indicative” means any indication at all, including a group of users’ historical browsing where the average state and action vectors give some indication of associated attributes and properties within the users’ historical browsing history group).
the plurality of actions is another plurality of vectors of item values, each vector of item values indicative of a respective plurality of training item properties of one of the plurality of training items (Zhao, sec. 2.2: “$N_x$ is the size of users’ historical browsing history group that $r = U_x$ ∙ $\bar{s}_x$ and $\bar{a}_x$ are the average state vector and average action vector for $r = U$”. Examiner notes that the broadest reasonable interpretation of “indicative” means any indication at all, including a group of users’ historical browsing where the average state and action vectors give some indication of associated attributes and properties within the users’ historical browsing history group).
the reward is one of the plurality of expected scores (Zhao, sec. 2.2: “Thus if the $p_t$ is mapped to $U_x$ we calculate the overall reward $r_t$ of the whole recommended list.” Examiner notes that the broadest reasonable interpretation of “scores” means an accounting such as here where the reward is calculated using the plurality of accountings of each item on the list).
the output is a predicted score computed for one of the plurality of training user profiles and one of the plurality of training items in at least one of the plurality of training iterations (Zhao, sec. 2.2 and 2.5: “Thus if the $p_t$ is mapped to $U_x$ we calculate the overall reward $r_t$ of the whole recommended list…The intuition of Eq.(4) is that reward in the top of recommended list has a higher contribution to the overall rewards, which force RA arranging items that user may order in the top of the recommended list.” “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”. Examiner notes that the broadest reasonable interpretation of “scores” means an accounting and the broadest reasonable interpretation of “predicted” means expected such as here where the output is a reward which is calculated using the plurality of accountings of each item on the list).
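For illustration only, the quoted intuition that a reward near the top of the recommended list contributes more to the overall reward can be sketched as a position-weighted sum. Eq. (4) of Zhao is not reproduced in this Office action, so the geometric position discount below, the function name `overall_reward`, and the parameter `discount` are assumptions made for the sketch and are not a statement of Zhao's actual formula.

```python
from typing import Sequence


def overall_reward(item_rewards: Sequence[float], discount: float = 0.9) -> float:
    """Collapse a list of per-item rewards into one number.

    ASSUMPTION: a geometric position discount is used so that items nearer
    the top of the list (lower index) contribute more. This is a
    hypothetical weighting chosen for illustration.
    """
    return sum((discount ** position) * reward
               for position, reward in enumerate(item_rewards))


# The same single positive feedback counts for more at the top of the list.
top_hit = overall_reward([1.0, 0.0, 0.0])
bottom_hit = overall_reward([0.0, 0.0, 1.0])
```

Under this assumed weighting, `top_hit` exceeds `bottom_hit`, which matches the quoted statement that the recommender is encouraged to place items the user may order at the top of the list.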

Regarding Claim 8, Zhao and Wilson teach the method of claim 1 (above). Wilson further teaches: 
collecting at least one feedback value from at least one training user associated with at least one of the plurality of training user profiles, where the at least one feedback value is indicative of a level of agreement of the at least one user with at least some of the plurality of predicted scores computed by the prediction model in response to the respective training user profile and the plurality of training items (Wilson, pg. 7, para. 0098: “The data network may be refined based on an active feedback loop from concerning the effectiveness of the recommendations provided by the system 100. Links can be refined (in either direction) based on feedback for how effective the recommendation was. One measure of the effectiveness of the recommendation is whether funds were spent by the user based on the recommendation, which in turn might be measured via data provided by partners such as financial transaction card issuers. Another measure may be feedback provided by the user in response to a query or survey concerning the recommendation or venue in question.”)
updating at least one training user attribute value of the respective at least one training user profile according to the at least one feedback value (Wilson, pg. 7, para. 0099: “It should be noted that not only first order connections are updated based on feedback. Rather, in various implementations second and higher order connections are optionally updated based on feedback. For instance, when a reviewer's ranking or grade is updated the second order connection between two restaurants which are both liked by the reviewer is updated or correspondingly modified as well.” Examiner notes that the broadest reasonable interpretation of user attribute value includes any value relating to user attributes such as the value of connections)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1. 

Regarding Claim 9, Zhao and Wilson teach the method of claim 1 (above). Wilson further teaches: 
at least one of the plurality of items is selected from a group of items consisting of: a restaurant identifier, a hospitality facility identifier, a movie identifier, a book identifier, a consumer appliance identifier, a retailer identifier, and a venue identifier (Wilson, pg. 39, para. 0334: “Turning next to matching objects to content pages, whenever the system is gathering data from target websites on an object of interest, the system should ensure that the data on the target site is actually referring to the object of interest. This is especially true when attempting to cross-reference objects across different sites. The system optionally utilizes a “likelihood of match” score to make this determination, taking into account multiple variables. For example, if the system is trying to match a venue on two different sites, the fact that they have the same phone number or address may tend to indicate that they are the same venue. Numeric identifiers on consistent scales are particularly valuable for this purpose, such as phone numbers, UPC symbols, and latitude/longitude.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1. 

Regarding Claim 10, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
The method of claim 1, wherein outputting the at least one score comprises outputting for each of the at least one score a respective item of the at least one item (Zhao, sec. 2.3: “Then after computing scores of all items, the RA selects an item with highest score as the sub-action $a_t^k$ of action $a_t$.”) 

Regarding Claim 11, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
inputting the user profile and the plurality of items into a prediction model comprises computing at least one set of state values indicative of the plurality of user attribute values and a plurality of item properties of the plurality of items (Zhao, sec. 2.1: “State space S: A state $s_t = \{s_t^1, \ldots, s_t^N\} \in S$ is defined as the browsing history of a user, i.e., previous $N$ items that a user browsed before time $t$. The items in $s_t$ are sorted in chronological order…we present a state-specific scoring function”). 

Regarding Claim 12, Zhao and Wilson teach the method of claim 1 (above). Wilson further teaches: 
computing the at least one score further comprises: computing at least one other score, each computed for one of the plurality of items according to the plurality of user attribute values and the respective plurality of item properties of the respective item (Wilson, pg. 6, para. 0087 and pg. 7, para. 0097: “Nodes in the data network represent venues, venue properties, users, user properties, reviewers, reviewer properties, and the like. Links or links represent relations between those nodes. The number of links between two items might therefore grow as data on two items grows. The strength of each link denotes the affinity between the two connected items, such as similarity of star rating (in a review of a venue), number of attributes held in common. Links can be either positive or negative in sign.” “It should be noted that not only first order connections are updated based on feedback. Rather, in various implementations second and higher order connections are optionally updated based on feedback. For instance, when a reviewer's ranking or grade is updated the second order connection between two restaurants which are both liked by the reviewer is updated or correspondingly modified as well.” Examiner notes that the broadest reasonable interpretation of “computing at least one other score” means to calculate and keep a running account of something other than the strength of a link between two connected items, such as calculating the strength of second order connections)
aggregating the at least one score with the at least one other score (Wilson, pg. 7, para. 0099: “It should be noted that not only first order connections are updated based on feedback. Rather, in various implementations second and higher order connections are optionally updated based on feedback. For instance, when a reviewer's ranking or grade is updated the second order connection between two restaurants which are both liked by the reviewer is updated or correspondingly modified as well.” Examiner notes that the broadest reasonable interpretation of aggregating means bringing together such as here in Wilson where first and second order connections may be brought together to update high order connections).

Regarding Claim 13, Zhao and Wilson teach the method of claim 12 (above). Zhao further teaches: 
computing the at least one other score comprises applying a content based filtering method to the plurality of user attribute values and a plurality of item properties of the plurality of items (Zhao, sec. 4: “Another common approach is content based filtering[15], which tries to recommend items with similar properties to those that a user ordered in the past.”)

Regarding Claim 14, Zhao and Wilson teach the method of claim 1 (above). Zhao further teaches: 
computing the at least one score further comprises: identifying at least one highest score of the at least one score (Zhao, sec. 2.3: “Then after computing scores of all items, the RA selects an item with highest score as the sub-action $a_t^k$ of action $a_t$.”)
outputting the at least one highest score (Zhao, sec. 2.3: “For each weight vector, the RA scores all items in the item space (line 3), selects the item with highest score (line 4), and then adds this item at the end of the recommendation list.” Examiner notes that the broadest reasonable interpretation of “outputting” means to produce, deliver, or supply such as here where the item with the highest score is produced for selection).
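For illustration only, the list-building step quoted from Zhao, sec. 2.3 (score all items for each weight vector, select the highest-scoring item, append it to the recommendation list) can be sketched as follows. The dot-product scoring function, the removal of an item once it has been selected, and the names `build_recommendation_list`, `weight_vectors`, and `item_embeddings` are assumptions made for this sketch; the quoted passage does not specify them.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, used here as a hypothetical scoring function."""
    return sum(a * b for a, b in zip(u, v))


def build_recommendation_list(
    weight_vectors: Sequence[Sequence[float]],
    item_embeddings: Sequence[Sequence[float]],
) -> List[int]:
    """Return item indices, one per weight vector, in list order.

    ASSUMPTIONS: items are scored by dot product with the weight vector,
    and a selected item is removed so it cannot be recommended twice.
    """
    remaining = dict(enumerate(item_embeddings))
    recommended: List[int] = []
    for weights in weight_vectors:
        if not remaining:
            break
        # Score every remaining item and keep the one with the highest score.
        best = max(remaining, key=lambda idx: dot(weights, remaining[idx]))
        recommended.append(best)  # add the item at the end of the list
        del remaining[best]
    return recommended


items = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
ranked = build_recommendation_list([[1.0, 0.0], [0.0, 1.0]], items)
```

Here each entry of `ranked` is the index of the item that had the highest score for the corresponding weight vector, so the highest score is identified and the associated item is produced in each pass.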

Regarding Claim 15, Zhao and Wilson teach the method of claim 1 (above). Wilson further teaches: 
computing the at least one score further comprises: computing at least one filtered score by applying at least one test to the at least one score (Wilson, pg. 20, para. 0204: “if there is a large number of recommendations available based on the filter set and the only issue is that none of the recommendations of the set exceed the recommendation threshold, a minimal normalization factor may be utilized to normalize the recommendation set such that a limited amount of recommendations exceed the recommendation threshold”. Examiner notes that a filter set comprises at least one filtered score and that the presence of a recommendation threshold implies that a score or other running accounting is computed to ensure such threshold is met)
outputting the at least one filtered score. (Wilson, pg. 20, para. 0204: “if there is a large number of recommendations available based on the filter set and the only issue is that none of the recommendations of the set exceed the recommendation threshold, a minimal normalization factor may be utilized to normalize the recommendation set such that a limited amount of recommendations exceed the recommendation threshold”. Examiner notes that the broadest reasonable interpretation of outputting means to produce, deliver, or supply such as supplying a recommendation score to the system to compare against a recommendation threshold). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1. 

Regarding Claim 16, Zhao and Wilson teach the method of claim 2 (above). Wilson further teaches: 
computing the at least one score further comprises: computing at least one collaborative filtering score, each computed for one of the plurality of items according to another similarity between the plurality of user attribute values and the other plurality of user attribute values of the plurality of other user profiles, by applying at least one matrix factorization method to the plurality of item properties, the plurality of user attribute values and the other plurality of user attribute values (Wilson, pg. 3, paras. 0060, 0065, 0066: “The system may continuously analyze the data to add positive or negative collaborative links, content links, or content-collaborative links” “The recommendation engine 112 accesses the matrices of interrelationships and generates the recommendations according to the techniques described herein.” “The matrix builder also incorporates venue, reviewer and user data 124 collected from users 108, venues 104 and other web pages”). 
aggregating the at least one score with the at least one collaborative filtering score. (Wilson, pg. 3, para. 0061: “The system may provide a plurality of recommendations based overall link strengths that factor in collaborative and content-based interrelationships” Examiner notes that the broadest reasonable interpretation of aggregating means bringing together such as here in Wilson where the overall link strengths factor in collaborative filtering). 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.
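
For illustration only, the general technique named in the claim 16 limitations (a collaborative filtering score obtained by matrix factorization, then aggregated with another score) can be sketched as follows. This is a generic textbook-style sketch and an assumption throughout: it factorizes a small user-item rating table by stochastic gradient descent, whereas the claim recites factorization over item properties and user attribute values, and Wilson describes matrices of interrelationships. The names `factorize`, `cf_score`, and `aggregate`, the sample ratings, and the weighted-average aggregation are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, steps=3000,
              lr=0.05, reg=0.01, seed=0):
    """Fit user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def cf_score(P, Q, user, item):
    """Collaborative filtering score: dot product of the latent factors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def aggregate(model_score, cf, weight=0.5):
    """Hypothetical aggregation: weighted average of the two scores."""
    return weight * model_score + (1 - weight) * cf


if __name__ == "__main__":
    # (user, item) -> observed rating; values are made up for the example.
    ratings = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.9,
               (1, 2): 0.1, (2, 1): 0.2, (2, 2): 1.0}
    P, Q = factorize(ratings, n_users=3, n_items=3)
    unseen = cf_score(P, Q, user=0, item=2)  # score for an unrated pair
    print(aggregate(model_score=0.3, cf=unseen))
```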

Regarding Claim 17, Zhao teaches a system for: 
in at least one iteration of a plurality of iterations (Zhao, sec. 2.5: “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”)
in each of a plurality of training iterations (Zhao, sec. 2.5: “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values; (Zhao, sec. 1.3: “which enables the framework to train the parameters offline based on the simulated reward. More specifically, we build the simulator by users’ historical records. The intuition is no matter what algorithms a recommender system adopt, given the same state (or a user’s historical records) and the same action (recommending the same items to the user), the user will make the same feedbacks to the items.”)
computing by the prediction model a plurality of predicted scores, each for one of a plurality of training items, in response to the training user profile and the plurality of training items, where each of the plurality of training items has a plurality of training item properties (Zhao, sec. 1.3: “Thus we cannot get the feedbacks (rewards) of items that are not in users’ historical records. This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.”)
computing for the training user profile a plurality of expected scores, each computed for one of the plurality of training items according to the plurality of training user attribute values and the plurality of training item properties of the training item (Zhao, sec. 1.3: “Thus we cannot get the feedbacks (rewards) of items that are not in users’ historical records. This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.”)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores (Zhao, sec. 2.1: “Figure 2 illustrates the agent-user interactions in MDP. By interacting with the environment (users), recommender agent takes actions (recommends items) to users in such a way that maximizes the expected return, which includes the delayed rewards. We follow the standard assumption that delayed rewards are discounted by a factor of γ per time-step”; Examiner notes that Zhao provides a discount factor in its model which is modified as given).
outputting the at least one score (Zhao, sec. 2.2: “In practice, the reward is usually a number, rather than a vector. Thus if the $p_t$ is mapped to $U_x$, we calculate the overall reward $r_t$ of the whole recommended list”. Examiner notes that the broadest reasonable interpretation of outputting means to produce, deliver, or supply such as producing the overall reward value)
Zhao does not explicitly disclose:
predicting at least one score for at least one item, comprising at least one hardware processor adapted to 
receiving a user profile having a plurality of user attribute values 
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by 
However, Wilson teaches:
predicting at least one score for at least one item, comprising at least one hardware processor adapted to (Wilson, pg. 42, para. 0366: “In some examples, the server and/or client device (e.g. desktop computer or smart phone) are implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The apparatus is optionally implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by a programmable processor”)
receiving a user profile having a plurality of user attribute values (Wilson, pg. 8, para. 0106: “The system 100 may access a user profile to collect data from the user profile such as other venues liked, gender, profession, or age”).
computing the at least one score according to a similarity between the user profile and a plurality of other user profiles by inputting the user profile and a plurality of items into a prediction model trained by  (Wilson, pg. 6, para. 0087: “Nodes in the data network represent venues, venue properties, users, user properties, reviewers, reviewer properties, and the like. Links or links represent relations between those nodes. The number of links between two items might therefore grow as data on two items grows. The strength of each link denotes the affinity between the two connected items, such as similarity of star rating (in a review of a venue), number of attributes held in common. Links can be either positive or negative in sign.” Examiner notes that the broadest reasonable interpretation of “computing [a] score” means to calculate and keep a running account such as calculating the strength of a link).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.
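
For illustration only, the discounting convention quoted from Zhao, sec. 2.1 ("delayed rewards are discounted by a factor of γ per time-step") corresponds to the standard discounted return G_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... The sketch below computes that quantity for a finite list of rewards. It is a generic reinforcement-learning formula rather than code from Zhao, and the function name `discounted_return` and the sample values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a finite list."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    # Three time-steps, each with reward 1, discounted by gamma = 0.5.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```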

Regarding Claim 18, Zhao and Wilson teach the system of claim 17 (above). Wilson further teaches: 
the at least one hardware processor is adapted to outputting the at least one score via at least one digital communication network interface connected to the at least one hardware processor (Wilson, pgs. 42-43, paras. 0366, 0370: “The apparatus is optionally implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by a programmable processor; and method steps are performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output” “The server functionality described above is optionally implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system are connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, and the computers and networks forming the Internet”. Examiner notes that the broadest reasonable interpretation of “adapted to” means to make suitable for use, irrespective of whether actually used)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.

Regarding Claim 19, Zhao and Wilson teach the system of claim 17 (above). Wilson further teaches: 
wherein the at least one hardware processor is adapted to receiving the user profile by at least one of: receiving the user profile via at least one digital communication network interface connected to the at least one hardware processor, and retrieving the user profile from at least one non-volatile digital storage connected to the at least one hardware processor (Wilson, pgs. 42-43, paras. 0366, 0370: “The apparatus is optionally implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by a programmable processor; and method steps are performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output” “The server functionality described above is optionally implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system are connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, and the computers and networks forming the Internet”. Examiner notes that the broadest reasonable interpretation of “adapted to” means to make suitable for use, irrespective of whether actually used)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.

Regarding Claim 20, Zhao teaches a system:  
in each of a plurality of training iterations (Zhao, sec. 2.5: “The training algorithm for the proposed framework DEV is presented in Algorithm 3. In each iteration, there are two stages, i.e., 1) transition generating stage (lines 8-20), and 2) parameter updating stage (lines 21-28)”)
computing by the prediction model a plurality of predicted scores, each for one of a plurality of training items, in response to the training user profile and the plurality of training items, where each of the plurality of training items has a plurality of training item properties (Zhao, sec. 1.3: “Thus we cannot get the feedbacks (rewards) of items that are not in users’ historical records. This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.”)
computing for the training user profile a plurality of expected scores, each computed for one of the plurality of training items according to the plurality of training user attribute values and the plurality of training item properties of the training item  (Zhao, sec. 1.3: “Thus we cannot get the feedbacks (rewards) of items that are not in users’ historical records. This may result in inconsistent results between offline and online measurements. Our proposed online environment simulator can also mitigate this challenge by producing simulated online rewards given any state-action pair, so that the recommender system can rate items from the whole item space.”)
modifying at least one model value of a plurality of model values of the prediction model to maximize a reward score computed using the plurality of expected scores and the plurality of predicted scores  (Zhao, sec. 2.1: “Figure 2 illustrates the agent-user interactions in MDP. By interacting with the environment (users), recommender agent takes actions (recommends items) to users in such a way that maximizes the expected return, which includes the delayed rewards. We follow the standard assumption that delayed rewards are discounted by a factor of γ per time-step”; Examiner notes that Zhao provides a discount factor in its model which is modified as given).
Zhao does not explicitly disclose:
comprising at least one hardware processor adapted to 
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values
However, Wilson teaches: 
comprising at least one hardware processor adapted to (Wilson, pg. 42, para. 0366: “The apparatus is optionally implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by a programmable processor; and method steps are performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output”. Examiner notes that the broadest reasonable interpretation of “adapted to” means to make suitable for use, irrespective of whether actually used)
receiving a training user profile of a plurality of training user profiles, the training user profile having a plurality of training user attribute values (Wilson, pg. 8, para. 0106: “The system 100 may access a user profile to collect data from the user profile such as other venues liked, gender, profession, or age”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wilson into Zhao as set forth above with respect to claim 1.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Liu et al. (“Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling”, 30 Oct. 2018, ArXiv) teaches a deep reinforcement learning recommender that considers dynamic adaptation and long-term rewards.
Mustafi et al. (US 20200250715 A1) teaches a hybrid learning machine for recommending items based on a consumer’s previous behavior, other consumers’ previous behaviors, and the consumer’s profile.
Schiff et al. (US 20120095862 A1) teaches a system and method for generating personalized recommendations based on stored data about the user.
Taghipour et al. (“A Hybrid Web Recommender System Based on Q-Learning”, 20 Mar. 2008, SAC’08) teaches a hybrid web recommendation method making use of the conceptual relationships among web resources to derive a novel model of the problem, enriched with semantic knowledge about the usage behavior.
Wei et al. (“Collaborative filtering and deep learning based recommendation system for cold start items”, 14 Oct. 2016, Expert Systems with Applications) teaches a flexible scheme of model retraining and switching to deal with the transition of items from cold start to non-cold start status.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Nguyen whose telephone number is (571) 272-3406. The examiner can normally be reached Monday - Thursday, 9:00am - 5:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/STN/Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123