DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 7, 9, 15, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 6, 8, 10-12, 14, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Murdock et al. (US 2019/0235936), hereafter “Murdock,” in view of Ghavamzadeh et al. (US 2016/0283970), hereafter “Ghavamzadeh,” and further in view of Chen et al. (WO 2020/207249), hereafter “Chen.”
Regarding claim 1, Murdock teaches a method comprising:
	identifying features of a state used for reinforcement learning in an online service, the state comprising an activity level of a user and a number of notifications sent to the user (Murdock: par 0054 […number and/or frequency of notifications… frequency or level of previous contact with the user…]), the state being associated with an action that is a decision made to decide if a notification to the user is sent and a reward for sending the notification to the user (Murdock: 420-470 of FIG. 4; par 0087, 0093);
	capturing user responses to notifications sent to users to obtain training data (Murdock: par 0091, 0092);
	training a machine-learning (ML) algorithm with reinforcement learning based on the features and the training data to obtain an ML model (Murdock: 500 of FIG. 5; par 0089, 0104);
	receiving a request to send a notification to the user (Murdock: 410 of FIG. 4; par 0086);
	deciding, by the ML model, whether to send the notification based on a current state (Murdock: 420, 430, 470 of FIG. 4; 500 of FIG. 5; par 0097, 0104); and
	sending the notification to the user based on the decision (Murdock: 470 of FIG. 4; par 0099).
Murdock does not explicitly teach:
	the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user.
Ghavamzadeh teaches:
	a reward for each state based on a user visiting an online service after sending a notification to the user (Ghavamzadeh: par 0176 [A stage-wise reward of 1 indicates that the user clicks the advertising content and zero otherwise.]).
It would have been obvious to one of ordinary skill in the art to incorporate the click-based advertising rewards of Ghavamzadeh within the reinforcement learning model of Murdock with predictable results. One would be motivated to make the combination to provide the predictable benefit of evaluating the effectiveness of advertising within the system. One would further be motivated to make the combination as both systems display content to users based on utilizing a reinforcement learning model. Further, in view of this substantial similarity of the references it would have been readily apparent to one of ordinary skill that various beneficial features of Ghavamzadeh could have been implemented within the Murdock system with predictable results and a beneficial effect.
Murdock-Ghavamzadeh does not explicitly teach:
	within a predetermined period after sending the notification to the user.
Chen teaches:
	within a predetermined period after sending a notification to a user (Chen: p. 9 […if the user views the notification message in the time interval corresponding to the action data with the largest Q value, the reward value of the notification message is recorded as f1; if the time interval has elapsed as time goes by, and If the user does not view the notification message, the reward value of the notification message is recorded as f2…]).

It would have been obvious to utilize the time period of Chen when evaluating the reward in the Murdock-Ghavamzadeh system with predictable results. One would be motivated to make the combination because if a user does not interact with a notification after a threshold period of time it is apparent that the notification did not sufficiently interest the user and therefore the reward should be reduced. A high likelihood of success is anticipated given that Chen, like Murdock-Ghavamzadeh, discloses a system for evaluating the effectiveness of notifications using a reinforcement learning model. Further, in view of the substantial similarity of the references it would have been readily apparent to one of ordinary skill that various beneficial features of Chen could have been implemented within the Murdock-Ghavamzadeh system with predictable results and a beneficial effect.
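
For illustration only, the combined reward discussed above (a binary reward as in Ghavamzadeh, limited to a time window as in Chen) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function name visit_reward, the constant PREDETERMINED_PERIOD, and the 24-hour value are assumptions chosen solely for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical value: neither the claims nor the cited references
# are relied upon here for any particular period length.
PREDETERMINED_PERIOD = timedelta(hours=24)


def visit_reward(sent_at: datetime,
                 visited_at: Optional[datetime],
                 period: timedelta = PREDETERMINED_PERIOD) -> int:
    """Return 1 if the user visited within `period` after the
    notification was sent, and 0 otherwise (including no visit)."""
    if visited_at is None:
        return 0
    elapsed = visited_at - sent_at
    # A visit before the send time, or after the window closes, earns nothing.
    return 1 if timedelta(0) <= elapsed <= period else 0
```

Under these assumptions, a visit two hours after the notification yields a reward of 1, while a visit thirty hours later, or no visit at all, yields 0.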

Regarding claim 2, the method as recited in claim 1, wherein deciding whether to send further comprises: 	maximizing a total number of rewards of a plurality of states, the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user (Murdock: par 0040, 0071; Chen: p. 9). 

Regarding claim 5, the method as recited in claim 1, wherein the state changes every four hours (Murdock: par 0073; Chen: p. 9 – It would have been immediately apparent to one of ordinary skill that the intervals of Chen could be set to any arbitrary period, including four hours, and doing so would have been obvious to one of ordinary skill). 
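
For illustration only, a state that changes at a fixed interval can be sketched as mapping each timestamp to an interval index, with the interval length as a freely chosen parameter. This is a minimal sketch under stated assumptions: the names STATE_INTERVAL_SECONDS and state_interval_index are hypothetical and do not appear in Murdock or Chen.

```python
from datetime import datetime, timezone

# Four hours, matching the claim language; any other length could be
# substituted without changing the rest of the sketch.
STATE_INTERVAL_SECONDS = 4 * 60 * 60


def state_interval_index(ts: datetime,
                         interval_seconds: int = STATE_INTERVAL_SECONDS) -> int:
    """Return the index of the fixed-length interval containing `ts`.
    Timestamps in the same interval share one state; the index
    advances by one each time the interval elapses."""
    return int(ts.timestamp()) // interval_seconds
```

Two timestamps inside the same four-hour window map to the same index, and the index increases by one at each four-hour boundary.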

Regarding claim 6, the method as recited in claim 1, wherein the ML algorithm is further based on user features having user profile information (Murdock: par 0036). 

Regarding claim 8, the method as recited in claim 1, wherein notifications include news articles, user feed updates, messages to the user from other users, and promotions (Murdock: par 0021). 

Regarding claim 10, the method as recited in claim 1, further comprising: 	re-training the ML algorithm with additional data corresponding to notifications sent to users (Murdock: par 0040, 0052). 

Regarding claim 11, a system comprising:
	a memory comprising instructions (Murdock: par 0115); and
	one or more computer processors, wherein the instructions, when executed by the one or more computer processors, cause the system to perform operations comprising (Murdock: par 0115):
		identifying features of a state used for reinforcement learning in an online service (Murdock: 420-470 of FIG. 4; par 0087, 0093), the state comprising an activity level of a user and a number of notifications sent to the user (Murdock: par 0054 […number and/or frequency of notifications… frequency or level of previous contact with the user…]), the state being associated with an action that is a decision made to decide if a notification to the user is sent and a reward for sending the notification to the user (Murdock: 420-470 of FIG. 4; par 0087, 0093), the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user (Ghavamzadeh: par 0176 [A stage-wise reward of 1 indicates that the user clicks the advertising content and zero otherwise.]; Chen: p. 9 […if the user views the notification message in the time interval corresponding to the action data with the largest Q value, the reward value of the notification message is recorded as f1; if the time interval has elapsed as time goes by, and If the user does not view the notification message, the reward value of the notification message is recorded as f2…]);
		capturing user responses to notifications sent to users to obtain training data (Murdock: par 0091, 0092);
		training a machine-learning (ML) algorithm with reinforcement learning based on the features and the training data to obtain an ML model (Murdock: 500 of FIG. 5; par 0089, 0104);
		receiving a request to send a notification to the user (Murdock: 410 of FIG. 4; par 0086);
		deciding, by the ML model, whether to send the notification based on a current state (Murdock: 420, 430, 470 of FIG. 4; 500 of FIG. 5; par 0097, 0104); and
		sending the notification to the user based on the decision (Murdock: 470 of FIG. 4; par 0099).

Regarding claim 12, the system as recited in claim 11, wherein deciding whether to send further comprises: 	maximizing a total number of rewards for a plurality of states, the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user (Murdock: par 0040, 0071; Chen: p. 9).

Regarding claim 14, the system as recited in claim 11, wherein the state changes every four hours (Murdock: par 0073; Chen: p. 9 – It would have been immediately apparent to one of ordinary skill that the intervals of Chen could be set to any arbitrary period, including four hours, and doing so would have been obvious to one of ordinary skill). 

Regarding claim 16, a non-transitory machine-readable storage medium including instructions that, when executed by a machine, cause the machine to perform operations comprising (Murdock: par 0115):
	identifying features of a state used for reinforcement learning in an online service (Murdock: 420-470 of FIG. 4; par 0087, 0093), the state comprising an activity level of a user and a number of notifications sent to the user (Murdock: par 0054 […number and/or frequency of notifications… frequency or level of previous contact with the user…]), the state being associated with an action that is a decision made to decide if a notification to the user is sent and a reward for sending the notification to the user (Murdock: 420-470 of FIG. 4; par 0087, 0093), the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user (Ghavamzadeh: par 0176 [A stage-wise reward of 1 indicates that the user clicks the advertising content and zero otherwise.]; Chen: p. 9 […if the user views the notification message in the time interval corresponding to the action data with the largest Q value, the reward value of the notification message is recorded as f1; if the time interval has elapsed as time goes by, and If the user does not view the notification message, the reward value of the notification message is recorded as f2…]);
	capturing user responses to notifications sent to users to obtain training data (Murdock: par 0091, 0092);
	training a machine-learning (ML) algorithm with reinforcement learning based on the features and the training data to obtain an ML model (Murdock: 500 of FIG. 5; par 0089, 0104);
	receiving a request to send a notification to the user (Murdock: 410 of FIG. 4; par 0086);
	deciding, by the ML model, whether to send the notification based on a current state (Murdock: 420, 430, 470 of FIG. 4; 500 of FIG. 5; par 0097, 0104); and
	sending the notification to the user based on the decision (Murdock: 470 of FIG. 4; par 0099).

Regarding claim 17, the non-transitory machine-readable storage medium as recited in claim 16, wherein deciding whether to send further comprises: 	maximizing a total number of rewards for a plurality of states, the reward for each state based on the user visiting the online service within a predetermined period after sending the notification to the user (Murdock: par 0040, 0071; Chen: p. 9).

Regarding claim 19, the non-transitory machine-readable storage medium as recited in claim 16, wherein the state changes every four hours (Murdock: par 0073; Chen: p. 9 – It would have been immediately apparent to one of ordinary skill that the intervals of Chen could be set to any arbitrary period, including four hours, and doing so would have been obvious to one of ordinary skill). 

Claims 4, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Murdock et al. (US 2019/0235936), in view of Ghavamzadeh et al. (US 2016/0283970), further in view of Chen et al. (WO 2020/207249), and further in view of Horvitz et al. (US 2004/0143636), hereafter “Horvitz.”
Regarding claim 4, Murdock-Ghavamzadeh-Chen teaches the method as recited in claim 1, wherein the features of the state further comprise a number of queued notifications for the user (Murdock: par 0054), and a maximum ranking score for the queued notifications (Murdock: par 0109-0110).
Murdock-Ghavamzadeh-Chen does not explicitly teach:
	wherein the features of the state further comprise a type of client machine used by the user in a last session.
Horvitz teaches:
	wherein features of a state further comprise a type of client machine used by the user in a last session (Horvitz: par 0202).
It would have been obvious to one of ordinary skill in the art to implement the device type considerations of Horvitz within the Murdock-Ghavamzadeh-Chen system with predictable results. One would be motivated to make the combination because it would have been apparent that the type of device a user is interacting with is a relevant consideration when deciding whether to present certain notifications, i.e., some notifications may be better suited for certain device types than others. A high likelihood of success is anticipated given that Horvitz, like Murdock-Ghavamzadeh-Chen, analyzes the context of a user and their device(s) when determining an appropriate time to display notifications. Further, in view of this substantial similarity it would have been readily apparent to one of ordinary skill that various beneficial features of Horvitz could have been implemented within the Murdock-Ghavamzadeh-Chen system with predictable results and a beneficial effect.

Regarding claim 13, the system as recited in claim 11, wherein the features of the state comprise a type of client machine used by the user in a last session (Horvitz: par 0202), a number of queued notifications for the user (Murdock: par 0054), and a maximum ranking score for the queued notifications (Murdock: par 0109-0110).

Regarding claim 18, the non-transitory machine-readable storage medium as recited in claim 16, wherein the features of the state comprise a type of client machine used by the user in a last session (Horvitz: par 0202), a number of queued notifications for the user (Murdock: par 0054), and a maximum ranking score for the queued notifications (Murdock: par 0109-0110).

Response to Arguments
Applicant’s arguments, filed August 8, 2022, have been fully considered and are discussed in detail below. 

With respect to the rejection of claim 1 under 35 USC § 103, Applicant argues that the user’s context of Murdock is not a “state used for reinforcement learning” as claimed. Examiner respectfully disagrees; Murdock explicitly discloses in paragraph 0067 that the user’s context is used in reinforcement learning (emphasis Examiner’s): “Use of different machine learning-processes are possible in different aspects, including supervised learning processes (e.g., decision tree, random forest, logistic regression), unsupervised learning (e.g., apriori algorithm, K-means), or reinforcement learning. In each case, the contextual data and notification description is an input used to determine the user-specific interaction probability. Essentially, the user-specific interaction probability is a measure of similarity between the current notification and context and the previous notification events where the user interacted with the notification.” Accordingly, the context of Murdock may fairly be described as a “state used for reinforcement learning,” as claimed. 

The remaining arguments depend on or relate to the arguments addressed above, are moot because allowability has been indicated, or are moot in view of the new grounds of rejection presented herein, the new grounds of rejection having been made necessary by the claim amendments. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES E SPRINGER whose telephone number is (571)270-5640. The examiner can normally be reached 9am - 5:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, GLENTON BURGESS, can be reached at 571-272-3949. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES E. SPRINGER
Primary Examiner
Art Unit 2454



/JAMES E SPRINGER/ Primary Examiner, Art Unit 2454