Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Claims 1-6, 9-13, and 16-18 are amended. Claims 1-20 are pending and have been considered.

Claim Objections
Claims 1, 9-10, and 16 are objected to because of the following informalities: 
In Claim 1, line 7, and in claim 16, line 2 on p. 7, “previous” should recite “previously.” 
In Claim 1, line 13, and in Claim 9, line 6, the “and” should be removed. 
In Claim 10, line 2 recites “of the” twice in a row. 
Claim 16 recites both “transitional probabilities” and “transition probabilities”. These terms should be consistent throughout Claim 16 and its dependents.  Appropriate correction is required.

Examiner’s Note
Claims 12 and 13 each recite “the updated transition probabilities”. It appears these features were supposed to recite “the updated parameter value”. Examiner did not reject or object to claims 12 or 13 on these grounds in the preinterview first office action filed 05/24/2022. However, for purposes of examination, examiner interprets claims 12 and 13 as if they had recited “the updated parameter value” instead of “the updated transition probabilities.” Examiner requests the Applicant to apply these changes to claims 12 and 13 in order to avoid future claim rejections under 35 U.S.C. 112.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	In claim 1, lines 4-5, it is unclear if the phrase “from a reinforcement learning agent” modifies the feature “obtaining passive data” or “recommended actions”. Examiner interprets the claim to mean the phrase “from a reinforcement learning agent” modifies “recommended actions”. 
In claim 1, lines 12-14, it is unclear how the system updates transitional probabilities without first initializing them. It is unclear if the limitation “by applying the updated parameter value to the passive data” is supposed to modify “updating a transition model of the reinforcement learning agent” or “generating updated transition probabilities”. For purposes of examination, Examiner interprets claim 1, lines 12-14 to mean initializing transition probabilities and then updating a transition model of the reinforcement learning agent by generating updated transition probabilities, wherein the generating comprises applying the updated parameter value to the passive data. Claims 2-8 are rejected for failing to cure the deficiencies of claim 1 upon which they depend.
Regarding claim 3, it is unclear if any of the states and recommended actions are the same as the current state and a recommended action from claim 1. Examiner interprets claim 3 to mean the states and recommended actions from claim 3 include a current state and a recommended action from claim 1.
Claim 7 recites “a plurality of states” and Claim 1, line 15 recites “a current state.” It is unclear if a plurality of states includes a current state. Claim 7 recites “a preliminary parameter value” and “a shared parameter value” while Claims 1 and 6 recite an updated parameter value. It is unclear if either parameter value from claim 7 is the same as the updated parameter value from claim 1. Examiner interprets claim 7 to mean a plurality of states includes a current state, and the shared parameter value from claim 7 corresponds to the updated parameter value from claim 1. Claim 8 is similarly indefinite because it is unclear if “the shared parameter values” is the same as the updated parameter value from claim 1. Examiner interprets claim 8 to mean the shared parameter values are the same as the updated parameter value.
Claim 9, line 11 recites “and associated state information”. It is unclear to the Examiner which verb modifies associated state information. Examiner interprets the claim to mean the active data includes associated state information. Claims 10-15 are rejected for failing to cure the deficiencies of claim 9 upon which they depend.
Claim 14 recites “a plurality of states” and Claim 9, line 10 on p. 5 recites “a current state.” It is unclear if a plurality of states includes a current state. Claim 14 recites “a preliminary parameter value” and “a shared parameter value” while Claim 9 recites an updated parameter value. It is unclear if either parameter value from claim 14 is the same as the updated parameter value from claim 9. Examiner interprets claim 14 to mean a plurality of states includes a current state, and the shared parameter value from claim 14 corresponds to the updated parameter value from claim 9. Claim 15 is similarly indefinite because it is unclear if “the shared parameter values” is the same as the updated parameter value from claim 9. Examiner interprets claim 15 to mean the shared parameter values are the same as the updated parameter value.
In claim 16, lines 4-5 on p. 7, it is unclear how the system updates a parameter value without first initializing the parameter value; it is unclear whether the parameter value in the limitation “an updated parameter value derived using the passive data” is different from the parameter value in the limitation “an updated parameter value derived using the active data”; and it is unclear if the passive/active data was used to initialize the parameter value, update the value, or both. For purposes of examination, the Examiner interprets the claim to mean the system first initializes a parameter value and then updates the parameter value using passive and active data. Claims 17-20 are rejected for failing to cure the deficiencies of claim 16 upon which they depend.
Regarding claim 17, it is unclear if any of the states and recommended actions are the same as the current state and a recommended action from claim 16. Examiner interprets claim 17 to mean the states and recommended actions from claim 17 include a current state and a recommended action from claim 16.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

CLAIM 1
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
updating a parameter value used by the reinforcement learning agent based on the active data; (Judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The limitation is also a mathematical concept.)
updating a transition model… by generating updated transition probabilities by applying the updated parameter value to the passive data; and (Applying the updated parameter value to passive data, updating transition probabilities, and updating a transition model are all judgement mental processes which can reasonably be performed in one’s mind with the aid of pencil and paper. The limitations are also mathematical concepts.)
generating… a recommended action based on a current state and the updated transition probabilities; and (Judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper.)
The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations (Generic computer components performing generic computer functions as discussed in MPEP 2106.05(f))
obtaining passive data comprising sequences of user actions without recommended actions from a reinforcement learning agent; (Obtaining passive data is mere data-gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g). A reinforcement learning agent is generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h).)
obtaining active data including previous recommended actions provided by the reinforcement learning agent; (Obtaining active data is mere data-gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g).)
providing the recommended action for presentation. (Presenting or displaying information is an insignificant extra-solution activity as discussed in MPEP 2106.05(g).)
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations amount to generic computer components performing generic computer functions as discussed in MPEP 2106.05(f). Obtaining passive data and active data is mere data-gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g). A reinforcement learning agent is generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h). Presenting or displaying information is an insignificant extra-solution activity as discussed in MPEP 2106.05(g).
	Obtaining passive data and active data is well-understood, routine, conventional activity of receiving data over a network as discussed in MPEP 2106.05(d)(II), example (i). Presenting or displaying information is well-understood, routine, conventional activity as discussed in MPEP 2106.05(d)(I)(2). Svenson et al. (US 10943311 B1) at C. 11, L. 43-46 provides Berkheimer evidence for presenting or displaying information: “the display 118 of the user device 104 may include any type of display 118 known in the art that is configured to present (e.g., display) information to the users 102.” The claim is not patent eligible.

CLAIM 2 incorporates the rejection of claim 1. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements: the reinforcement learning agent is part of a Markov decision process-based recommendation system. This additional element is generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. A reinforcement learning agent and a Markov decision process-based recommendation system are generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 3 incorporates the rejection of claim 1. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites: the updated transition probabilities comprise probabilities between each pair of a plurality of states for each of a plurality of recommended actions. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 4 incorporates the rejection of claim 1. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites: the updated transition probabilities are updated using maximum likelihood principle. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
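
For illustration only, the kind of computation that is conventionally meant by estimating transition probabilities under a maximum likelihood principle can be sketched as a relative-frequency count. The sketch below is an assumption made for explanatory purposes and is not a characterization of Applicant's specification or of the claimed implementation; the function name, the state and action labels, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def mle_transition_probabilities(transitions):
    """Estimate P(next_state | state, action) by relative frequency.

    `transitions` is an iterable of (state, action, next_state) triples.
    For a categorical distribution, the relative frequency of each outcome
    is its maximum likelihood estimate.
    """
    pair_counts = Counter()    # times (state, action) was observed
    triple_counts = Counter()  # times (state, action) led to next_state
    for state, action, next_state in transitions:
        pair_counts[(state, action)] += 1
        triple_counts[(state, action, next_state)] += 1

    probs = defaultdict(dict)
    for (state, action, next_state), n in triple_counts.items():
        probs[(state, action)][next_state] = n / pair_counts[(state, action)]
    return dict(probs)


# Hypothetical observations: (state, recommended action, next state).
observed = [
    ("s0", "recommend_a", "s1"),
    ("s0", "recommend_a", "s1"),
    ("s0", "recommend_a", "s2"),
    ("s1", "recommend_b", "s2"),
]
probs = mle_transition_probabilities(observed)
```

In this sketch each estimate is a ratio of two counts, which is consistent with the characterization of the limitation above as a mathematical concept.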

CLAIM 5 incorporates the rejection of claim 1. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites: the updated parameter value is derived from the active data using an n-gram model. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 6 incorporates the rejection of claim 1. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites: the updated parameter value is derived using a clustering algorithm. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 7 incorporates the rejection of claim 6. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 6 are incorporated. The claim recites:
the clustering algorithm comprises: determining a preliminary parameter value for each of a plurality of states based on the active data;  (Determining is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
grouping states into one or more clusters based on the preliminary parameter value for each state; (Grouping is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
deriving a shared parameter value for each cluster based on the preliminary parameter value for each state in the cluster; (Deriving a shared parameter value is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper, and it is a mathematical concept.)
assigning the shared parameter value for each cluster to each state grouped in the cluster. (Assigning is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
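
For illustration only, the four recited clustering steps (determining, grouping, deriving, assigning) can be sketched as follows. Every specific choice here is an assumption made for explanatory purposes and is not drawn from the claims or the specification: the preliminary parameter value is assumed to be a per-state acceptance rate, the grouping is assumed to be fixed-width binning, the shared value is assumed to be the cluster mean, and all names and sample data are hypothetical.

```python
from collections import defaultdict


def cluster_parameter_values(active_data, bin_width=0.25):
    """Return (preliminary, assigned) parameter values keyed by state.

    `active_data` is an iterable of (state, was_accepted) pairs, one per
    recommendation shown in that state (a hypothetical data layout).
    """
    # Step 1: determine a preliminary parameter value for each state
    # (assumed here: fraction of recommendations accepted in that state).
    shown = defaultdict(int)
    accepted = defaultdict(int)
    for state, was_accepted in active_data:
        shown[state] += 1
        accepted[state] += int(was_accepted)
    preliminary = {s: accepted[s] / shown[s] for s in shown}

    # Step 2: group states into clusters based on the preliminary value
    # (assumed here: states whose values fall in the same bin share a cluster).
    clusters = defaultdict(list)
    for state, value in preliminary.items():
        clusters[int(value // bin_width)].append(state)

    # Step 3: derive a shared value per cluster (assumed: the mean), and
    # Step 4: assign that shared value to every state in the cluster.
    assigned = {}
    for members in clusters.values():
        shared = sum(preliminary[s] for s in members) / len(members)
        for s in members:
            assigned[s] = shared
    return preliminary, assigned


# Hypothetical active data: (state, whether the recommendation was accepted).
active_data = [
    ("s0", True), ("s0", False),
    ("s1", True), ("s1", True), ("s1", False),
    ("s2", False), ("s2", False),
]
preliminary, assigned = cluster_parameter_values(active_data)
```

With the sample data, states s0 and s1 fall in the same bin and receive one shared value, while s2 forms its own cluster. Each step reduces to counting, comparing, and averaging.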

CLAIM 8 incorporates the rejection of claim 7. 
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 7 are incorporated. The claim recites: the states are grouped into the one or more clusters based on confidence values for the shared parameter values for the clusters. Grouping based on confidence values is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding CLAIM 9, the claim is directed to a method that implements the same features as the product of claim 1 and is therefore rejected for the reasons therein. Claim 9 recites additional limitations. In Step 2A Prong 1, the limitation of “using the passive data and the active data to provide updated transition probabilities between each pair of a plurality of states for each of a plurality of recommended actions” is a mathematical concept. Accordingly, the claim recites an abstract idea. In Step 2A Prong 2, the limitation “obtaining active data including associated state information state information associated with recommended actions” is mere data-gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g). The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. In Step 2B, the obtaining limitation is well-understood, routine, conventional activity as discussed in MPEP 2106.05(d)(II), example (i). The claim is not patent eligible.

	Regarding CLAIMS 10-15, the claims are directed to methods that implement the same features as the products of claims 2 and 4-8, respectively, and are therefore rejected for the reasons therein. 

CLAIM 16
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
updating a transition model… using passive data and active data to provide updated transitional probabilities, the passive data comprising sequences of user actions without recommended actions… , the active data comprising previous recommended actions provided by… , (Updating a transition model is a judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The limitation is also a mathematical concept.)
the updated transition probabilities generated by applying an updated parameter value derived using the passive data and an updated parameter value derived using the active data; (Applying updated parameter values is a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. The limitation is also a mathematical concept.)
generating… a recommended action based on a current state and the updated transition probabilities; and (Judgement mental process which can reasonably be performed in one’s mind with the aid of pencil and paper.)
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
one or more processors; and one or more computer-storage media storing instructions, that when used by the one or more processors, cause the one or more processors to perform operations (Generic computer components performing generic computer functions as discussed in MPEP 2106.05(f))
a reinforcement learning agent (Generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h).)
providing the recommended action for presentation. (Presenting or displaying information is an insignificant extra-solution activity as discussed in MPEP 2106.05(g).)
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Processors and computer storage media storing instructions amount to generic computer components performing generic computer functions as discussed in MPEP 2106.05(f). A reinforcement learning agent is generally linking the abstract ideas to the technological environment of machine learning as discussed in MPEP 2106.05(h). Presenting or displaying information is an insignificant extra-solution activity as discussed in MPEP 2106.05(g).
Presenting or displaying information is well-understood, routine, conventional activity as discussed in MPEP 2106.05(d)(I)(2). Svenson et al. (US 10943311 B1) at C. 11, L. 43-46 provides Berkheimer evidence: “the display 118 of the user device 104 may include any type of display 118 known in the art that is configured to present (e.g., display) information to the users 102.” The claim is not patent eligible.

CLAIM 17 incorporates the rejection of claim 16. 
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 16 are incorporated. The claim recites: the updated transition probabilities comprise probabilities between each pair of a plurality of states for each of a plurality of recommended actions. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 18 incorporates the rejection of claim 16. 
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 16 are incorporated. The claim recites: the updated parameter value is derived using a clustering algorithm. This limitation is a mathematical concept. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 19 incorporates the rejection of claim 18.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 18 are incorporated. The claim recites:
the clustering algorithm comprises: determining a preliminary parameter value for each of a plurality of states based on the active data;  (Determining is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
grouping states into one or more clusters based on the preliminary parameter value for each state; (Grouping is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
deriving a shared parameter value for each cluster based on the preliminary parameter value for each state in the cluster; (Deriving a shared parameter value is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper, and it is a mathematical concept.)
assigning the shared parameter value for each cluster to each state grouped in the cluster. (Assigning is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper)
The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

CLAIM 20 incorporates the rejection of claim 19.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 19 are incorporated. The claim recites: the states are grouped into the one or more clusters based on confidence values for the shared parameter values for the clusters. Grouping based on confidence values is an evaluation and a judgement mental process which can be reasonably performed in one’s mind with the aid of pencil and paper. The claim recites an abstract idea. 
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim does not recite any additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-7, 9-14, and 16-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shani et al. (“An MDP-Based Recommender System”, cited in the IDS filed 04/23/2018).

	Regarding CLAIM 1, Shani teaches: One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: (The experimental results in § 6 on pp. 1287-1291 are evidence of computer storage media storing instructions executed by computing devices. In particular, the paragraph on p. 1289 starting with “Next, we performed an experiment” through § 6.1 on p. 1290 discloses running the MDP-based recommendation system on an online bookstore (MDP is Markov Decision Process). § 6.2 on pp. 1290-91 discloses computational analysis for model build time, recommendations per second, and memory footprints for the MDP-based system.)
obtaining passive data comprising sequences of user actions without recommended actions from a reinforcement learning agent; (P. 1272, § 3.1, lines 3-6 teaches: “The states in our MC model represent the relevant information that we have about the user. This information corresponds to previous choices made by users in the form of a set of ordered sequences of selections… Thus, the set of states contains all possible sequences of user selections.” The previous choices were made without recommended actions, as taught by p. 1266, in the second-to-last paragraph, the last 3 lines, and p. 1271 § 3, lines 1-4. 
Under the broadest reasonable interpretation of this limitation, obtaining passive data includes gathering sequences of user data without recommendations and obtaining an initialized predictive model of the user based on sequences of the gathered user data. On p. 1280, § 5.1.1, the paragraph starting with “To allow for” through equation 17 teaches initializing a predictive model with the parameters of various transition probabilities. On p. 1285, the bottom paragraph through equation 36 teaches initializing counts at time $t = 0$ based on the initial transition probabilities computed on p. 1280.)
obtaining active data including previous recommended actions provided by the reinforcement learning agent; (P. 1279, § 5.1, ¶ 2 teaches: “The actions of the MDP correspond to a recommendation of an item.” § 5.1 teaches that when the system recommends an item $x'$, the user may accept this recommendation or select some non-recommended item $x''$. Lastly, p. 1285, § 5.3, ¶ 2 teaches in order to re-estimate the transition function the following counts are obtained from the recently collected statistics: the number of times the r recommendation was accepted in state s and the number of times the user took item r in state s even though it was not recommended.)
updating a parameter value used by the reinforcement learning agent based on the active data; (P. 1285, all of § 5.3 on the page teaches updating the model via reinforcement learning based on a set of statistics about the recommendations made to users. Parameter values include c_in(s, r, s∙r), c_out(s, r, s∙r), c_total(s, s∙r), count(s, r, s∙r), and count(s, s∙r). Parameter values at time t + 1 are updated based on counts from time t.)
updating a transition model of the reinforcement learning agent by generating updated transition probabilities by applying the updated parameter value to the passive data; and (P. 1270, § 2.5, ¶ 2 teaches: “An MDP is by definition a four-tuple: ⟨S, A, Rwd, tr⟩, where … tr is the state-transition function, which provides the probability of a transition between every pair of states given each action.” Updating the transition probabilities is explicitly taught on p. 1285, in the paragraph before equation 29: “We compute the new counts and the new approximation for the transition function at time t + 1 based on the counts and probabilities at time t as follows:” Equations 32-33 teach computing a new approximation for the transition function at time t + 1 based on the counts and probabilities at time t. In equations 29-33, updated parameter values include each “count” at time t = 1 and passive data includes c_in, c_total, and c_out from time t = 0.)
generating, by the reinforcement learning agent, a recommended action based on a current state and the updated transition probabilities; and (See p. 1270, § 2.5, ¶ 2-3. The optimal policy π executes the action a = π(s) for the current state s, and the policy includes transition probabilities tr.)
providing the recommended action for presentation. (P. 1287, § 6, ¶ 3 teaches “Users received recommendations when adding items to the shopping cart. The recommendations were based on the last k items added to the cart ordered by the time they were added. An example is shown in Figure 4 [p. 1288] where the three book covers at the bottom are the recommended items.”) 

Regarding CLAIM 2, Shani teaches: The one or more computer storage media of claim 1, wherein the reinforcement learning agent is part of a Markov decision process-based recommendation system. (P. 1266, ¶ 1; p. 1270, § 2.5 ¶ 1-3; p. 1272, all of § 3.1)

Regarding CLAIM 3, Shani teaches: The one or more computer storage media of claim 1, wherein the updated transition probabilities comprise probabilities between each pair of a plurality of states for each of a plurality of recommended actions. (P. 1270, § 2.5, ¶ 2, lines 2-4 starting at “tr is…”; and p. 1272, §3.1, all of “The Transition Function”)

Regarding CLAIM 4, Shani teaches: The one or more computer storage media of claim 1, wherein the updated transition probabilities are updated using maximum likelihood principle. (P. 1272, “The Transition Function”, lines 1-5 teach a maximum-likelihood estimate can be used to estimate the transition probabilities.)
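For illustration only, the general maximum-likelihood principle referenced above can be sketched as follows. This is a minimal sketch of the textbook count-ratio estimate for a first-order Markov chain; it is not reproduced from Shani or from the instant application. The function name `mle_transition_probabilities`, the sample `sessions` data, and the simplification of treating each single item as a state are assumptions made for the example.

```python
from collections import Counter, defaultdict


def mle_transition_probabilities(sequences):
    """Estimate P(next | current) as count(current, next) / count(current).

    Hypothetical helper: states are single items here, which is a
    simplifying assumption and not the state definition used by Shani.
    """
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Count every observed (current, next) pair in each sequence.
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[current][nxt] += 1

    probabilities = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            nxt: n / total for nxt, n in followers.items()
        }
    return probabilities


# Hypothetical user-selection sequences, invented for the example.
sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
tr = mle_transition_probabilities(sessions)
```

In this sketch, state "a" is followed by "b" in two of three observations, so the estimate for that transition is 2/3.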

Regarding CLAIM 5, Shani teaches: The one or more computer storage media of claim 1, wherein the updated parameter value is derived from the active data using an n-gram model.
(N-grams are disclosed by p. 1269, all of §2.4 and by p. 1271, the last paragraph of § 2.5. All of § 3.2 on pp. 1272-1273 teach enhancements to the maximum-likelihood n-gram model utilizing n-grams. P. 1283, paragraph starting “When a recommendation” teaches handling new states using a finite mixture of unigram, bigram, and trigrams.)

Regarding CLAIM 6, Shani teaches: The one or more computer storage media of claim 1, wherein the updated parameter value is derived using a clustering algorithm. (P. 1269, § 2.4, lines 6-9; p. 1273, second paragraph starting “A second enhancement” through equation 8; p. 1283, paragraph starting “When a recommendation” teaches handling new states using clustering.)

Regarding CLAIM 7, Shani teaches: The one or more computer storage media of claim 6, wherein the clustering algorithm comprises: determining a preliminary parameter value for each of a plurality of states based on the active data; (On p. 1269, the last 4 lines teach: “Clustering is an approach that groups some states together for purposes of predicting next states. For example, we can group items such as basketball, football, and volleyball into a ‘sports ball’ class. Such grouping helps to address the problem of data sparsity.” The items of basketball, football, and volleyball are interpreted as preliminary parameter values. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
grouping states into one or more clusters based on the preliminary parameter value for each state; (On p. 1269 in the last 4 lines, the “sports ball” class consists of a cluster of basketball, football, and volleyball. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
deriving a shared parameter value for each cluster based on the preliminary parameter value for each state in the cluster; and (On p. 1269 in the last 4 lines, the “sports ball” class is a shared parameter value. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
assigning the shared parameter value for each cluster to each state grouped in the cluster. (On p. 1269 in the last 4 lines, the “sports ball” class is a shared parameter value. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
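For illustration only, the four recited steps (preliminary value per state, grouping, shared value per cluster, assignment back to each state) can be sketched as below. This is a minimal sketch under stated assumptions and is neither Shani's clustering nor the applicant's implementation. The function name `share_parameters_by_cluster`, the fixed-width bucketing rule used for grouping, the use of the mean as the shared value, and the sample values are all hypothetical.

```python
from collections import defaultdict


def share_parameters_by_cluster(preliminary, bucket_width=0.25):
    """Return a mapping from each state to its cluster's shared value.

    Assumptions for this sketch: states are grouped when their preliminary
    values fall in the same fixed-width bucket, and the shared value is
    the mean of the preliminary values in the cluster.
    """
    # Step 2: group states into clusters based on the preliminary value.
    clusters = defaultdict(list)
    for state, value in preliminary.items():
        clusters[int(value // bucket_width)].append(state)

    shared = {}
    for members in clusters.values():
        # Step 3: derive one shared value per cluster.
        mean = sum(preliminary[s] for s in members) / len(members)
        # Step 4: assign the shared value to every state in the cluster.
        for state in members:
            shared[state] = mean
    return shared


# Step 1: hypothetical preliminary parameter values, one per state.
preliminary_values = {
    "basketball": 0.10,
    "football": 0.20,
    "novel": 0.60,
    "cookbook": 0.70,
}
shared_values = share_parameters_by_cluster(preliminary_values)
```

Here the two ball items land in one bucket and share a value near 0.15, while the two book items share a value near 0.65.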

	Regarding CLAIM 9, the claim is directed to a method that implements the same features as the product of claim 1 and is therefore rejected for the reasons therein. Claim 9 recites the following additional features which Shani teaches:
state information associated with recommended actions; (P. 1270, ¶ 2-3 teaches an MDP is by definition a four-tuple: ⟨S, A, Rwd, tr⟩, where S is a set of states and A is a set of actions. Formally, a stationary policy for an MDP π is a mapping from states to actions, specifying which action to perform in each state. Given such an optimal policy π, at each stage of the decision process, the agent need only establish what state s it is in and execute the action a = π(s).)
use the passive data and the active data to provide updated transition probabilities between each pair of a plurality of states for each of a plurality of recommended actions (P. 1270, § 2.5, ¶ 2, lines 2-4 starting at “tr is…”; and p. 1272, §3.1, all of “The Transition Function”)

	Regarding CLAIMS 10-14, the claims are directed to methods that implement the same features as the products of claims 2 and 4-7, respectively, and are therefore rejected for the reasons therein. Examiner noted at the start of this office action that claims 12 and 13 recite features “the updated transition probabilities” and they are being interpreted as “the updated parameter value”.

	Regarding CLAIM 16, Shani teaches: A computer system comprising: one or more processors; and one or more computer-storage media storing instructions, that when used by the one or more processors, cause the one or more processors to perform operations comprising: (The experimental results in § 6 on pp. 1287-1291 are evidence of a computer system comprising processors and computer-storage media. In particular, the paragraph starting at “Next, we performed an experiment” on p. 1289 through § 6.1 on p. 1290 discloses running the MDP-based recommendation system on an online bookstore (MDP is Markov Decision Process). § 6.2 on pp. 1290-91 discloses computational analysis for model build time, recommendations per second, and memory footprints for the MDP-based system.)
updating a transition model of a reinforcement learning agent using passive data and active data to provide updated transitional probabilities, the passive data comprising sequences of user actions without recommended actions from the reinforcement learning agent, the active data comprising previous recommended actions provided by the reinforcement learning agent, (This claim mapping is split into 3 sections called passive data, active data, and transition model.
Passive data: P. 1272, § 3.1, lines 3-6 teaches: “The states in our MC model represent the relevant information that we have about the user. This information corresponds to previous choices made by users in the form of a set of ordered sequences of selections… Thus, the set of states contains all possible sequences of user selections.” The previous choices were made without recommended actions, as taught by p. 1266, in the second-to-last paragraph, the last 3 lines, and p. 1271 § 3, lines 1-4. 
Under the broadest reasonable interpretation of this limitation, obtaining passive data includes gathering sequences of user data without recommendations and obtaining an initialized predictive model of the user based on sequences of the gathered user data. On p. 1280, § 5.1.1, the paragraph starting with “To allow for” through equation 17 teaches initializing a predictive model with the parameters of various transition probabilities. On p. 1285, the bottom paragraph through equation 36 teaches initializing counts at time t = 0 based on the initial transition probabilities computed on p. 1280.
Active data: P. 1279, § 5.1, ¶ 2 teaches: “The actions of the MDP correspond to a recommendation of an item.” § 5.1 teaches that when the system recommends an item x', the user may accept this recommendation or select some non-recommended item x''. Lastly, p. 1285, § 5.3, ¶ 2 teaches that, in order to re-estimate the transition function, the following counts are obtained from the recently collected statistics: the number of times the r recommendation was accepted in state s and the number of times the user took item r in state s even though it was not recommended.
Transition model: P. 1270, § 2.5, ¶ 2 teaches: “An MDP is by definition a four-tuple: ⟨S, A, Rwd, tr⟩, where … tr is the state-transition function, which provides the probability of a transition between every pair of states given each action.” Updating the transition probabilities is explicitly taught on p. 1285, in the paragraph before equation 29: “We compute the new counts and the new approximation for the transition function at time t + 1 based on the counts and probabilities at time t as follows:” Equations 32-33 teach computing a new approximation for the transition function at time t + 1 based on the counts and probabilities at time t.)
the updated transition probabilities generated by applying an updated parameter value derived using the passive data and an updated parameter value derived using the active data; (Interpreted as updating a parameter value based on passive data and active data and using it to update transition probabilities. P. 1270, § 2.5, ¶ 2 teaches: “An MDP is by definition a four-tuple: ⟨S, A, Rwd, tr⟩, where … tr is the state-transition function, which provides the probability of a transition between every pair of states given each action.” Updating the transition probabilities is explicitly taught on p. 1285, in the paragraph before equation 29: “We compute the new counts and the new approximation for the transition function at time t + 1 based on the counts and probabilities at time t as follows:” Equations 32-33 teach computing a new approximation for the transition function at time t + 1 based on the counts and probabilities at time t. The passive data includes the initialized parameters of equations 34-36 on p. 1286. In equations 29-33, updated parameter values include each “count” at time t = 1 and passive data includes c_in, c_total, and c_out from time t = 0. Active data includes updating the counts from the previous user session.)
generating, by the reinforcement learning agent, a recommended action based on a current state and the updated transition probabilities; and (See p. 1270, § 2.5, ¶ 2-3. The optimal policy π executes the action a = π(s) for the current state s, and the policy includes transition probabilities tr.)
providing the recommended action for presentation. (P. 1287, § 6 teaches “Users received recommendations when adding items to the shopping cart. The recommendations were based on the last k items added to the cart ordered by the time they were added. An example is shown in Figure 4 [p. 1288] where the three book covers at the bottom are the recommended items.”)

	Regarding CLAIM 17, Shani teaches: The system of claim 16, wherein the updated transition probabilities comprise probabilities between each pair of a plurality of states for each of a plurality of recommended actions. (P. 1270, § 2.5, ¶ 2, lines 2-4 starting at “tr is…”; and p. 1272, §3.1, all of “The Transition Function”)

	Regarding CLAIM 18, Shani teaches: The system of claim 16, wherein the updated parameter value is derived using a clustering algorithm. (P. 1269, § 2.4, lines 6-9; p. 1273, second paragraph starting “A second enhancement” through equation 8; p. 1283, paragraph starting “When a recommendation” teaches handling new states using clustering.)

Regarding CLAIM 19, Shani teaches: The method of claim 18, wherein the clustering algorithm comprises: determining a preliminary parameter value for each of a plurality of states based on the active data; (On p. 1269, the last 4 lines teach: “Clustering is an approach that groups some states together for purposes of predicting next states. For example, we can group items such as basketball, football, and volleyball into a ‘sports ball’ class. Such grouping helps to address the problem of data sparsity.” The items of basketball, football, and volleyball are interpreted as preliminary parameter values. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
grouping states into one or more clusters based on the preliminary parameter value for each state; (On p. 1269 in the last 4 lines, the “sports ball” class consists of a cluster of basketball, football, and volleyball. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
deriving a shared parameter value for each cluster from the preliminary parameter value for each state in the cluster; and (On p. 1269 in the last 4 lines, the “sports ball” class is a shared parameter value. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
assigning the shared parameter value for each cluster to each state grouped in the cluster. (On p. 1269 in the last 4 lines, the “sports ball” class is a shared parameter value. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 8, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shani et al. (“An MDP-Based Recommender System”, cited in the IDS filed 04/23/2018) in view of Tang et al.  (US 20110043437 A1, cited in the PTO-892 filed 05/24/2022). 


Regarding CLAIM 8, Shani teaches: The one or more computer storage media of claim 7, wherein the states are grouped into the one or more clusters  for the shared parameter values for the clusters. (On p. 1269 in the last 4 lines, the “sports ball” class consists of a cluster of basketball, football, and volleyball. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
However, Shani does not explicitly teach: based on confidence values
But Tang teaches: group into clusters based on confidence values (¶ [0046], lines 1-2 teach “FIG. 6A illustrates an untagged cluster 602 being compared to each tagged cluster 604, 606, 608.” Lines 9 to the end of the paragraph teach determining a confidence value for suggested tagging data for a cluster. All of ¶ [0047] further teaches comparing clusters to a confidence level.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Tang’s system of comparing a cluster to a confidence value. A motivation for the combination is to determine whether to display Shani’s book recommendations to a user by comparing Shani’s states to Tang’s confidence value. (Tang, ¶ [0047], lines 18-20: “The confidence level is used to decide whether clusters and their suggested tagging data are displayed on a user interface.”)

Regarding CLAIM 15, the claim is directed to a method that implements the same features as the product of claim 8 and is therefore rejected for the reasons therein.

Regarding CLAIM 20, Shani teaches: The system of claim 19, wherein the states are grouped into the one or more clusters  for the shared parameter values for the clusters. (On p. 1269 in the last 4 lines, the “sports ball” class consists of a cluster of basketball, football, and volleyball. P. 1283, paragraph starting “When a recommendation” teaches handling new states based on the active data using clustering.)
However, Shani does not explicitly teach: based on confidence values
But Tang teaches: group into clusters based on confidence values (¶ [0046], lines 1-2 teach “FIG. 6A illustrates an untagged cluster 602 being compared to each tagged cluster 604, 606, 608.” Lines 9 to the end of the paragraph teach determining a confidence value for suggested tagging data for a cluster. All of ¶ [0047] further teaches comparing clusters to a confidence level.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Tang’s system of comparing a cluster to a confidence value. A motivation for the combination is to determine whether to display Shani’s book recommendations to a user by comparing Shani’s states to Tang’s confidence value. (Tang, ¶ [0047], lines 18-20: “The confidence level is used to decide whether clusters and their suggested tagging data are displayed on a user interface.”)

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-3, 5-9, and 16 are rejected on the ground of nonstatutory double patenting as being unpatentable over respective claims 1, 7-8, 10-14, and 18 of U.S. Patent No. 11,429,892, hereinafter the ’892 patent, in view of Shani et al. (“An MDP-Based Recommender System”, cited in the IDS filed 04/23/2018). 

| Instant Claim 1 | Claim 1 of ’892 |
| --- | --- |
| One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: | One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: |
| obtaining passive data comprising sequences of user actions without recommended actions from a reinforcement learning agent; | receiving passive data that encodes sequences of a plurality of states of an environment resulting from user actions browsing sequences of content without recommendations; |
| obtaining active data including previous recommended actions provided by the reinforcement learning agent; | receiving currently available active data that encodes a state transition from a current state to a new state of the environment resulting from a user action taken in response to being provided a recommendation associated with the content and determined by a recommendation policy based on the current state and a reward signal indicating a benefit associated with taking the recommendation; and |
| updating a parameter value used by the reinforcement learning agent based on the active data; | |
| updating a transition model of the reinforcement learning agent by generating updated transition probabilities by applying the updated parameter value to the passive data; and | |
| generating, by the reinforcement learning agent, a recommended action based on a current state and the updated transition probabilities; and | training a recommendation system by iteratively updating a transition model to include a plurality of transition probabilities between pairs of states of the environment that are generated by using a combination of the passive data and the currently available active data, iteratively updating the recommendation policy based on the iteratively updated transition model and the combination of the passive data and the currently available active data, |
| providing the recommended action for presentation. | Shani et al.: P. 1287, § 6, ¶ 3 teaches: “Users received recommendations when adding items to the shopping cart. The recommendations were based on the last k items added to the cart ordered by the time they were added. An example is shown in Figure 4 [p. 1288] where the three book covers at the bottom are the recommended items.” |


Although instant Claim 1 does not explicitly recite training a recommendation system, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to update parameter values during training. The feature in instant claim 1 of updating parameter values used to update models of a reinforcement learning agent performs essentially the same function as training a recommendation system.
Claim 1 of the ’892 patent does not recite a feature of providing the recommended action for presentation, but Shani et al. teaches this feature as indicated in the above table comparing claim features. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used Shani’s system to present the recommendations generated by the ’892 patent. A motivation for the combination is to present recommended items to shoppers on a website. See Shani p. 1287, from the start of § 6 to § 6.1.

| Instant Claim 2 | Claim 7 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 1, wherein the reinforcement learning agent is part of a Markov decision process-based recommendation system. | The one or more computer storage media of claim 1, wherein the recommendation system comprises a Markov decision process-based recommendation system. |


| Instant Claim 3 | Claim 8 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 1, wherein the updated transition probabilities comprise probabilities between each pair of a plurality of states for each of a plurality of recommended actions. | The one or more computer storage media of claim 1, wherein the plurality of transition probabilities that are included in the transition model comprise probabilities between each pair of the plurality of states of the environment for the recommendation. |


| Instant Claim 5 | Claim 10 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 1, wherein the updated parameter value is derived from the active data using an n-gram model | The one or more computer storage media of claim 1, wherein the transition model includes at least one parameter value that is derived from the currently available active data using an n-gram model. |



| Instant Claim 6 | Claim 11 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 1, wherein the updated parameter value is derived using a clustering algorithm. | The one or more computer storage media of claim 1, wherein the transition model includes at least one parameter value that is derived from a clustering algorithm. |



| Instant Claim 7 | Claim 12 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 6, wherein the clustering algorithm comprises: | The one or more computer storage media of claim 11, wherein the clustering algorithm comprises: |
| determining a preliminary parameter value for each of a plurality of states based on the active data; | determining a preliminary parameter value for each of the plurality of states based on the currently available active data; |
| grouping states into one or more clusters based on the preliminary parameter value for each state; | grouping the plurality of states into one or more clusters based on the preliminary parameter value for each state; |
| deriving a shared parameter value for each cluster based on the preliminary parameter value for each state in the cluster; and | deriving a shared parameter value for each cluster based on the preliminary parameter value for each state in the cluster; and |
| assigning the shared parameter value for each cluster to each state grouped in the cluster. | assigning the shared parameter value for each cluster to each state grouped in the cluster. |
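For illustration only, the four steps common to instant claim 7 and claim 12 of the ’892 patent can be sketched as follows. This sketch is not drawn from the claims, the specification, or the ’892 patent. The function name `share_parameters`, the fixed-width binning rule, the `bin_width` argument, and the use of a mean as the shared value are all assumptions chosen to keep the example short, and the preliminary values are assumed to have already been estimated from the active data.

```python
from collections import defaultdict


def share_parameters(preliminary, bin_width=0.5):
    """Cluster states by preliminary value and give each cluster one shared value.

    preliminary: dict mapping state -> preliminary parameter value
                 (step 1, assumed to be computed elsewhere from active data).
    bin_width:   hypothetical grouping rule; states whose values fall in the
                 same fixed-width bin are placed in the same cluster.
    """
    # Step 2: group states into clusters based on the preliminary value.
    clusters = defaultdict(list)
    for state, value in preliminary.items():
        clusters[int(value // bin_width)].append(state)

    # Step 3: derive a shared value per cluster (here, the mean).
    # Step 4: assign that shared value to every state in the cluster.
    shared = {}
    for members in clusters.values():
        mean = sum(preliminary[s] for s in members) / len(members)
        for s in members:
            shared[s] = mean
    return shared
```

Under these assumptions, states "a" (1.0) and "b" (1.2) fall in the same bin and receive the same shared value, while state "c" (3.0) keeps its own.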



| Instant Claim 8 | Claim 13 of ’892 |
| --- | --- |
| The one or more computer storage media of claim 7, wherein the states are grouped into the one or more clusters based on confidence values for the shared parameter values for the clusters. | The one or more computer storage media of claim 12, wherein the states are grouped into the one or more clusters based on confidence values for the shared parameter values for the clusters. |



Instant Claim 9 is directed to a method that essentially implements the same features as the product of instant claim 1. Claim 14 of the ’892 patent is a method that implements the same features as the product of claim 1 of the ’892 patent. Therefore, instant claim 9 is rejected as being unpatentable over claim 14 of the ’892 patent in view of Shani.

Instant Claim 16 is directed to a system that essentially implements the same features as the product of instant claim 1. Claim 18 of the ’892 patent is a system that implements the same features as the product of claim 1 of the ’892 patent. Therefore, instant claim 16 is rejected as being unpatentable over claim 18 of the ’892 patent in view of Shani.

Response to Arguments
Applicant's arguments have been fully considered but they are not persuasive.

Double Patenting (Remarks p. 10): The amendments do not overcome the rejections of instant claims 1-3, 5-9, and 16 on the ground of nonstatutory double patenting over respective claims 1, 7-8, 10-14, and 18 of U.S. Patent No. 11,429,892 in view of Shani et al. (“An MDP-Based Recommender System”, cited in the IDS filed 04/23/2018).

Rejections Under 35 U.S.C. 101 (Remarks pp. 10-11): 
Applicant’s argument #1: “[T]he claims recite features regarding employing a reinforcement learning agent (i.e., a machine learning model) to learn from both passive data and active date to generate a recommended action. As discussed during the interview, this cannot be practically performed in the human mind or even by pen and paper. Additionally, this not simply a mathematical calculation. As such, the claims are not directed to an abstract idea. Withdrawal of the 35 U.S.C. § 101 rejection and allowance of the claims is respectfully requested.”

Examiner’s response #1: Examiner respectfully disagrees. The features upon which applicant relies (i.e., “employing a reinforcement learning agent (i.e., a machine learning model) to learn from both passive data and active date to generate a recommended action”) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Claim 1 recites the features of:
updating a parameter value used by the reinforcement learning agent based on the active data; 
updating a transition model of the reinforcement learning agent by generating updated transition probabilities by applying the updated parameter value to the passive data; and 
generating, by the reinforcement learning agent, a recommended action based on a current state and the updated transition probabilities; and 
Each of these limitations recites a mental process, which can reasonably be performed in one’s mind or with pencil and paper, and a mathematical concept. Accordingly, claim 1 recites a judicial exception. The additional elements of claim 1 do not integrate the judicial exceptions into a practical application.
Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Claim 1 is not patent eligible. The rejections of claims 1-20 under 35 U.S.C. 101 are maintained.
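For illustration only, the three quoted limitations can be read as the following arithmetic. This is a minimal sketch under stated assumptions and is not taken from the claims or the specification: the record layout of `active_events`, the choice of an acceptance rate as the "parameter value", the scaling rule `(1 + boost)`, and all function names are hypothetical.

```python
def update_parameter(active_events):
    """Parameter value from active data: fraction of recommendations accepted.

    active_events: list of (state, recommended_item, was_accepted) tuples
                   (hypothetical record layout).
    """
    if not active_events:
        return 0.0
    accepted = sum(1 for _state, _item, was_accepted in active_events if was_accepted)
    return accepted / len(active_events)


def apply_to_passive(passive_counts, boost):
    """Updated transition probabilities: passive frequencies scaled by (1 + boost).

    passive_counts: dict mapping state -> {next_item: observed count}.
    Each value is the assumed probability of moving to that item when it is
    recommended, so values across items need not sum to 1; each is capped at 1.0.
    """
    probs = {}
    for state, next_counts in passive_counts.items():
        total = sum(next_counts.values())
        if total == 0:
            continue  # no passive observations for this state
        probs[state] = {
            item: min(1.0, (1.0 + boost) * n / total)
            for item, n in next_counts.items()
        }
    return probs


def recommend(probs, current_state):
    """Recommended action: the item with the highest updated probability."""
    candidates = probs.get(current_state, {})
    return max(candidates, key=candidates.get) if candidates else None
```

With two active events of which one was accepted, the hypothetical parameter value is 0.5; applied to passive counts of 1 and 3 for items "x" and "y" from state "a", it yields 0.375 and 1.0 (after capping), and "y" is recommended.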

Rejections Under 35 U.S.C. 102 and 103 (Remarks pp. 11-13)
Applicant’s argument #2: “Shani generally discusses a Markov decision process-based recommender system. Section 5.1.1 (i.e., p. 1280-1281) of Shani discuss using parameter values (e.g., $\alpha, \beta$) to derive transition probabilities from passive data. However, the parameter values are not determined from active data. Instead, Shani indicates that the parameter values are constants that can vary for items based on popularity (i.e., lift). Additionally, Sharma explicitly indicates that the parameter values are only used for initializing the transition probabilities. In particular, when discussing updating the model online in section 5.3 (p. 1285), Shani states: ‘Note that at this stage the constants $\alpha_{s,r}, \beta_{s,r}$ no longer play a role-they were used only to generate the initial model.’ Thus, Shani clearly fails to discuss determining updating a parameter value using active data and updating a transition probability by applying the updated parameter value derived from active data to passive data when training as recited by claim 1.
“As such, it is respectfully submitted that Shani fails to describe, either expressly or inherently, each and every element of independent claim 1 and, as such, claim 1 is not anticipated by Shani. Independent claims 9 and 16 recite at least some similar features and are not anticipated by Shani for at least reasons similar to those provided above for claim 1. Accordingly, Applicant respectfully requests withdrawal of the rejection of claims 1, 9, and 16 under 35 U.S.C. § 102(a)(1). Claims 1, 9, and 16 are believed to be in condition for allowance and such favorable action is respectfully requested.”

Examiner’s response #2: Examiner respectfully disagrees with Applicant’s argument that “Shani fails to describe, either expressly or inherently, each and every element of independent claim 1 and, as such, claim 1 is not anticipated by Shani.” The rejection of claim 1 does not rely upon Shani’s constants $\alpha_{s,r}, \beta_{s,r}$ from p. 1280, § 5.1.1 explicitly to teach the parameter values. Instead, Examiner broadly interprets the parameter values to mean the variables $c_{in}(s, r, s \cdot r)$, $c_{out}(s, r, s \cdot r)$, $c_{total}(s, s \cdot r)$, $\mathrm{count}(s, r, s \cdot r)$, and $\mathrm{count}(s, s \cdot r)$ from p. 1285, § 5.3. The rejection of independent claim 1 under 35 U.S.C. 102 in the present office action explains how Shani teaches each and every limitation of independent claim 1.
Additionally, in response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “when training”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). The rejections of claims 1-7, 9-14, and 16-19 are maintained. The rejections of claims 8, 15, and 20 under 35 U.S.C. 103 are maintained.
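For illustration only, the count variables from Shani p. 1285, § 5.3 that are named in Examiner’s response #2 can be sketched as running tallies. This office action does not reproduce Shani’s update equations, so the additive accumulation and the ratio used for the transition probability below are assumptions of this sketch, as are the class name `TransitionCounts` and its method names. The key `(s, r)` stands in for the triple of state s, item r, and resulting state s·r, since s and r determine s·r.

```python
class TransitionCounts:
    """Running tallies of observed transitions, keyed by (state, item)."""

    def __init__(self):
        self.c_in = {}     # item r was recommended in state s and then chosen
        self.c_total = {}  # item r was chosen in state s, recommended or not

    def observe(self, s, r, recommended, n=1):
        """Add n newly observed transitions from s to s.r (assumed additive)."""
        key = (s, r)
        self.c_total[key] = self.c_total.get(key, 0) + n
        if recommended:
            self.c_in[key] = self.c_in.get(key, 0) + n

    def c_out(self, s, r):
        """Transitions to s.r that happened without r being recommended."""
        key = (s, r)
        return self.c_total.get(key, 0) - self.c_in.get(key, 0)

    def tr(self, s, r, recommended):
        """Assumed estimate: share of transitions to s.r with or without a recommendation."""
        total = self.c_total.get((s, r), 0)
        if total == 0:
            return 0.0
        numerator = self.c_in.get((s, r), 0) if recommended else self.c_out(s, r)
        return numerator / total
```

Under these assumptions, three recommended and one unrecommended transition from s to s·r give values of 0.75 and 0.25 for the recommended and unrecommended cases, respectively.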

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Lu et al. (“Partially Observable Markov Decision Process for Recommender Systems”) teaches a neural-optimized Partially Observable Markov Decision Process algorithm for recommender systems.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.H.J./Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127