DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/09/2020 has been entered.
 
Status of Claims
The present application is being examined under the claims filed on 06/01/2021.
Claims 1 and 6 are canceled.
Claims 2-5 and 7-11 are rejected.
Claims 2-5 and 7-11 are pending.

Drawings
The Drawings filed on 12/28/2016 are acceptable for examination purposes.

Specification
The Specification filed on 12/28/2016 is acceptable for examination purposes.

Response to Arguments
In reference to claim rejections under 35 USC § 103
Applicant asserts that Luo and Amendjian do not disclose “the reward comprises a searching duration”.
Examiner respectfully disagrees. The claim recites a Markov Decision Process (MDP) model wherein the parameters of the MDP model comprise a state, an action, and a reward. Examiner notes that in reinforcement learning the action, state, and reward are well-known and well-understood to be interconnected with each other. Briefly, reinforcement learning works as follows: there is an environment (the search space), an action (the action taken at a particular step/iteration of the reinforcement learning model), a reward for the action taken, and a state (the current state of the environment given the current action and reward).
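For illustration only, the interaction among environment, state, action, and reward described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Luo, Amendjian, or the instant application: the environment `LineSearchEnv`, the policy `always_right`, and the reward values are hypothetical.

```python
class LineSearchEnv:
    """Hypothetical environment: locate a target cell on a line of cells."""

    def __init__(self, size=5, target=4):
        self.size = size
        self.target = target
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the line.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); clamp to the line.
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.target
        # Assumed reward values: small cost per step, bonus on reaching the target.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=50):
    """Run one episode; return (total reward, number of steps taken)."""
    state = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment returns new state and reward
        total += reward
        steps += 1
        if done:
            break
    return total, steps


def always_right(state):
    """Hypothetical policy: always move right."""
    return 1
```

In this sketch the number of steps taken in an episode plays the role of a search duration, and the accumulated reward depends on it.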
The claim further recites the limitation in question “the reward comprises a searching duration”. Applicant is correct in asserting that the reward is a value (the reward in reinforcement learning is always a value). Examiner notes that the broadest reasonable interpretation of “the reward comprises a searching duration” is that the reward is such that it is related to the searching duration. This means that the reward would have some impact on the amount of search time in a given environment.
Examiner notes that when an action is taken, regardless of whether it is the highest-expected-reward action or a random action, the reward for taking that action comprises an exploration duration (search duration). As an example, if I am exploring a space (performing a search), I can perform the highest-expected-reward action and complete my task with an expected “search duration”, or I can perform a random action, which will extend or shorten my “search duration”. Examiner notes that Amendjian ¶ [0102] discloses “the training may choose the highest-expected-reward action 90% of the time and a random action 10% of the time, to encourage exploration of the search space”.
This means that the reward comprises the encouragement of exploration of the search space. In other words, the reward comprises a searching duration.
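For illustration only, the 90%/10% selection quoted from Amendjian ¶ [0102] corresponds to what is commonly called epsilon-greedy action selection with epsilon = 0.1. The following is a minimal sketch under stated assumptions; the function name `epsilon_greedy` and the `expected_rewards` mapping are hypothetical and are not taken from Amendjian.

```python
import random


def epsilon_greedy(expected_rewards, epsilon=0.1, rng=random):
    """Choose an action from a mapping of action -> expected reward.

    With probability epsilon a uniformly random action is chosen (exploration);
    otherwise the highest-expected-reward action is chosen (exploitation).
    """
    actions = list(expected_rewards)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=expected_rewards.get)
```

Note that the random branch may itself select the highest-expected-reward action, so that action is chosen slightly more than 90% of the time overall.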
Applicant's arguments filed 06/01/2021 have been fully considered but they are not persuasive.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-5 and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (hereinafter Luo) “Win-Win Search: Dual-Agent Stochastic Game in Session Search” in view of Amendjian et al. (hereinafter Amendjian) US 20170032417 A1.
In reference to claim 2. Luo teaches a searching method based on artificial intelligence, comprising:
“obtaining a query” (Luo in at least § 3 to § 4.5 “The user agent writes a query”);
“obtaining a first search result corresponding to the query according to a Markov Decision Process MDP model” (Luo in at least § 3 to § 4.5 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world” and “Markov Decision Process provides the basics for the win-win search framework”);
“displaying the first search result” (Luo in at least § 3 to § 4.5, and § 6 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
“obtaining a reward for the first search result from a user so as to obtain a second search result according to the MDP model, and displaying the second search result” (Luo in at least § 3 to § 4.5, and § 6 “The reward function R is defined over B x A → R. It is the amount of document relevance that an agent obtains from the world”, “The search engine runs its optimization algorithm and picks the best policy πse, which maximizes the joint long term rewards for both agents” and “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
wherein, parameters of the MDP model comprise:
“a state, represented by the query and a context” (Luo in at least § 3 to § 4.5. See at least § 4.2. § 4.1 clearly discloses a state represented by the query and browsing record. As per ¶ [0028] of the Instant Specification, the context includes, but is not limited to, action of use or browsing record);
“an action, represented by the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.3. § 4.1 clearly discloses the search engine action represented by the search results);
“a reward, represented by the reward for the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.5. § 4.1 clearly discloses the optimization algorithm which picks the best policy to maximize the long term rewards);
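For illustration only, the three recited MDP parameters may be represented as simple records. This is a minimal sketch under stated assumptions: the class names `State`, `Action`, and `Transition`, their fields, and the helper `total_reward` are hypothetical and are not taken from Luo or the instant application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """State: the query together with a context (e.g. a browsing record)."""
    query: str
    context: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Action:
    """Action: the search result presented in response to the state."""
    results: Tuple[str, ...]


@dataclass(frozen=True)
class Transition:
    """One step of the process: state, action taken, and reward received."""
    state: State
    action: Action
    reward: float


def total_reward(transitions):
    """Sum the rewards over a session (the long-term reward, undiscounted)."""
    return sum(t.reward for t in transitions)
```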

Luo does not explicitly disclose:
“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process”.
However, Amendjian discloses:
“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process” (Amendjian ¶ [0102] “the training may choose the highest-expected-reward action 90% of the time and a random action 10% of the time, to encourage exploration of the search space”).
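It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Luo and Amendjian. Luo teaches a Markov Decision Process that models the dynamics in session search, including decision states, query changes, clicks, and rewards, as a cooperative game between the user and the search engine (reinforcement learning). Amendjian teaches a system of detecting and generating online behavior from a clickstream, which uses Markov models or reinforcement learning.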

In reference to claim 3. Luo and Amendjian teach the searching method according to claim 2 (as mentioned above), wherein, the query comprises:
Luo further discloses:
“a query inputted by the user initially, a query recommended to the user, or a switched query inputted by the user” (Luo in at least § 3 to § 4.5 “The user agent writes a query”).

In reference to claim 4. Luo and Amendjian teach the searching method according to claim 2 (as mentioned above), wherein, the first search result comprises:
Luo further discloses:
“a webpage result, and a query recommended to the user” (Luo in at least § 2.1, and § 3 to § 4.5 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world” and “optimal rare query suggestion […] with implicit feedback in logs”).

In reference to claim 5. Luo and Amendjian teach the searching method according to claim 4 (as mentioned above), wherein, the reward comprises one or more of:
Luo further discloses:
“clicking the webpage result by the user; clicking the query recommended to the user by the user; the switched query inputted by the user; a clicking and buying action of the user; and a clicking duration” (Luo in at least § 3 to § 5 “The reward function R is defined over B x A → R. It is the amount of document relevance that an agent obtains from the world”, “The search engine runs its optimization algorithm and picks the best policy πse, which maximizes the joint long term rewards for both agents” and “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”. See at least Fig. 2 and the Dual-agent stochastic game steps 1-9 in § 4.1).

In reference to claim 7. Luo teaches a searching device based on artificial intelligence, comprising:
“obtaining a query” (Luo in at least § 3 to § 4.5 “The user agent writes a query”);
“obtaining a first search result corresponding to the query according to a Markov Decision Process MDP model” (Luo in at least § 3 to § 4.5 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world” and “Markov Decision Process provides the basics for the win-win search framework”);
“displaying the first search result” (Luo in at least § 3 to § 4.5, and § 6 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
“obtaining a reward for the first search result from a user so as to obtain a second search result according to the MDP model, and displaying the second search result” (Luo in at least § 3 to § 4.5, and § 6 “The reward function R is defined over B x A → R. It is the amount of document relevance that an agent obtains from the world”, “The search engine runs its optimization algorithm and picks the best policy πse, which maximizes the joint long term rewards for both agents” and “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
wherein, parameters of the MDP model comprise:
“a state, represented by the query and a context” (Luo in at least § 3 to § 4.5. See at least § 4.2. § 4.1 clearly discloses a state represented by the query and browsing record. As per ¶ [0028] of the Instant Specification, the context includes, but is not limited to, action of use or browsing record);
“an action, represented by the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.3. § 4.1 clearly discloses the search engine action represented by the search results);
“a reward, represented by the reward for the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.5. § 4.1 clearly discloses the optimization algorithm which picks the best policy to maximize the long term rewards);

Luo does not explicitly disclose:
“one or more computing devices” configured to execute:
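“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process”.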
However, Amendjian discloses:
“one or more computing devices” (Amendjian in at least ¶ [0114] and ¶ [0115]) configured to execute:
“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process” (Amendjian ¶ [0102] “the training may choose the highest-expected-reward action 90% of the time and a random action 10% of the time, to encourage exploration of the search space”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Luo and Amendjian. Luo teaches a Markov Decision Process that models the dynamics in session search, including decision states, query changes, clicks, and rewards, as a cooperative game between the user and the search engine (reinforcement learning). Amendjian teaches a system of detecting and generating online behavior from a clickstream, which uses Markov models or reinforcement learning. One of ordinary skill would have been motivated to combine Luo and Amendjian because MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) "Obvious to try" choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it 

In reference to claim 8. Luo and Amendjian teach the searching device according to claim 7 (as mentioned above), wherein, the query comprises:
Luo further discloses:
“a query inputted by the user initially, a query recommended to the user, or a switched query inputted by the user” (Luo in at least § 3 to § 4.5 “The user agent writes a query”).

In reference to claim 9. Luo and Amendjian teach the searching device according to claim 7 (as mentioned above), wherein, the first search result comprises:
Luo further discloses:
“a webpage result, and a query recommended to the user” (Luo in at least § 2.1, and § 3 to § 4.5 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world” and “optimal rare query suggestion […] with implicit feedback in logs”).

In reference to claim 10. Luo and Amendjian teach the searching device according to claim 9 (as mentioned above), wherein, the reward comprises one or more of:
Luo further discloses:
“clicking the webpage result by the user; clicking the query recommended to the user by the user; the switched query inputted by the user; a clicking and buying action of the user; and a clicking duration” (Luo in at least § 3 to § 5 “The reward function R is defined over B x A → R. It is the amount of document relevance that an agent obtains from the world”, “The search engine runs its optimization algorithm and picks the best policy πse, which maximizes the joint long term rewards for both agents” and “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”. See at least Fig. 2 and the Dual-agent stochastic game steps 1-9 in § 4.1).

In reference to claim 11. Luo teaches […], the searching method comprising:
“obtaining a query” (Luo in at least § 3 to § 4.5 “The user agent writes a query”);
“obtaining a first search result corresponding to the query according to a Markov Decision Process MDP model” (Luo in at least § 3 to § 4.5 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world” and “Markov Decision Process provides the basics for the win-win search framework”);
“displaying the first search result” (Luo in at least § 3 to § 4.5, and § 6 “Search engine action $a_{se}^{t}$ results in a set of documents Dt, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
“obtaining a reward for the first search result from a user so as to obtain a second search result according to the MDP model, and displaying the second search result” (Luo in at least § 3 to § 4.5, and § 6 “The reward function R is defined over B x A → R. It is the amount of document relevance that an agent obtains from the world”, “The search engine runs its optimization algorithm and picks the best policy πse, which maximizes the joint long term rewards for both agents” and “Search engine action $a_{se}^{t}$ results in a set of documents $D_t$, which are returned as message $\Sigma_{se}^{t}$ sent from the search engine agent to the user agent through the world”);
wherein, parameters of the MDP model comprise:
“a state, represented by the query and a context” (Luo in at least § 3 to § 4.5. See at least § 4.2. § 4.1 clearly discloses a state represented by the query and browsing record. As per ¶ [0028] of the Instant Specification, the context includes, but is not limited to, action of use or browsing record);
“an action, represented by the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.3. § 4.1 clearly discloses the search engine action represented by the search results);
“a reward, represented by the reward for the first search result” (Luo in at least § 3 to § 4.5. See at least § 4.5. § 4.1 clearly discloses the optimization algorithm which picks the best policy to maximize the long term rewards);
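For illustration only, the parameter mapping recited above (state as query plus context, action as the displayed search result, reward as the feedback on that result) can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not code from Luo or from the Instant Specification: the names `State`, `Step`, and `next_state`, and the choice to fold the displayed results into the next context, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """State: the current query plus a context (here, a browsing record)."""
    query: str
    context: Tuple[str, ...]


@dataclass(frozen=True)
class Step:
    """One interaction: the state, the action taken, and the reward observed."""
    state: State
    action: Tuple[str, ...]  # action: the search result list that was displayed
    reward: float            # reward: feedback obtained for that search result


def next_state(step: Step, new_query: str) -> State:
    # Assumption: the results just displayed become part of the next context.
    return State(query=new_query, context=step.state.context + step.action)
```

A second search result would then be selected from `next_state(...)` together with the reward recorded in the preceding `Step`.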

Luo does not explicitly disclose:
“a non-transitory computer readable storage medium having stored therein instructions that, when executed by a processor of a terminal, causes the terminal to perform a searching method based on artificial intelligence”:
“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process”.
However, Amendjian discloses:
“a non-transitory computer readable storage medium having stored therein instructions that, when executed by a processor of a terminal, causes the terminal to perform a 
“wherein the reward comprises a searching duration, wherein the searching duration indicates a staying duration of a user in a search process, the searching duration is regarded as an optimization objective by regarding the searching duration as the reward, such that the user can stay longer in the next search process” (Amendjian ¶ [0102] “the training may choose the highest-expected-reward action 90% of the time and a random action 10% of the time, to encourage exploration of the search space”).
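For illustration only, the selection rule quoted from Amendjian ¶ [0102] (highest-expected-reward action 90% of the time, random action 10% of the time) corresponds to what is commonly called an epsilon-greedy policy. The following is a minimal sketch under stated assumptions, not Amendjian's implementation: the function name `choose_action`, the dictionary of expected rewards, and the example values (imagined as expected staying durations in seconds) are hypothetical.

```python
import random
from typing import Dict, Optional


def choose_action(
    expected_reward: Dict[str, float],
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the best-known action with probability 1 - epsilon, else explore."""
    rng = rng or random.Random()
    actions = list(expected_reward)
    if rng.random() < epsilon:
        # Exploration: any action, uniformly at random (may also be the best one).
        return rng.choice(actions)
    # Exploitation: the action with the highest expected reward.
    return max(actions, key=expected_reward.__getitem__)


# Hypothetical expected rewards, e.g. expected staying duration in seconds.
example = {"result_a": 12.0, "result_b": 45.0, "result_c": 30.0}
picked = choose_action(example, epsilon=0.1, rng=random.Random(0))
```

With `epsilon=0.1`, roughly nine selections in ten return the action with the highest expected reward, and the remainder explore the search space.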
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Luo and Amendjian. Luo teaches modeling the dynamics of session search, including decision states, query changes, clicks, and rewards, with a Markov Decision Process, as a cooperative game between the user and the search engine (reinforcement learning). Amendjian teaches a system for detecting and generating online behavior from a clickstream, which uses Markov models or reinforcement learning. One of ordinary skill would have been motivated to combine Luo and Amendjian because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) "obvious to try", that is, choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; and (F) known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Viker A. Lamardo whose telephone number is (571)270-5871.  The examiner can normally be reached on Mon. - Fri. 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached on (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact 





/VIKER A LAMARDO/Primary Examiner, Art Unit 2126