DETAILED ACTION
1.	 Claims 1-20 are pending in this application.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. § 102 and § 103 (or as subject to pre-AIA 35 U.S.C. § 102 and § 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Amendment
3.	In the amendment filed on 03/29/2022, claims 1, 8, and 16-20 were amended. Claims 2-7 and 9-15 remain as originally presented. Claims 1-20 are currently pending and are considered below.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:

1.    Determining the scope and contents of the prior art.
2.    Ascertaining the differences between the prior art and the claims at issue.
3.    Resolving the level of ordinary skill in the pertinent art.
4.    Considering objective evidence present in the application indicating obviousness or nonobviousness.


5.	Claims 1, 8, and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over Cheng et al. (US 9917904 B1) in view of Wang et al. (US 6766320 B1).
	
As per claim 1, Cheng teaches a method, comprising (Cheng, fig. 5, Column 2, Lines 13-14, “a computer-implemented method for invoking a non-search action based on a search query.”): 
extracting digital data from a session that includes user interaction with a user interface of a search engine (Cheng, fig. 1D, Column 7, Lines 10-59, “The system may then extract information from the page at the URL that is believed to be highly relevant, such as headings, page titles, and images, and may present the extracted information to the user. The user may then select certain of the items from the page to add to his or her post.” Wherein the information from the page at the URL is interpreted as the digital data from the session. The user's selection of certain items from the page to add to his or her post is interpreted as the user interaction with the user interface. The interfaces illustrated in figs. 1A-1D are interpreted as user interfaces. The search engine is inherently the Google search engine); 
the digital data comprises search query data (Cheng, fig. 1D:136. Column 7, Lines 22-27, “FIG. 1D shows the results page delivered to the user when the user submits the query. Here, the text of the query with the reserved character z removed, has been presented in a search edit input field 136 so that the user can choose to submit the text (either with or without further editing) simply as a search.” Where the text of the query is interpreted as the search query data), 
a sequence of user state data (Cheng, fig. 1D:136, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Wherein the list of search results is interpreted as the sequence of user state data. Further, Column 9, Lines 20-35, “For example, in a verbose mode (intended for spoken input) with results page confirmation, the syntax can take the form [verbose action trigger]+[action arguments and payload].” Wherein the verbose mode is made up of a sequence of user state data such as [verbose action trigger]+[action arguments and payload]), and 
user feedback data (Cheng, Column 27, Lines 47-51, “feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.” Where the visual feedback, auditory feedback, or tactile feedback are interpreted as the user feedback data); 
the search query data is for a search query received via an input device during the session (Cheng, Column 7, Lines 10-27, “FIGS. 1C and 1D are partial screen shots of a computer receiving messaging input in a search input field.” Where the computer is interpreted as the input device during the session. The computer is inherently any type of computer, mobile or not. Further, “FIG. 1C shows the familiar GOOGLE search home page, where a user has entered the query "z loved avatar special effects" into the search input field 130-just as in the mobile computing example above.” Wherein the query "z loved avatar special effects" is interpreted as the search query); 
the sequence of user state data represents user interactions that occur before a response to the search query is presented (Cheng, Column 9, Lines 20-35, “In such a situation, the user's device or a remote search service may initially determine from the user's voice commands whether the user intends to conduct a traditional search, or instead to perform a non-search action.” Where the user's voice commands are interpreted as the sequence of user state data representing user interactions that occur before the response to the search query is presented, since they occur before the determination of what the user intends); 
the user feedback data comprises at least one response to at least one previous presentation of navigation elements by the user interface (Cheng, fig. 2A, Column 10, Lines 35-40, “Thus, the various types of feedback may be made available to each user conveniently in one place. Users may also see posts related to other users my visiting profile pages for those other users, and may also go to their own profile pages or to their stream pages to see all of the posts and comments for posts to which they are subscribed” Wherein the user can see/navigate to his or her own posts or other users' posts. The feedback is inherently made on a post that was already posted/available to be viewed. The posts are interpreted as the at least one previous presentation of navigation elements by the user interface, and the comments are interpreted as the at least one response to the post); 
in response to the search query, outputting the selected at least one navigation element for presentation via an output device operably coupled to the input device (Cheng, fig. 1D:134, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Where the communication element displayed above the list of search results is interpreted as the outputting of the selected at least one navigation element for presentation via an output device operably coupled to the input device. Further, Column 27, Lines 40-51, “To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.”);
wherein the method is performed by at least one computing device (Cheng, fig. 6:622, Column 25, Lines 33-34, “a personal computer such as a laptop computer 622.” Where the personal computer and the laptop computer are computing devices).
However, it is noted that the prior art of Cheng does not explicitly teach “inputting the digital data into at least one reinforcement learning model trained using population state data; the population state data indicates sequences of states of a population of users after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users; computing, by the at least one reinforcement learning model, at least two reward scores; using the user state data for a plurality of computer-generated navigation element options; using the at least two reward scores, selecting, by the at least one reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options;”
On the other hand, in the same field of endeavor, Wang teaches inputting the digital data into at least one reinforcement learning model trained using population state data (Wang, Column 13, Lines 5-64, “Training Robust Parser Using Query Log Files” Where “A standard gradient descent method is used to train the perceptron, such as that described in S. Russell, P. Norvig, "Artificial Intelligence", Prentice-Hall, Inc. 1995, pp573-577. The training data is the user query log file where the sentences are classified as positive and negative examples.” The standard gradient descent method is interpreted as the at least one reinforcement learning model trained using population state data. The sentences are interpreted as the population state data);
the population state data indicates sequences of states of a population of users (Wang, Column 10, Lines 65-67, “Consider the above rule for the sentence … (“How to get from Beijing to Shanghai?").” Where the sentence “How to get from Beijing to Shanghai” is interpreted as the population state data indicating sequences of states of the population of users. Further, Column 1, Lines 16-18, “Today's popular search engines, such as "Yahoo!" and "MSN.com", are used by millions of users each day to find information.” Where the millions of users are interpreted as the population of users) 
after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users (Wang, Column 8, Lines 16-19, “At block 312, the results of the FAQ matching and keyword searching are presented to the user via the search engine UI 200. The user is then given the opportunity to offer feedback in an attempt to confirm the accuracy of the search.” Where the results are inherently presented after the query, which indicates a user question, is submitted to the search engine. The results presented to the user via the search engine inherently include presentations of computer-generated navigation elements, for example the travel information concepts shown in fig. 7. The results are presented in response to the user query entered. Further, Wang, Column 8, Lines 3-13, “the search engine 140 receives a user query entered at remote client 102.” Wherein the user query is interpreted as the search queries received from the population of users during sessions of the population of users);
computing, by the at least one reinforcement learning model, at least two reward scores (Wang, fig. 7, Column 10, Lines 34-42, “the score of the parsing result is calculated by discounting the number of input items and rule items that are skipped during the parsing opera” Where the scores are calculated/computed. Further, Column 12, Lines 1-57, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” The weights are set/computed. The weights are interpreted as scores that are calculated/computed for an answer regarding a flight that was asked about via a user query to the search engine);
using the user state data for a plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 1-11, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” Where “How to go from Beijing to Shanghai” is user state data and "Travel" and "Route" are interpreted as the plurality of computer-generated navigation element options);
using the at least two reward scores, selecting, by the at least one reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 15-18, “The search determines that the third entry 710 in the table yields a perfect match. Corresponding to the "route" entry 710 is the FAQ with ID "101", which can be used to index the FAQ table 704.” Where the determination is due to the best/highest weight/score of ID 101 compared with the other entry in the table. The weights/scores used to make the determination are interpreted as the at least two reward scores. The determining is interpreted as the selecting. ID 101 represents one of the FAQs, which herein is interpreted as the travel or route); 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches, into Cheng, which teaches that search engines operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to better capture the user's intention as a way to provide higher quality search results (Wang Column 2, Lines 54-56). 

As per claim 8, Cheng teaches at least one or more non-transitory computer-readable storage media comprising instructions which, when executed by at least one processor, cause the at least one processor to be capable of performing operations comprising (Cheng, fig. 6, Column 25, Lines 1-11, “computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 604, the storage device 606, or memory on processor 602.”): 
extracting digital data from a session that includes user interaction with a user interface of a search engine (Cheng, fig. 1D, Column 7, Lines 10-59, “The system may then extract information from the page at the URL that is believed to be highly relevant, such as headings, page titles, and images, and may present the extracted information to the user. The user may then select certain of the items from the page to add to his or her post.” Wherein the information from the page at the URL is interpreted as the digital data from the session. The user's selection of certain items from the page to add to his or her post is interpreted as the user interaction with the user interface. The interfaces illustrated in figs. 1A-1D are interpreted as user interfaces. The search engine is inherently the Google search engine); 
the digital data comprises search query data (Cheng, fig. 1D:136. Column 7, Lines 22-27, “FIG. 1D shows the results page delivered to the user when the user submits the query. Here, the text of the query with the reserved character z removed, has been presented in a search edit input field 136 so that the user can choose to submit the text (either with or without further editing) simply as a search.” Where the text of the query is interpreted as the search query data), 
a sequence of user state data (Cheng, fig. 1D:136, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Wherein the list of search results is interpreted as the sequence of user state data. Further, Column 9, Lines 20-35, “For example, in a verbose mode (intended for spoken input) with results page confirmation, the syntax can take the form [verbose action trigger]+[action arguments and payload].” Wherein the verbose mode is made up of a sequence of user state data such as [verbose action trigger]+[action arguments and payload]), and 
user feedback data (Cheng, Column 27, Lines 47-51, “feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.” Where the visual feedback, auditory feedback, or tactile feedback are interpreted as the user feedback data); 
the search query data is for a search query received via an input device during the session (Cheng, Column 7, Lines 10-27, “FIGS. 1C and 1D are partial screen shots of a computer receiving messaging input in a search input field.” Where the computer is interpreted as the input device during the session. The computer is inherently any type of computer, mobile or not. Further, “FIG. 1C shows the familiar GOOGLE search home page, where a user has entered the query "z loved avatar special effects" into the search input field 130-just as in the mobile computing example above.” Wherein the query "z loved avatar special effects" is interpreted as the search query); 
the sequence of user state data represents user interactions that occur before a response to the search query is presented (Cheng, Column 9, Lines 20-35, “In such a situation, the user's device or a remote search service may initially determine from the user's voice commands whether the user intends to conduct a traditional search, or instead to perform a non-search action.” Where the user's voice commands are interpreted as the sequence of user state data representing user interactions that occur before the response to the search query is presented, since they occur before the determination of what the user intends); 
Application No.: 17/038,901; Art Unit: 2168
the user feedback data comprises at least one response to at least one previous presentation of navigation elements by the user interface (Cheng, fig. 2A, Column 10, Lines 35-40, “Thus, the various types of feedback may be made available to each user conveniently in one place. Users may also see posts related to other users my visiting profile pages for those other users, and may also go to their own profile pages or to their stream pages to see all of the posts and comments for posts to which they are subscribed” Wherein the user can see/navigate to his or her own posts or other users' posts. The feedback is inherently made on a post that was already posted/available to be viewed. The posts are interpreted as the at least one previous presentation of navigation elements by the user interface, and the comments are interpreted as the at least one response to the post);
outputting the selected at least one navigation element for presentation in response to the search query via an output device operably coupled to the input device (Cheng, fig. 1D:134, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Where the communication element displayed above the list of search results is interpreted as the outputting of the selected at least one navigation element for presentation via an output device operably coupled to the input device. Further, Column 27, Lines 40-51, “To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.”). 
However, it is noted that the prior art of Cheng does not explicitly teach “inputting the digital data into at least one reinforcement learning model trained using population state data; the population state data indicating sequences of states of a population of users after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users; computing, by the at least one reinforcement learning model, at least two reward scores for a plurality of computer-generated navigation element options; using the at least two reward scores, selecting, by the reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options;”
On the other hand, in the same field of endeavor, Wang teaches inputting the digital data into at least one reinforcement learning model trained using population state data (Wang, Column 13, Lines 5-64, “Training Robust Parser Using Query Log Files” Where “A standard gradient descent method is used to train the perceptron, such as that described in S. Russell, P. Norvig, "Artificial Intelligence", Prentice-Hall, Inc. 1995, pp573-577. The training data is the user query log file where the sentences are classified as positive and negative examples.” The standard gradient descent method is interpreted as the at least one reinforcement learning model trained using population state data. The sentences are interpreted as the population state data); 
the population state data indicating sequences of states of a population of users (Wang, Column 10, Lines 65-67, “Consider the above rule for the sentence … (“How to get from Beijing to Shanghai?").” Where the sentence “How to get from Beijing to Shanghai” is interpreted as the population state data indicating sequences of states of the population of users. Further, Column 1, Lines 16-18, “Today's popular search engines, such as "Yahoo!" and "MSN.com", are used by millions of users each day to find information.” Where the millions of users are interpreted as the population of users)
after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users (Wang, Column 8, Lines 16-19, “At block 312, the results of the FAQ matching and keyword searching are presented to the user via the search engine UI 200. The user is then given the opportunity to offer feedback in an attempt to confirm the accuracy of the search.” Where the results are inherently presented after the query, which indicates a user question, is submitted to the search engine. The results presented to the user via the search engine inherently include presentations of computer-generated navigation elements, for example the travel information concepts shown in fig. 7. The results are presented in response to the user query entered. Further, Wang, Column 8, Lines 3-13, “the search engine 140 receives a user query entered at remote client 102.” Wherein the user query is interpreted as the search queries received from the population of users during sessions of the population of users);
computing, by the at least one reinforcement learning model, at least two reward scores (Wang, fig. 7, Column 10, Lines 34-42, “the score of the parsing result is calculated by discounting the number of input items and rule items that are skipped during the parsing opera” Where the scores are calculated/computed. Further, Column 12, Lines 1-57, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” The weights are set/computed. The weights are interpreted as scores that are calculated/computed for an answer regarding a flight that was asked about via a user query to the search engine) 
for a plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 1-11, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” Where “How to go from Beijing to Shanghai” is user state data and "Travel" and "Route" are interpreted as the plurality of computer-generated navigation element options); 
using the at least two reward scores, selecting, by the reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 15-18, “The search determines that the third entry 710 in the table yields a perfect match. Corresponding to the "route" entry 710 is the FAQ with ID "101", which can be used to index the FAQ table 704.” Where the determination is due to the best/highest weight/score of ID 101 compared with the other entry in the table. The weights/scores used to make the determination are interpreted as the at least two reward scores. The determining is interpreted as the selecting. ID 101 represents one of the FAQs, which herein is interpreted as the travel or route); 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches, into Cheng, which teaches that search engines operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to better capture the user's intention as a way to provide higher quality search results (Wang Column 2, Lines 54-56). 

As per claim 16, Cheng teaches a system, comprising (Cheng, Column 2, Line 19, “a search engine system”): 
at least one processor (Cheng, Column 25, Lines 1-11, “memory on processor 602”); 
memory operably coupled to the at least one processor (Cheng, Column 25, Lines 1-11, “memory on processor 602”); 
instructions stored in the memory and capable of being executed by the at least one processor, the instructions comprising a reinforcement learning-based agent configured to perform operations comprising (Cheng, fig. 6, Column 25, Lines 1-11, “computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 604, the storage device 606, or memory on processor 602.” Where the computer program product is interpreted as the reinforcement learning-based agent): 
extracting digital data from a session that includes user interaction with a user interface of a search engine (Cheng, fig. 1D, Column 7, Lines 10-59, “The system may then extract information from the page at the URL that is believed to be highly relevant, such as headings, page titles, and images, and may present the extracted information to the user. The user may then select certain of the items from the page to add to his or her post.” Wherein the information from the page at the URL is interpreted as the digital data from the session. The user's selection of certain items from the page to add to his or her post is interpreted as the user interaction with the user interface. The interfaces illustrated in figs. 1A-1D are interpreted as user interfaces. The search engine is inherently the Google search engine); 
the digital data comprises search query data (Cheng, fig. 1D:136. Column 7, Lines 22-27, “FIG. 1D shows the results page delivered to the user when the user submits the query. Here, the text of the query with the reserved character z removed, has been presented in a search edit input field 136 so that the user can choose to submit the text (either with or without further editing) simply as a search.” Where the text of the query is interpreted as the search query data), 
a sequence of user state data (Cheng, fig. 1D:136, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Wherein the list of search results is interpreted as the sequence of user state data. Further, Column 9, Lines 20-35, “For example, in a verbose mode (intended for spoken input) with results page confirmation, the syntax can take the form [verbose action trigger]+[action arguments and payload].” Wherein the verbose mode is made up of a sequence of user state data such as [verbose action trigger]+[action arguments and payload]), and 
user feedback data (Cheng, Column 27, Lines 47-51, “feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.” Where the visual feedback, auditory feedback, or tactile feedback are interpreted as the user feedback data);
the search query data is for a search query received via an input device during the session (Cheng, Column 7, Lines 10-27, “FIGS. 1C and 1D are partial screen shots of a computer receiving messaging input in a search input field.” Where the computer is interpreted as the input device during the session. The computer is inherently any type of computer, mobile or not. Further, “FIG. 1C shows the familiar GOOGLE search home page, where a user has entered the query "z loved avatar special effects" into the search input field 130-just as in the mobile computing example above.” Wherein the query "z loved avatar special effects" is interpreted as the search query);
the sequence of user state data represents user interactions that occur before a response to the search query is presented (Cheng, Column 9, Lines 20-35, “In such a situation, the user's device or a remote search service may initially determine from the user's voice commands whether the user intends to conduct a traditional search, or instead to perform a non-search action.” Where the user's voice commands are interpreted as the sequence of user state data representing user interactions that occur before the response to the search query is presented, since they occur before the determination of what the user intends); 
the user feedback data comprises at least one response to at least one previous presentation of navigation elements by the user interface (Cheng, fig. 2A, Column 10, Lines 35-40, “Thus, the various types of feedback may be made available to each user conveniently in one place. Users may also see posts related to other users my visiting profile pages for those other users, and may also go to their own profile pages or to their stream pages to see all of the posts and comments for posts to which they are subscribed” Wherein the user can see/navigate to his or her own posts or other users' posts. The feedback is inherently made on a post that was already posted/available to be viewed. The posts are interpreted as the at least one previous presentation of navigation elements by the user interface, and the comments are interpreted as the at least one response to the post);
outputting the selected at least one navigation element for presentation in response to the search query via an output device operably coupled to the input device (Cheng, fig. 1D:134, Column 7, Lines 31-32, “a communication element 132 is displayed above a list of search results 134.” Where the communication element displayed above the list of search results is interpreted as the outputting of the selected at least one navigation element for presentation via an output device operably coupled to the input device. Further, Column 27, Lines 40-51, “To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.”).
However, it is noted that the prior art of Cheng does not explicitly teach “inputting the digital data into at least one reinforcement learning model trained using population state data; the population state data indicating sequences of states of a population of users after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users; computing, by the at least one reinforcement learning model, at least two reward scores for a plurality of computer-generated navigation element options; using the at least two reward scores, selecting, by the reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options;”
On the other hand, in the same field of endeavor, Wang teaches inputting the digital data into at least one reinforcement learning model trained using population state data (Wang, Column 13, Lines 5-64, “Training Robust Parser Using Query Log Files” Where “A standard gradient descent method is used to train the perceptron, such as that described in S. Russell, P. Norvig, "Artificial Intelligence", Prentice-Hall, Inc. 1995, pp573-577. The training data is the user query log file where the sentences are classified as positive and negative examples.” The standard gradient descent method is interpreted as the at least one reinforcement learning model trained using population state data. The sentences are interpreted as the population state data); 
the population state data indicating sequences of states of a population of users (Wang, Column 10, Lines 65-67, “Consider the above rule for the sentence … (“How to get from Beijing to Shanghai?").” Where “How to get from Beijing to Shanghai” is a sentence that is interpreted as the population state data indicating sequences of states of the population of users. Further, Column 1, Lines 16-18, “Today's popular search engines, such as "Yahoo!" and "MSN.com", are used by millions of users each day to find information.” Where the millions of users are interpreted as the population of users)
 after presentations of computer-generated navigation elements to the population of users in response to search queries received from the population of users during sessions of the population of users (Wang, Column 8, Lines 16-19, “At block 312, the results of the FAQ matching and keyword searching are presented to the user via the search engine UI 200. The user is then given the opportunity to offer feedback in an attempt to confirm the accuracy of the search.” Where the results are inherently presented after the query, which indicates a user question, is made to the search engine. The results presented to the user via the search engine inherently include presentations of computer-generated navigation elements, such as, for example, the concept of travel information shown in fig. 7, and the results are presented in response to a user query entered. Further, Wang, Column 8, Lines 3-13, “the search engine 140 receives a user query entered at remote client 102.” Wherein the user query is interpreted as the search queries received from the population of users during sessions of the population of users); 
computing, by the at least one reinforcement learning model, at least two reward scores (Wang, fig. 7, Column 10, Lines 34-42, “the score of the parsing result is calculated by discounting the number of input items and rule items that are skipped during the parsing opera” Where scores are being calculated/computed. Further, Column 12, Lines 1-57, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” The weights are being set/computed. The weights are interpreted as scores that are calculated/computed for an answer regarding a flight that was asked about via a user query to the search engine) 
for a plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 1-11, “("How to go from Beijing to Shanghai") are "Travel" and "Route", where the match result is a FAQ set {101(weight 165), 105(weight 90)}.” Where the How to go from Beijing to Shanghai is a user state data and the "Travel" and "Route" are interpreted as the plurality of computer-generated navigation element options); 
using the at least two reward scores, selecting, by the reinforcement learning model, at least one navigation element of the plurality of computer-generated navigation element options (Wang, fig. 7, Column 12, Lines 15-18, “The search determines that the third entry 710 in the table yields a perfect match. Corresponding to the "route" entry 710 is the FAQ with ID "101", which can be used to index the FAQ table 704.” Where the determination is due to best/higher weight/scores of the ID 101 comparing with the other one in the table. The weights/scores used to make the determination is interpreted as the at least two reward scores. The determines is interpreted as the selecting. The ID 101 represents one of the FAQ which herein is interpreted as the travel or route); 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches, into Cheng, which teaches search engines that operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to better capture the user's intention as a way to provide higher quality search results (Wang Column 2, Lines 54-56).

6.	Claims 2-7, 9-15, and 17-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Cheng et al. (US 9917904 B1) in view of Wang et al. (US 6766320 B1) in further view of Yadav et al. (US 20210019309 A1).
	
As per claim 2, Cheng, and Wang teach all the limitations as discussed in claim 1 above.  
However, it is noted that the combination of the prior arts of Cheng, and Wang do not explicitly teach “further comprising updating the sequence of user state data to include additional user state data extracted from the session after the at least one navigation element of the plurality of computer-generated navigation element options has been output, and receiving, from the at least one reinforcement learning model, a re-computed set of reward scores computed using the additional user state data.”
On the other hand, in the same field of endeavor, Yadav teaches further comprising updating the sequence of user state data (Yadav, fig. 20:2080, par. [0492], [0494], updating the usage data for the token. Where the token is presented to the user in a display in a sequence. See fig. 1:132, 1:134) to include additional user state data extracted from the session after the at least one navigation element of the plurality of computer-generated navigation element options has been output (Yadav, fig. 9:910, par. [0303], extracting an inference pattern from a string and a database query. Where the inference pattern is interpreted as the additional user state data extracted from the session after the at least one navigation element of the plurality of computer-generated navigation element options has been output. Where the string query is interpreted to be a string query entered from a user session. Furthermore, a plurality of tokens is inherently already presented to the user session as illustrated in fig. 1), and 
receiving, from the at least one reinforcement learning model, a re-computed set of reward scores computed using the additional user state data (Yadav, fig. 17:1716, adjusting a ranking score of the first candidate database query based on a ranking score adjustment of the pattern. Where adjusting a ranking score of the first candidate database query based on a ranking score adjustment of the pattern inherently means the score is computed/calculated again. It is also inherent that, to rank the first candidate database query based on a ranking score adjustment of the pattern, its adjusted ranking score would be compared against one or more other ranking scores. Therefore, there would be a set of scores).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yadav, which teaches mapping natural language to queries using a query grammar, into the combination of Cheng, which teaches search engines that operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query, and Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to improve interactions with a digital personal assistant agent using natural language (Yadav par. [0002]).

As per claim 3, Cheng, and Wang teach all the limitations as discussed in claim 1 above. 
Additionally, Yadav teaches further comprising, using the at least two reward scores (Yadav, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores is inherent to include a first and second scores. The first and second scores are interpreted as the at least two reward scores and is being used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie", see also fig. 4), 
selecting, by the reinforcement learning model, at least one search filter of a set of computer-generated optional search filters (Yadav, fig. 4:450, par. [0061], “selecting 450, based on the first score and the second score, the candidate database query from the set of candidate database queries” where the candidate database query is inherently selected based on the scores, herein used as a filter to define the candidate database query. The set of candidate database queries are interpreted as the set of computer-generated optional search filters) and outputting the selected at least one search filter for presentation in response to the search query (Yadav, fig. 4:470, par. [0061], “presenting 470 data based on the search results in the user interface” Where the presented data is interpreted to be presented as a result of the selection of the candidate database query. Further, if the claimed “in response to the search query” is interpreted as automatic presentation of search query results, it would have been obvious at the time of the invention to automate a known process to automatically present search query results. Besides that, merely providing an automatic means to replace a manual activity which accomplishes the same result is not sufficient to distinguish over the prior art, In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958)).

As per claim 4, Cheng, and Wang teach all the limitations as discussed in claim 1 above. 
Additionally, Yadav teaches further comprising, using the at least two reward scores (Yadav, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie", see also fig. 4), 
selecting, by the reinforcement learning model, at least one re-formulated search of a set of computer-generated re-formulated searches (Yadav, fig. 8:850, par. [0262], “selecting 850, based on the adjusted ranking score, the first candidate database query from the set of candidate database queries;” where the selecting first candidate database query is interpreted as the at least one re-formulated search of a set of computer-generated re-formulated searches. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores is inherent to include a first and second scores. The first and second scores are interpreted as the at least two reward scores and is being used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie". Where the define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie" is inherent to be the at least two computer-generated re-formulated searches) and
 outputting the selected re-formulated search for presentation in response to the search query (Yadav, fig. 8:870, par. [0262], “presenting 870 data based on the search results in the user interface.” where the presenting of data is outputting data associated with the selected re-formulated search for presentation in response to the search query. Further, if the claimed “in response to the search query” is interpreted as automatic presentation of search query results, it would have been obvious at the time of the invention to automate a known process to automatically present search query results. Besides that, merely providing an automatic means to replace a manual activity which accomplishes the same result is not sufficient to distinguish over the prior art, In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958)).

As per claim 5, Cheng, and Wang teach all the limitations as discussed in claim 1 above. 
Additionally, Yadav teaches further comprising, using the at least two reward scores (Yadav, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores is interpreted as the at least two reward scores and is being used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie", see also fig. 4), 
selecting, by the reinforcement learning model, at least one conversational navigation element of a set of computer-generated conversational natural language navigation elements (Yadav, fig. 15, par. [0409], “The search bar 1510 include grey slider icons that a user can moved within the search bar 1510 in order to select a fragment (e.g., a sequence of one or more words) of the text.” Where the fragment is interpreted as the set of computer-generated conversational natural language navigation elements) and outputting the selected at least one conversational navigation element for output in response to the search query (Yadav, fig. 15:1500, par. [0409], a display region 1500 generated for presenting a user interface to facilitate search of one or more databases that prompts a user to teach the interface about their language usage. Where the display region is interpreted as outputting the selected at least one conversational navigation element for output in response to the search query. Further, if the claimed “in response to the search query” is interpreted as automatic presentation of search query results, it would have been obvious at the time of the invention to automate a known process to automatically present search query results. Besides that, merely providing an automatic means to replace a manual activity which accomplishes the same result is not sufficient to distinguish over the prior art, In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958)).

As per claim 6, Cheng, and Wang teach all the limitations as discussed in claim 1 above. 
Additionally, Yadav teaches the at least one reinforcement learning model trained using population state data (Yadav, par. [0498], a machine learning model using a trained neural network to identify relevant aspects of data in the context of one or more databases and use cases, where the one or more databases and use cases inherently include population state data. See also par. [0505]) 
indicating sequences of states of a population of users after presentations of computer-generated optional navigation elements to the population of users (Yadav, fig. 11, par. [0370], “The learning algorithm learns from multiple queries that lead to the same result.  It also learns from user sessions when users do same pair of queries subsequently.” Where users do same pair of queries subsequently is interpreted to produce sequences of states of a population of users after presentations of computer-generated optional navigation elements to the population of users. “For example, if lot of users ask for "revenue averages over month" and then follow it up with "average revenue over month" We learn that "<column><average> can be re-written as "average <column>".”. Fig. 1, par. [0271], “(b) refinements used to edit a database query that has generated based on a string (e.g., edits to a translated database query that are entered using the token icons (132, 134, 136, 138, or 140)” Where the token icons are optional navigation elements to the population of users. The token icons are interpreted to indicate a sequence, see also fig. 14, par. [0408])
in response to natural language search queries received from the population of users during sessions of the population of users (Yadav, par. [0128], [0263], “the natural language syntax data for words of the string may be determined 520 by submitting the string as part of request to server providing natural language processing and receiving the natural language syntax data in response to the request.” Where the request is interpreted to be received during sessions of the population of users. Further, if the claimed “in response to the search query” is interpreted as automatic presentation of search query results, it would have been obvious at the time of the invention to automate a known process to automatically present search query results. Besides that, merely providing an automatic means to replace a manual activity which accomplishes the same result is not sufficient to distinguish over the prior art, In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958)).

As per claim 7, Cheng, and Wang teach all the limitations as discussed in claim 1 above. 
Additionally, Yadav teaches the session comprising a temporal sequence of user activities (Yadav, fig. 18:1820, par. [0447], [0448], obtain the unstructured search string user input as a sequence of individual characters or symbols. Where the sequence of individual characters or symbols is interpreted as the temporal sequence of user activities. Where the user input is interpreted as a user activity) including at least one user activity involving a search engine (Yadav, fig. 18:1850, par. [0453], relational search engine) and at least one user activity involving a connections network-based system (Yadav, fig. 19, par. [0477], provide a connection or link to a network via a network interface. Where the provided a connection or link to a network via a network interface is interpreted to involve at least one user activity involving a connections network-based system. Further, par. [0493], “a server that is presenting a user interface (e.g., a webpage) to a user who is located at a remote location via communication messages over an electronic communications network (e.g., a wide area network).  For example, the user interface may include the display region 110 of FIG. 1.”).
As per claim 9, Cheng, and Wang teach all the limitations as discussed in claim 8 above.  
However, it is noted that the combination of the prior arts of Cheng, and Wang do not explicitly teach “wherein the instructions further cause computing, as a reward score of the at least two reward scores, a probability that a re-formulated search of the computer-generated re-formulated searches corresponds to a natural language sentence.”
On the other hand, in the same field of endeavor, Yadav teaches wherein the instructions further cause computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), a probability that a re-formulated search of the computer-generated re-formulated searches corresponds to a natural language sentence (Yadav, fig. 18, par. [0249]-[0252], a user can take a fully resolved database query in terms of recognized tokens, and give possible natural language versions of those questions. Where the possible natural language version of a question is interpreted as the probability that the re-formulated search of the computer-generated re-formulated searches corresponds to the natural language sentence).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yadav, which teaches mapping natural language to queries using a query grammar, into the combination of Cheng, which teaches search engines that operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query, and Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to improve interactions with a digital personal assistant agent using natural language (Yadav par. [0002]).

As per claim 10, Cheng, and Wang teach all the limitations as discussed in claim 8 above. 
Additionally, Yadav teaches wherein the instructions further cause computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), 
a measurement of semantic similarity between a re-formulated search of the computer-generated re-formulated searches and the search query data (Yadav, par. [0237]-[0238], “Semantic Match Classifier: In some implementations, a semantic match classifier filters out semantic matches on measures if the following conditions hold: If a larger span of words has an EXACT match on a column name.” Wherein the semantic match inherently matches words by similarity. Where the similar words are inherently words that have a semantic similarity between the semantic match classifier filter, herein interpreted as the re-formulated search of the computer-generated re-formulated searches, and the search query data. The search query data is taught by Yadav, fig. 1:120, par. [0042], above).

As per claim 11, Cheng, and Wang teach all the limitations as discussed in claim 8 above. 
Additionally, Yadav teaches wherein the instructions further cause computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), 
a measurement of diversity of terms within a re-formulated search of the computer-generated re-formulated searches relative to a length of the re-formulated search (Yadav, par. [0243]-[0248], “Scorer: The scorer uses the following function for scoring individual tokens: Token.wordLength^2 * MatchTypeMultiplier[Token.MatchType] * TokenTypeMultiplier[Token.TokenType] * (Token.MatchPenalty)^TextOverlapExponent * (IdFCoverage)^IdfCoverageExponent” Where each token has terms and the length of the word is calculated to relatively influence the score based on the token terms. The tokens are inherent to also be a re-formulated search. Wherein the word can represent the re-formulated search. The terms of each token are inherent to be a measurement of diversity of terms within a re-formulated search of the computer-generated re-formulated searches relative to a length of the re-formulated search. See also par. [0183], [0514]).
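
As an illustrative aid only, and not as part of the cited disclosure, the token-scoring function quoted from Yadav par. [0243]-[0248] above can be sketched in Python as follows. The multiplier tables, the exponent values, and the Token field names are hypothetical placeholders chosen for this example; the values actually used by Yadav are not reproduced in this action.

```python
from dataclasses import dataclass

# Hypothetical lookup tables and exponents (assumed for illustration only).
MATCH_TYPE_MULTIPLIER = {"EXACT": 1.0, "PREFIX": 0.5}
TOKEN_TYPE_MULTIPLIER = {"COLUMN": 1.0, "VALUE": 0.5}
TEXT_OVERLAP_EXPONENT = 2.0
IDF_COVERAGE_EXPONENT = 1.0


@dataclass
class Token:
    word_length: int      # number of words covered by the token
    match_type: str       # key into MATCH_TYPE_MULTIPLIER
    token_type: str       # key into TOKEN_TYPE_MULTIPLIER
    match_penalty: float  # penalty factor, assumed to lie in (0, 1]


def score_token(token: Token, idf_coverage: float) -> float:
    """Product form of the quoted scorer: wordLength^2 times the two
    multipliers, times the penalty and the coverage raised to their exponents."""
    return (
        token.word_length ** 2
        * MATCH_TYPE_MULTIPLIER[token.match_type]
        * TOKEN_TYPE_MULTIPLIER[token.token_type]
        * token.match_penalty ** TEXT_OVERLAP_EXPONENT
        * idf_coverage ** IDF_COVERAGE_EXPONENT
    )
```

Under these assumed values, the squared word-length term dominates the product, so a token covering more words scores higher than a shorter token with otherwise identical attributes.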

As per claim 12, Cheng, and Wang teach all the limitations as discussed in claim 8 above. 
Additionally, Yadav teaches wherein the instructions further cause, using the sequence of user state data, computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), 
a measurement of user engagement during the session (Yadav, fig. 9:950, par. [0307], confidence score based on the set of context features. Where the set of context features is inherent to include the at least two reward scores. The confidence score is interpreted as the reward score. See also par. [0101], learning feedback from a user).

As per claim 13, Cheng, and Wang teach all the limitations as discussed in claim 8 above. 
Additionally, Yadav teaches wherein the instructions further cause computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), 
a measurement of syntactic similarity between the search query data and a re-formulated search of the computer-generated re-formulated searches (Yadav, par. [0252], “(3.) If we find clear mapping between token words and words in the tokens, we templetize that instance and build a sequence pattern out of it so that we can substitute other similar strings.  For example, "who directed titanic" pattern can also help "who directed Avatar".” Where the clear mapping between token words and words in the tokens is interpreted as the measurement of syntactic similarity between the search query data and a re-formulated search of the computer-generated re-formulated searches. Where the token words can be the search query data and the words in the tokens can be the re-formulated search of the computer-generated re-formulated searches).

As per claim 14, Cheng, and Wang teach all the limitations as discussed in claim 8 above. 
Additionally, Yadav teaches wherein the instructions further cause computing, as a reward score of the at least two reward scores (Yadav, fig. 4:430, 4:440, par. [0064], determining first and second scores for a candidate database query from the set of candidate database queries. Where the first and second scores are interpreted as the at least two reward scores. Where determining is interpreted as producing. Further, par. [0488], “the scores for both the matches may be sufficiently close to be indistinguishable.” Where the scores inherently include first and second scores. The first and second scores are interpreted as the at least two reward scores and are used to define multiple candidate tokens (movie_imdb_link, movie_title) for the word "movie"), 
a difference between a start time of the session and a time of occurrence of a success event during the session (Yadav, par. [0064], [0495], a weighted average of occurrence counter values for different intervals of time. Where the different intervals of time is interpreted as the difference between the start time of the session and the time of occurrence of the success event during the session).

As per claim 15, Cheng, and Wang teach all the limitations as discussed in claim 9 above. 
Additionally, Yadav teaches wherein the instructions further cause computing a final reward score as a weighted sum of reward scores of the set of reward scores (Yadav, figs. 12, 22, par. [0533], a largest sum of weights for directed edges of the tour. Where the largest sum of weights is the final reward score as the weighted sum of reward scores of the set of reward scores), and 
selecting the at least one re-formulated search based on the final reward score (Yadav, figs. 12, 22, par. [0520], a tour selected that maximizes the sum of weights for its directed edges. Where the tour selected will determine the sequence of tokens of the database query. Therefore, it is inherent that the selected tour is a re-formulated search based on the sum of the weights for its directed edges).
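
For illustration only, the limitation of claim 15 (a final reward score computed as a weighted sum of reward scores, followed by selection based on that final score) can be sketched in Python as below. The reward-score names, the weights, and the candidate labels are hypothetical; neither the claims nor Yadav specify these values, and this sketch is not asserted to be the method of either.

```python
from typing import Dict


def final_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual reward scores for one candidate."""
    return sum(weights[name] * value for name, value in scores.items())


def select_best(candidates: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> str:
    """Return the label of the candidate with the highest final reward score."""
    return max(candidates, key=lambda label: final_reward(candidates[label], weights))


# Hypothetical weights and per-candidate reward scores (assumed values).
WEIGHTS = {"semantic": 0.5, "syntactic": 0.25, "engagement": 0.25}
CANDIDATES = {
    "reformulation_a": {"semantic": 0.8, "syntactic": 0.4, "engagement": 0.0},
    "reformulation_b": {"semantic": 0.2, "syntactic": 0.8, "engagement": 1.0},
}

best = select_best(CANDIDATES, WEIGHTS)
```

With these assumed numbers the second candidate has the larger weighted sum, so it is the one selected; changing the weights can change which candidate is chosen.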

As per claim 17, Cheng, and Wang teach all the limitations as discussed in claim 16 above.  
However, it is noted that the combination of the prior arts of Cheng, and Wang do not explicitly teach “wherein the reinforcement learning-based agent comprises a is configured to train the at least one reinforcement learning model trained using population state data indicating sequences of states of a population of users after presentations of computer-generated optional navigation elements to the population of users in response to natural language search queries received from the population of users during sessions of the population of users.”
On the other hand, in the same field of endeavor, Yadav teaches wherein the reinforcement learning-based agent comprises a is configured to train the at least one reinforcement learning model trained using population state data (Yadav, par. [0498], a machine learning model using a trained neural network to identify relevant aspects of data in the context of one or more databases and use cases, where the one or more databases and use cases inherently involve population state data being used. See also par. [0505]) 
indicating sequences of states of a population of users after presentations of computer-generated optional navigation elements to the population of users (Yadav, fig. 11, par. [0370], “The learning algorithm learns from multiple queries that lead to the same result.  It also learns from user sessions when users do same pair of queries subsequently.” Where users performing the same pair of queries in succession is interpreted as producing sequences of states of a population of users after presentations of computer-generated optional navigation elements to the population of users. “For example, if lot of users ask for "revenue averages over month" and then follow it up with "average revenue over month" We learn that "<column><average>" can be re-written as "average <column>".”. Fig. 1, par. [0271], “(b) refinements used to edit a database query that has generated based on a string (e.g., edits to a translated database query that are entered using the token icons (132, 134, 136, 138, or 140)” Where the token icons are interpreted as optional navigation elements presented to the population of users. The token icons are presented in a sequence to the users; see also figs. 1, 14, par. [0408])
in response to natural language search queries received from the population of users during sessions of the population of users (Yadav, par. [0128], [0263], “the natural language syntax data for words of the string may be determined 520 by submitting the string as part of request to server providing natural language processing and receiving the natural language syntax data in response to the request.” Where the request is interpreted as being received, in the form of a user's natural language expression, during sessions of the population of users. Further, if the claimed “in response to the search query” is interpreted as automatic presentation of search query results, it would have been obvious at the time of the invention to automate a known process in order to automatically present search query results. Moreover, merely providing an automatic means to replace a manual activity which accomplishes the same result is not sufficient to distinguish over the prior art, In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Yadav, which teaches mapping natural language to queries using a query grammar, into the combination of Cheng, which teaches that search engines operate by a user typing or speaking a query and the search engine returning one or more search results that are determined to be most responsive to the query, and Wang, which teaches a search engine architecture designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. Additionally, this improves the accuracy of converting a natural language string to a valid database query.
The motivation for doing so would be to improve interactions with a digital personal assistant agent using natural language (Yadav par. [0002]).

As per claim 18, Cheng and Wang teach all the limitations as discussed in claim 16 above. 
Additionally, Yadav teaches wherein the reinforcement learning-based agent (Yadav, fig. 5, par. [0128], using a machine learning module (e.g., including a neural network) that has been trained to parse and classify words of natural language phrases in a string. Where the machine learning module is interpreted herein as the reinforcement learning model. The training to parse and classify words of natural language phrases in a string is interpreted as the training using a policy gradient method).  

As per claim 19, Cheng and Wang teach all the limitations as discussed in claim 16 above. 
Additionally, Yadav teaches wherein the reinforcement learning-based agent is configured to provide the selected at least one navigation element to the user interface of the search engine (Yadav, figs. 1, 20, par. [0494], select a subset of the possible spanning permutations of matched tokens for generation as candidate database queries. Where the matched tokens are interpreted as the selected subset of the plurality of optional navigation elements provided to the user interface of the search engine).  

As per claim 20, Cheng and Wang teach all the limitations as discussed in claim 16 above. 
Additionally, Yadav teaches wherein the reinforcement learning agent is configured to provide the selected at least one navigation element to the user interface of the online network-based system (Yadav, figs. 1, 19, par. [0477], provide a connection or link to a network via a network interface. Where providing a connection or link to a network via a network interface is interpreted to involve at least one user activity involving a connections network-based system. Further, par. [0493], “a server that is presenting a user interface (e.g., a webpage) to a user who is located at a remote location via communication messages over an electronic communications network (e.g., a wide area network).  For example, the user interface may include the display region 110 of FIG. 1.”).



Prior Art of Record
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Acharya (US 20180232648 A1), teaches performing cognitive inference and learning operations.
Berkhim et al. (US 20170116200 A1), teaches improving the reliability of searching and search results through the incorporation of the actions of users.
Ortega et al. (US 20110035370 A1), teaches identifying related search terms based on search query submissions of users.

Response to Arguments
8. 	The amendments made to claim 1 overcome the objection set forth in the last Office action; therefore, this objection is hereby withdrawn.

	Applicant’s arguments filed 03/29/2022, with respect to the 35 U.S.C. § 112(b) rejections, have been fully considered and are persuasive. Applicant's clarification of the claim limitations is respectfully noted (Applicant arguments, page 11). Further, the amendments made to the claims and Applicant's clarification have convinced the Examiner that the 35 U.S.C. § 112(b) rejection is overcome, and for this reason the 35 U.S.C. § 112(b) rejection of record is withdrawn.

	Applicant's arguments, filed on 03/29/2022 with respect to the rejection of claims 1-20 under 35 U.S.C. § 103 (Applicant’s arguments, pages 11-14), have been fully considered but are moot. Therefore, the rejection is maintained for the reasons below. 
Examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1]. During patent examination, the pending claims must be given the broadest reasonable interpretation consistent with the specification. Applicant always has the opportunity to amend the claims during prosecution, and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541, 550-51 (CCPA 1969).
Applicant argues that Yadav et al. (US 20210019309 A1) in view of Jang (US 20140359523 A1) do not teach the limitations of claims 1, 8, and 16 (Applicant arguments, pages 11-14). It is respectfully submitted that Yadav et al. (US 20210019309 A1) and Jang (US 20140359523 A1) are no longer relied upon to teach these limitations; instead, the newly cited prior art references of Cheng et al. (US 9917904 B1) in view of Wang et al. (US 6766320 B1) teach the limitations of claims 1, 8, and 16 as shown above. 
Applicant’s remaining arguments with respect to the independent claims, and the claims that depend therefrom, have been considered but are moot because the arguments do not apply to the references being used in the current rejection.

Conclusion
9.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTONIO CAIA DO whose telephone number is (469)295-9251.  The examiner can normally be reached on Monday - Friday / 06:30 to 16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ehichioya, Fred can be reached on 571-272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANTONIO J CAIA DO/
Examiner, Art Unit 2168

/IRETE F EHICHIOYA/
Supervisory Patent Examiner, Art Unit 2168