Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 7 is objected to because of the following informalities:  
Claim 7 line 16: “steps Iii)” should read “steps (ii)”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al (“Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs” 2012) in view of Shah et al (“Interactive reinforcement learning for task-oriented dialogue management” 2016).
1. Sharma discloses a system comprising:
a plurality of agents (See e.g. Fig. 1), wherein each agent is configured to (i) provide, upon receiving a current state (See e.g. Fig. 1, section II on state; section V on current world state) and a current token (See e.g. Fig. 1, section II on observation; section V on current world observation), a current prediction based on a policy that acts on the current state and the current token (See e.g. Fig. 1, section II on belief; section V on current belief), (ii) receive a current metric in response to providing the current prediction (See e.g. Fig. 1, section II on reward), and (iii) modify the policy based on one or more of the current metrics received (See e.g. Fig. 1, section I on using belief to approximate joint policy (i.e. modified policy) and executed by each agent); and

an environment element (See e.g. Fig. 1), wherein the environment element is configured to (i) determine a next state and select a next token (See e.g. section II on “At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines a next state and a next observation (i.e. token) and provides them to the agents); (ii) provide to each agent the next state as the current state and the next token as the current token (See e.g. section II on “At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines a next state and provides it to the agents, so that the next state becomes the current state in the next time step), (iii) from each of the agents, receive that agent's current prediction (See e.g. Fig. 1, section II on belief; section V on current belief), and (iv) to each of the agents, provide the current metric to that agent based on comparing that agent's current prediction against a predetermined intent (See e.g. Fig. 1, section II on reward against desired behavior), and wherein the environment element determines each next state, other than an initial state, based on one or more of the current states and all the current tokens already provided to the agents (See e.g. section II on “At the next time step, each agent receives a local observation of the environment…At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines each subsequent next state but not the initial state).

While Sharma discloses reinforcement learning (RL) in general and discloses many real-world applications, Sharma fails to disclose applying RL to a user intent application. In particular, Sharma fails to disclose a system for training machine agents to user intent expressed in a document, the token being one of a plurality of portions extracted from the document, and a current prediction of the user intent.
However, Shah discloses reinforcement learning (thereby in the same field of endeavor) and applies RL to the user intent application domain (See e.g. section 3.2). Shah further discloses a system for training machine agents to user intent expressed in a document (See e.g. section 3.2 on training interactive RL (IRL) from conversation logs (i.e. document), and on training the IRL to satisfy the user goal/intent),

the token being one of a plurality of portions extracted from the document (See e.g. section 2.1 on dialogue act and slot value pairs.  Examiner note: each dialogue act and slot value pair is considered a “token”),



a current prediction of the user intent (See e.g. section 2.1 on the dialogue state tracker, which keeps track of the belief state of the user’s goal over the course of the dialogue).
As such, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the reinforcement learning of Sharma to incorporate the reinforcement learning of Shah, with the predictable result of applying RL to a user intent application.
Given the advantage of faster learning speed of Shah, one having ordinary skill in the art would have been motivated to make this obvious modification.

2. Sharma discloses the system of Claim 1, wherein each current prediction is expressed as a confidence vector.

3. Sharma discloses the system of Claim 1, wherein each agent modifies its policy according to a cumulative reward (See e.g. section II on infinite-horizon discounted global reward), the cumulative reward being a sum of the current metrics then received by that agent (See e.g. section II on infinite-horizon discounted global reward), and wherein that agent modifies its policy based on increasing the cumulative reward (See e.g. section II on maximizing the expected infinite-horizon discounted global reward).
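For illustration only, the relationship between a plain sum of received metrics and a discounted cumulative reward can be sketched as follows. This sketch is not taken from Sharma, Shah, or the application; the function name `cumulative_reward` and the parameter `gamma` are hypothetical. It assumes a finite list of metrics, whereas the cited passage concerns an infinite-horizon reward.

```python
def cumulative_reward(metrics, gamma=1.0):
    """Sum the metrics received so far, weighting step t by gamma**t.

    gamma=1.0 gives a plain sum of the metrics; 0 < gamma < 1 gives a
    discounted sum. Both names are assumptions made for this sketch.
    """
    total = 0.0
    for step, metric in enumerate(metrics):
        total += (gamma ** step) * metric
    return total
```

With `gamma=1.0` the function returns the undiscounted sum of the metrics; with `gamma` below 1, later metrics contribute less to the total.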

4. Sharma discloses the system of Claim 3 wherein, in modifying its policy, each agent takes into consideration current metrics received by other agents (See e.g. section II on each agent receiving a global reinforcement signal from the environment. Examiner Note: this means that every agent receives the global reinforcement signal and takes it into consideration).

5. Sharma discloses the system of Claim 1, wherein the policies of the agents are each based on a different machine learning technique.

6. Sharma discloses the system of Claim 5, wherein the machine learning technique is based on one or more of: (i) a naive Bayesian model, (ii) a 3-layer neural network, (iii) deep learning (DL), (iv) an explorer, and (v) a human agent.

7. Sharma discloses a method using a plurality of agents (See e.g. Fig. 1), the method comprising:
(i) assigning an initial state as a current state (See e.g. section VA on current world state.  See also section II on time step 1. Examiner Note: state at time step 1 is an initial state); 

(ii) providing to each of the agents the current state and selecting one of the tokens as a current token (See e.g. Fig. 1, section II on state; section V on current world state; section II on observation, section V on current world observation); 
(iii) receiving a current prediction from each agent, wherein each agent provides the current prediction based on a policy that acts on the current state and the current token (See e.g. Fig. 1, section II on belief; section V on current belief);
(iii) sending to each agent a current metric based on comparing that agent's current prediction against a predetermined intent (See e.g. Fig. 1, section II on reward against desired behavior);
(iv) (a) determining a next state based on one or more of the current states and all the current tokens already provided to the agents (See e.g. section II on “At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines a next state and a next observation (i.e. token) and provides them to the agents), (b) selecting a next token (See e.g. section II on “At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines a next state and a next observation (i.e. token) and provides them to the agents), (c) assigning the next state as the current state and the next token as the current token (See e.g. section II on “At the next time step, each agent receives a local observation of the environment”. Examiner Note: this means the environment determines a next state and provides it to the agents, so that the next state becomes the current state in the next time step); and (d) repeating steps Iii) through (iv) (See e.g. Fig. 1 and section II on multiple time steps. Examiner Note: this means the steps in Fig. 1 are repeated until ended (e.g. 10000 steps in Fig. 15)); and
(v) causing each agent to modify its policy based on one or more of the current metrics sent to the agent (See e.g. Fig. 1, section I on using belief to approximate joint policy (i.e. modified policy) and executed by each agent).
While Sharma discloses reinforcement learning (RL) in general and discloses many real-world applications, Sharma fails to disclose applying RL to a user intent application. In particular, Sharma fails to disclose a method for training machine agents to a user intent expressed in a document, (i) extracting the tokens from the document, (iii) a current prediction of the user intent, and (iv) unless all extracted tokens have been selected.
However, Shah discloses reinforcement learning (thereby in the same field of endeavor) and applies RL to the user intent application domain (See e.g. section 3.2). Shah further discloses a method for training machine agents to a user intent expressed in a document (See e.g. section 3.2 on training interactive RL (IRL) from conversation logs (i.e. document), and on training the IRL to satisfy the user goal/intent),
(i) extracting the tokens from the document (See e.g. section 2.1 on dialogue act and slot value pairs.  Examiner note: each pair is considered a “token”),

(iii) current prediction of the user intent (See e.g. section 2.1 on the dialogue state tracker, which keeps track of the belief state of the user’s goal over the course of the dialogue),
(iv) unless all extracted tokens have been selected (See e.g. section 5.1 on successful dialogue. Examiner Note: a successful dialogue means the end of the dialogue (i.e. no more pairs/tokens)).

As such, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the reinforcement learning of Sharma to incorporate the reinforcement learning of Shah, with the predictable result of applying RL to a user intent application.
Given the advantage of faster learning speed of Shah, one having ordinary skill in the art would have been motivated to make this obvious modification.
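For illustration only, the loop recited in claim 7 (provide a state and a token, receive a prediction, send a metric, advance the state, then modify the policy) can be sketched as below. This is a minimal sketch under stated assumptions and is not the method of Sharma, Shah, or the applicant: the class `TabularAgent`, the function `run_episode`, the +1/-1 metric, and the use of a tuple of already-provided tokens as the state are all hypothetical choices made for this example.

```python
class TabularAgent:
    """Hypothetical agent whose policy is a score table over intents."""

    def __init__(self, intents):
        self.intents = list(intents)
        self.scores = {}    # (state, token, intent) -> accumulated metric
        self.history = []   # (state, token, prediction, metric) per step

    def predict(self, state, token):
        # The policy acts on the current state and the current token.
        return max(self.intents,
                   key=lambda i: self.scores.get((state, token, i), 0.0))

    def receive(self, state, token, prediction, metric):
        self.history.append((state, token, prediction, metric))

    def update(self):
        # Modify the policy based on the metrics received.
        for state, token, prediction, metric in self.history:
            key = (state, token, prediction)
            self.scores[key] = self.scores.get(key, 0.0) + metric
        self.history.clear()


def run_episode(tokens, agents, intent):
    """Run one pass over the extracted tokens and return the final state."""
    state = ()                      # initial state (assumed empty)
    for token in tokens:            # stops once every token is selected
        for agent in agents:
            prediction = agent.predict(state, token)
            # Metric compares the prediction against the predetermined intent.
            metric = 1.0 if prediction == intent else -1.0
            agent.receive(state, token, prediction, metric)
        state = state + (token,)    # next state from prior state and tokens
    for agent in agents:
        agent.update()
    return state
```

In this sketch an agent that guesses the wrong intent on its first pass is penalized for that guess, so a second pass over the same tokens yields a different prediction.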

Claims 8-12 recite limitations corresponding to those of claims 2-6 and are rejected for the same reasons.

Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Thomson et al (US 2018/0330721 A1) disclose a digital assistant that uses reinforcement learning to detect user intent and parse text tokens (See [0242]-[0249], [0264]).
Andreas et al (US 2017/0140755 A1) disclose an interaction assistant that uses reinforcement learning to detect user intent and parse text tokens (See [0040], [0081]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUT WONG whose telephone number is (571)270-1123. The examiner can normally be reached M-F 10am-6pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LUT WONG/Primary Examiner, Art Unit 2127