DETAILED ACTION
This action is responsive to the Application filed on 04/18/2018 as amended on 09/21/2021. Claims 1-14, 16-21 are pending in the case. Claims 1, 19, 20 are the independent claims.
This action is non-final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged.
It is noted that the prior-filed applications on which Applicant relies for priority (US serial nos. 62/487,404 and 62/487,418; each filed 04/19/2017) do not contain identical written descriptions and drawings as the instant application.  Accordingly, Applicant is requested to identify appropriate 35 USC 112(a) written description support in the priority applications for any claim amendments in order to maintain benefit of priority.
Examiner Note
This Office action is divided into two portions: a Requirement for Information and Consideration of Pending Claims.

REQUIREMENT FOR INFORMATION
Applicant and the assignee of this application are required under 37 CFR 1.105 to provide the following information that the examiner has determined is reasonably necessary to the examination of this application. Information made of record in this Office Action raises a question of whether Applicant conceived, developed, and made public the invention presently claimed more than a year prior to the earliest effective filing date (i.e. 04/19/2017, provisional application nos. 62/487,404 and 62/487,418), and whether the invention is therefore not patentable under 35 USC 102(a)(1).
A person shall be entitled to a patent unless — (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
In response to this requirement, please agree or disagree to the stipulation of each of the following assertions of facts:
Applicant (AIBrain Corporation) has a publicly available website (aibrain.com) which was available to the public on 03/09/2016 and which describes in broad terms the following products: agRe (Intelligent Agent Reasoner), iRSP (Intelligent Robot Software Platform), GCA (General Conversational Agent), Tyche Chatbot (an Android® app), and SMILE [See printouts retrieved via Internet Archive that were attached to the previous Office Action, 19 pages].
AgRe is described as having a perception and conversation interface, a learning engine, and as being capable of multi-agent problem-identification and problem-solving using MA-PDDL. Further, the Tyche Chatbot was implemented using agRe (citations below are from previously-provided printouts, third page):
agRe is an intelligent agent reasoning engine for developers to build intelligent agents for a broad range of applications. 
Much of human problem solving requires holistic intelligence, including incremental episodic reasoning over time. An intelligent agent should be able to reason, beginning from sensing and perceiving the world, to understanding and responding to its environment. Among many hard problems, problem discovery by the agent is not trivial. It must discover its own problem situation based on its current life goals and beliefs, and then automatically generate an operational problem description. In real world situations, no one manually provides a robot with its task. If you want a robot to demonstrate truly human-like autonomous behavior, this problem must be an essential part of the AI reasoning process.
agRe listens and talks to humans. It understands what it hears and is able to generate speech.
agRe learns incrementally over time by interacting with people and the world. It learns not only new solutions but also acquires new problem solving skills.
Over time, agRe gets better at solving problems. In addition, agRe’s reactions to new situations improve. One major capability of AgRe is Multi-Agent Planning (MAP), which is a way of autonomous problem solving by/for multiple intelligent agents by interacting with humans in a world environment. It has the capability of autonomous problem discovery by interacting with humans, that is, automatic problem identification and program generation in MA-PDDL. The second phase includes autonomous problem solving by multiple agents such as automatic service delegation, reactive execution and monitoring, and goal-oriented physical and conversational behavior. The smartphone robot, Tyche, is the first application of MAP.
iRSP (Intelligent Robot Software Platform) is an intelligent robot building software for developers at all levels. The most important characteristics of IRSP are its capabilities of integration and support of intelligence (AIBrain website printouts previously provided, page 9) and Given a robot task, it will automatically generate a solution plan in PDDL (page 10).
Tyche is an AI robot companion for kids [that] talks, listens, thinks and reasons to plan and solve tasks which was publicly available as a download from Google Play as of 02/15/2016 (from previously-provided printouts, page 11).

[Image: media_image1.png]
More information about Tyche could be obtained by visiting the “tyche.club” website:

[Image: media_image2.png]
The “tyche.club” website as available 03/01/2016 includes offers of sale for Tyche classic car and Tyche sports car models (retrieved via Internet Archive on [10/25/2021]. 3 pages) to be used with the Android® app. The web site also includes the following short summary:
Tyche was initially debuted at International CES 2014 (The Guardian http://www.theguardian.com/technology/2014/jan/09/bizarre-robotics-ces-2014; Auto World News http://www.autoworldnews.com/articles/5753/20140109/5-must-see-robots-from-ces.htm ).
There was a Kickstarter campaign which intended to bring Tyche to the public with a target delivery date of Nov 2015; note that a project update dated 02/02/2016 (https://www.kickstarter.com/projects/407592806/tyche-the-true-ai-companion-for-kids retrieved via Internet Archive on [10/25/2021]. 10 pages) indicated the project was done and the product was shipping. The current page for the Kickstarter campaign includes a development timeline (page 23) and describes programming tools (to augment Tyche’s intelligence and capabilities) with varying levels of iRSP (pages 13-19).

[Image: media_image3.png]
Another product briefly described in the AIBrain printouts previously provided is the SMILE app (pages 13-19) with conversational intelligence and multiagent planning intelligence. A demonstration video was uploaded to YouTube (https://www.youtube.com/watch?v=ZbMUPgxDCso) on 09/13/2014 (see screenshot below).

Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of receiving one or more sensory inputs of an autonomous artificial intelligence computer character (e.g. listen to human by intelligent agent (Tyche),  perceives the visual world through seeing);
Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of using a computer processor to determine one or more beliefs of the autonomous artificial intelligence computer character (e.g. determining information about the world environment);
Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of identifying one or more goals of the autonomous artificial intelligence computer character (e.g. identify a problem to be solved);
Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of using a machine learning model to automatically determine an action of the autonomous artificial intelligence computer character based at least in part on the one or more sensory inputs, the one or more beliefs, and the one or more goals of the autonomous artificial intelligence computer character (e.g. one or more planned actions to solve the problem);
Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of training the machine learning model to create a solution plan for the autonomous artificial intelligence computer character using a deep convolutional neural network (e.g. incrementally learning how to discover and solve new problems);
Was Tyche, prior to 04/18/2016, using MAP and as implemented using agRe and iRSP, incapable of causing the autonomous artificial intelligence computer character to execute the determined action?
If Tyche prior to 04/18/2016, was incapable of any of the above features, on what date was the Tyche product made capable of the missing feature?
Which of the following AIBrain products was SMILE implemented using: MAP, agRe, iRSP, GCA?
Which, if any, of the following features was SMILE, which implements an autonomous artificial intelligence computer character (TESS), incapable of providing when released in 2014: receiving one or more sensory inputs; determining one or more beliefs; identifying one or more goals; determining an action [to be performed] using a machine learning model; and executing the determined action?
In response to this requirement, please provide the following documentation in support of the responses to the interrogatories eliciting factual information:
Design documentation, user’s guides, and promotional materials for the Tyche product which was released to the public prior to 04/18/2016.
Design documentation, user’s guides, and promotional materials for the first Tyche product which was released to the public after 04/18/2016 in which the Tyche product was capable of the missing feature.
Documentation, user’s guides, and promotional materials for the agRe engine released prior to 04/18/2016.
Documentation, user’s guides, and promotional materials for iRSP released prior to 04/18/2016.
Documentation, user’s guides, and promotional materials for GCA released prior to 04/18/2016.
Documentation, user’s guides, and promotional materials for SMILE released prior to 04/18/2016.

The fee and certification requirements of 37 CFR 1.97 are waived for those documents submitted in reply to this requirement. This waiver extends only to those documents within the scope of the requirement under 37 CFR 1.105 that are included in the applicant's first complete communication responding to this requirement. Any supplemental replies subsequent to the first communication responding to this requirement and any information disclosures beyond the scope of this requirement under 37 CFR 1.105 are subject to the fee and certification requirements of 37 CFR 1.97 where appropriate. See MPEP 704.14(a).
The applicant is reminded that the reply to this requirement must be made with candor and good faith under 37 CFR 1.56. Where the applicant does not have or cannot readily obtain an item of required information, a statement that the item is unknown or cannot be readily obtained may be accepted as a complete reply to the requirement for that item. See MPEP 704.14(a).
The Technology Center Director has approved this requirement by signing below:
/DAVID A WILEY/ Director, Art Unit 2100

CONSIDERATION OF PENDING CLAIMS
Response to Applicant’s Amendment
In Applicant’s response filed 09/21/2021 Applicant amended claims 1, 16, 19, and 20; canceled claim 15; added claim 21; amended the specification; and argued against the rejections/objections of the previous Office action mailed 06/09/2021.
In the previous Office action, Examiner had indicated the subject matter of now-canceled dependent claim 15 as allowable. New art has been identified which may be relied upon to teach this element.  Accordingly, this action is made non-final, even though the full scope of the independent claim has changed, so that Applicant may have the opportunity to review and respond accordingly.
Applicant’s amendments to the claims are acknowledged. 
In response to Applicant’s amendment to the independent claims, the previous rejection under 35 USC 101 is respectfully withdrawn.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 19, 20 are rejected under 35 U.S.C. 103 as unpatentable over GRAEPEL et al. (US 20050245303 A1, provided on previous IDS) in view of ZHAO et al. (Deep Reinforcement Learning with Experience Replay Based on SARSA. 978-1-5090-4240-1/16/$31.00 ©2016 IEEE. Retrieved via IEEE Xplore on [07/01/2021]. 6 pages, newly cited).
Regarding claim 1, GRAEPEL teaches the method comprising: 
receiving one or more sensory inputs of an autonomous artificial intelligence computer character (Pg.2, [0022] EN: this denotes the in-game character sensing aspects of its environment and reacting to them accordingly. This is an artificial intelligence computer character because the 
using a computer processor (Pg.4, particularly paragraph 0050; EN: this denotes the use of a computer processor) to determine one or more beliefs of the autonomous artificial intelligence computer character (Pg.3, particularly paragraph 0036; EN: this is an example of the character learning. An action in a specific situation yielded a positive result, and now the character “believes” that this is a good move to make in that situation in the future. This is a belief as it represents what the AI character knows/believes about certain situations);
identifying one or more goals of the autonomous artificial intelligence computer character (Pg.2, [0023] EN: this denotes the character having goals);
using a machine learning model (Pg.4, [0042-0044] EN: this denotes the use of a neural network to perform the actions of the character) to automatically determine an action of the autonomous artificial intelligence computer character based at least in part on the one or more sensory inputs (Pg.3, [0034] EN: this denotes the character detecting the state of 3 ft/stand), the one or more beliefs (Pg.3 [0034]; EN: this denotes examining possible solutions, and picking the best one based on the highest reward value of the action based on the character’s experience), and the one or more goals of the autonomous artificial intelligence computer character (Pg.3, [0035] EN: this denotes the actions depending on the goals of the character);
training the machine learning model (Pg.2 [0027] EN: this denotes the character being “trained” via learning. Pg.4, [0044]; EN: this denotes the neural network being updated/adapting to the agent’s experience via changing the weights of the neural network. This is the training described in [0027]) to create a solution plan for the autonomous artificial intelligence computer character (Pg.3, [0034]; EN: this denotes examining possible solutions, and picking the best one based on the highest reward value of the action based on the character’s experience. This is a solution as it denotes the best action to take for the character to achieve their goal in that particular situation) using a neural network;
causing the autonomous artificial intelligence computer character to execute the determined action ([0036-0037] EN: describing a sequence of actions, e.g. previously selected “throw” action was performed which led to next selected action “kick” to be executed in fighting sequence).
GRAEPEL is silent with respect to using a deep convolutional neural network to perform the training.
However, ZHAO discloses training using a deep convolutional neural network (Abstract; EN: this denotes the use of deep convolutional neural networks for reward based learning in video games).
GRAEPEL and ZHAO are analogous art because both involve video game learning.
At the time the invention was effectively filed it would have been obvious to one skilled in the art of video game learning to combine the work of GRAEPEL and ZHAO in order to make use of deep convolutional neural networks when using neural networks for learning. The motivation for doing so would be to “combine excellent perceiving ability of DL with decision making ability of RL” (ZHAO, Pg.1, C1, second paragraph) or in the case of GRAEPEL, make use of the deep convolutional neural network of ZHAO to make the reward based learning decisions of the GRAEPEL reference.
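For illustration only, the reward-based selection and update mapped above can be sketched in a few lines. The sketch is tabular rather than a deep convolutional neural network, uses the standard SARSA update rule named in the title of ZHAO, and borrows the state label "3ft/stand" and the actions "throw" and "kick" from the GRAEPEL citations above. The learning rate, discount factor, reward value, and the follow-on state "1ft/stand" are assumptions made here and do not appear in either reference; this is not the implementation of GRAEPEL or ZHAO.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, not from either reference)
GAMMA = 0.9  # discount factor (assumed value, not from either reference)


def select_action(q, state, actions):
    """Greedy choice: return the action with the highest learned reward estimate."""
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """One SARSA step: move Q(s, a) toward reward + GAMMA * Q(s', a')."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


# Table of reward estimates; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["throw", "kick"]

# Hypothetical experience: "throw" at 3ft/stand earned a reward of 1.0 and
# was followed by "kick" in the (assumed) state 1ft/stand.
sarsa_update(q, "3ft/stand", "throw", 1.0, "1ft/stand", "kick")

# The character now prefers the action that was rewarded in that state.
best = select_action(q, "3ft/stand", actions)
```

In the combination as proposed, the lookup table would be replaced by a neural network that estimates the same values; the table is used here only to keep the sketch short.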
Regarding claim 19, GRAEPEL in view of ZHAO, combined at least for the reasons discussed above, similarly teaches the system comprising: a processor (GRAEPEL [0050] processor of computer 20); and a memory coupled with the processor (GRAEPEL [0051] system memory of computer 20), wherein the memory is configured to provide the processor with instructions which when executed  (GRAEPEL [0053] program modules such as those in [0056] to implement [0057]) cause the processor to perform the method operations of claim 1, rejected under similar rationale in view of the combination.
Regarding claim 20, GRAEPEL in view of ZHAO, combined at least for the reasons discussed above, similarly teaches the non-transitory computer readable storage medium (GRAEPEL [0051] system memory of computer 20) storing a computer program product comprising computer instructions (GRAEPEL [0053] program 
Claims 2, 3 are rejected under 35 USC 103 as unpatentable over GRAEPEL in view of ZHAO, further in view of ABRAMS et al. (Pub. No.: US 2018/0165596 A1, previously cited).
Regarding dependent claim 2, incorporating the rejection of claim 1, while GRAEPEL “senses” the environment, GRAEPEL does not explicitly describe the mechanisms for “sensing” (as claimed, wherein the received one or more sensory inputs include visual and auditory inputs). 
ABRAMS is similarly directed to modeling characters which can interact with users. In FIG 1, data sources 150 are provided to a computing instance 110 which is executing character engine 140, as well as information from various user platforms 120. [0023] To generate character responses, the character engine 140 selects and applies inference algorithm(s) and personality engine(s) based on the current context and data received from the data sources 150. Finally, the character engine 140 tailors the character responses to the capabilities of the user platform 120. [0024] The data sources 150 may include any type of devices and may transmit data to the character engine 140 in any technically feasible fashion {such as} a gamification platform.
FIG 2, [0028] shows more detail of character engine 140 including various training and inference engines. Of particular note, [0029] In operation, the input platform abstraction infrastructure 210 receives input data from any number and types of the user platforms 120. For instance, in some embodiments, the input platform abstraction infrastructure 210 may receive text, voice, accelerometer, and video data from the smartphone 122.
The input data is analyzed ([0033] converted into recognition data) to [0034] determine the user's intent. The recognition data and the determined user's intent are transmitted to [0036-0037] domain parser for additional analysis. [0047] The domain parser 245 transmits the recognition data 225, the user intent 235, and the assessment domain 245 to the inference engine 250 for further analysis and [0049] refinement of context. [0050] The inference engine 250 evaluates the assessment domain 245, the user intent 235, and/or the recognition data 225 in conjunction with data obtained from the data sources 150 to determine character responses 285.
ABRAMS thus teaches receiving one or more sensory inputs including visual and auditory inputs which may be used by a machine learning model (an inference engine) to determine what action an NPC may take.
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and ABRAMS before them, to have combined GRAEPEL in view of ZHAO and ABRAMS and arrived at the claimed invention, motivated by the teaching in GRAEPEL that the NPC senses the environment without specific examples of perception, where ABRAMS is a teaching example of an NPC (agent) perceiving its environment by relying on many different sensor inputs from the user's device, with expected and predictable results, thus increasing the range/types of information which may be used to determine the next action.
Regarding dependent claim 3, incorporating the rejection of claim 1, GRAEPEL does not appear to expressly disclose wherein the received one or more sensory inputs are based on a current position of the autonomous artificial intelligence computer character. Incorporating the teachings of ABRAMS as discussed above, an NPC may receive information about its environment from the sensors of a user device. This sensor information includes ABRAMS [0031] positioning data (e.g. Global Positioning System data), thus when considered in reasonable combination, teaching wherein the received one or more sensory inputs are based on a current position of the autonomous artificial intelligence computer character.
Claims 4-6, 13-14, and 17-18 are rejected under 35 USC 103 as unpatentable over GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV, Meirav (US 2017/0041183 A1, previously cited).
Regarding dependent claim 4, incorporating the rejection of claim 1, while GRAEPEL teaches an NPC “senses” the environment and has “beliefs” about the environment, nonetheless GRAEPEL does not explicitly disclose wherein the received one or more sensory inputs are based on a sensory detection property belief of the autonomous artificial intelligence computer character. At best, the adaptive agent operates despite [0022] uncertainty about the environment.
In many conventional systems, software agents (bots) cannot contribute to a network without having social intelligence and social skills, thus the contribution of HADAD-SEGEV is to provide a framework for software agents (bots) to have social intelligence and social skills so that they may contribute to a social network just like any human participant. [0073] provides a number of examples of software agents including non-player characters (NPC) such as would be found in a game, thus HADAD-SEGEV is analogous art, particularly as the ASM (agent)’s social intelligence [0086] can typically include the ability of an agent to correctly capture social interpersonal relationships in a social group, and use these to navigate and achieve its set of desires. [0087] the social processes (or procedures) of the ASM component may enable the agent to attribute mental states to itself and other agent's in its social groups and to understand that others have mental states that differ from its own. [0097] in gaming applications the ASM component enables a non-player character (NPC) to autonomously socialize with other NPCs and or with human controlled characters playing the game to enable social interaction between NPCs in games with each other and with human players.
HADAD-SEGEV teaches one or more sensory inputs ([0071] agents perceive environment through sensors; [0095] as well as learning from other agents via socializing) are based on a sensory detection property belief ([0088] Beliefs refer to what the agent knows [0095] agent may then be able to infer other agents' beliefs, desires, intentions, and so on, and use this information to interpret the other agents' behaviors and/or predict what the other agents may do next) of an autonomous artificial intelligence computer character (e.g. NPC represented by agent and implemented using ASM; [0105] social interaction and reasoning process can include updating beliefs of the agent… dynamic beliefs can include dynamic social beliefs and dynamic environmental beliefs related to environment states of the agent [0106] dynamic beliefs can include individual beliefs referring to input regarding the agent and mutual beliefs referring to input regarding at least one agent associated with the at least one second ASM). These beliefs are used in [0107] establishing a desire which [0108] can then be acted upon during [0110] an execution phase.
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined 
Regarding dependent claim 5, incorporating the rejection of claim 1, while GRAEPEL teaches the received one or more sensory inputs (the [0022] information (state) of the environment that is sensed by the adaptive agent) and an interaction with a second autonomous artificial intelligence computer character ([0020] each character is designated as a computer-controlled adaptive agent [0021] each character selects an action from its action list based at least on the current game state [0023] in a typical one-on-one, hand-to-hand combat game, a goal may be defined for the adaptive agent 202 as "inflicting a maximum amount of damage on an opponent while suffering a minimum amount of damage itself"), it is not clearly stated that the received one or more sensory inputs are based on the interaction.
Incorporating the concept of dynamic beliefs as taught by HADAD-SEGEV [0104-0106] and as discussed in the rejection of claim 4 cures any deficiency because the AI-based NPC of HADAD-SEGEV learns about the environment by interacting with other AI-based NPCs.
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined GRAEPEL in view of ZHAO (teaching autonomous artificial intelligence computer character (AI-NPC) that learns about its environment and determines best actions to achieve a goal) and HADAD-SEGEV (teaching a similar AI-NPC which interacts with other AI-NPCs and has beliefs about what the other AI-NPCs know/believe about the game) and arrived at the claimed invention by including the dynamic beliefs of HADAD-SEGEV as further input to GRAEPEL in order to determine a best next-action to perform in order to accomplish some goal. The improvement is motivated by HADAD-SEGEV 
Regarding dependent claim 6, incorporating the rejection of claim 1, GRAEPEL does not appear to expressly disclose wherein the received one or more sensory inputs are based on a conversation.
Incorporating the concept of dynamic beliefs as taught by HADAD-SEGEV [0104-0106] and as discussed in the rejection of claim 4 cures any deficiency because the AI-based NPC of HADAD-SEGEV learns about the environment by interacting with other AI-based NPCs, and in particular by [0097] communicating with the other AI-based NPCs (or humans) using conversation as part of the social interaction skills (see [0166-0167] for discussion of how communication can occur between agents).
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined GRAEPEL in view of ZHAO (teaching autonomous artificial intelligence computer character (AI-NPC) that learns about its environment and determines best actions to achieve a goal) and HADAD-SEGEV (teaching a similar AI-NPC which interacts with other AI-NPCs and has beliefs about what the other AI-NPCs know/believe about the game) and arrived at the claimed invention by including the dynamic beliefs of HADAD-SEGEV as further input to GRAEPEL in order to determine a best next-action to perform in order to accomplish some goal. The improvement is motivated by HADAD-SEGEV [0069] in order to improve the interactions of bots (AI agents/NPCs) in a social network by giving them social intelligence and social skills.
Regarding dependent claim 13, incorporating the rejection of claim 1, GRAEPEL does not appear to expressly disclose storing the determined action in a solution plan repository (interpreted as previously storing an action in order to determine it can be used in the future; note that there is no requirement in the claim that the determined action be some newly-discovered action never previously stored, only that it be stored at some point).
In addition to the teachings previously discussed, HADAD-SEGEV teaches storing the determined action in a solution plan repository ([0116] data services can further 
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined GRAEPEL in view of ZHAO (teaching autonomous artificial intelligence computer character (AI-NPC) that learns about its environment and determines best actions to achieve a goal) and HADAD-SEGEV (teaching a similar AI-NPC which interacts with other AI-NPCs and has beliefs about what the other AI-NPCs know/believe about the game, as well as providing storage of actions to be performed using data services) and arrived at the claimed invention by including the dynamic beliefs of HADAD-SEGEV as further input to GRAEPEL in order to determine a best next-action to perform in order to accomplish some goal. The improvement is motivated by HADAD-SEGEV [0069] in order to improve the interactions of bots (AI agents/NPCs) in a social network by giving them social intelligence and social skills.
Regarding dependent claim 14, incorporating the rejection of claim 1, GRAEPEL does not appear to expressly disclose wherein at least one of the one or more determined beliefs is based on an inference rule or a forward chaining result (in the alternative, only one needs to be shown in the art).
HADAD-SEGEV, in addition to the teachings previously discussed, further teaches wherein at least one of the one or more determined beliefs is based on an inference rule or a forward chaining result (see [0177] beliefs can include a set of rules relates to the environment of the agent 51 which allows the ASM 60 to determine ways to achieve the desire of the agent. See also [0095] agent may then be able to infer other agents' beliefs, desires, intentions, and so on, and use this information to interpret the other agents' behaviors and/or predict what the other agents may do next).
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined GRAEPEL in view of ZHAO (teaching autonomous artificial intelligence computer character (AI-NPC) that learns about its environment and determines best actions to 
Regarding dependent claims 17 and 18, incorporating the rejection of claim 1, GRAEPEL in view of ZHAO does not appear to expressly disclose wherein the one or more beliefs are based on an internal information and an external information, wherein the internal information can be changed by the external information (as described in provisional application 62/487,418 [0036] ... external information is called percept that include visual percept and hearing percept and is related to the environment of the game; [0037] Internal information ... elements that are stored in information store and used as property of character; see instant application as filed [0187-0188] and [0192-0193], where it is improper to incorporate any specific interpretation from the disclosure). HADAD-SEGEV broadly teaches external information (e.g. information learned from other agents) which may be used to update the agent’s beliefs, as well as internal information (e.g. any stored information) which could include a stored belief before the external information was obtained during the update beliefs 110 process of FIG 9, before desires are established 120, in order to determine the action 130 to be executed 140.
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO and HADAD-SEGEV before them, to have combined GRAEPEL in view of ZHAO (teaching autonomous artificial intelligence computer character (AI-NPC) that learns about its environment and determines best actions to achieve a goal) and HADAD-SEGEV (teaching a similar AI-NPC which interacts with other AI-NPCs and has beliefs about what the other AI-NPCs know/believe about the game) and arrived at the claimed invention by including the dynamic beliefs of HADAD-SEGEV as further input to GRAEPEL in order to determine a best next-action to perform in order to accomplish some goal. The improvement is motivated by HADAD-SEGEV [0069] in order to improve the interactions of bots (AI agents/NPCs) in a social network by giving them social intelligence and social skills.
Claims 7-10 are rejected under 35 USC 103 as unpatentable over GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV, further in view of ADAMS.
Regarding dependent claim 7, incorporating the rejection of claim 6, GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV, combined at least for the reasons discussed above, suggests without expressly disclosing wherein the conversation includes a natural language conversation under the assumption that the AI-based character is communicating with a human agent.
ABRAMS teaches mechanisms (FIG 2) for how a human user 120 can communicate with an NPC via the sensor processing infrastructure 220 of the character engine 140. Of particular note, [0033] explains that the recognition data 225 may include speech recognition data, semantic understanding (e.g. Natural Language) data, in addition to other data.
Accordingly, it would have been obvious to one having ordinary skill in computer gaming before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV and ABRAMS before them, to have used natural language processing taught in ABRAMS in order for the AI-based character of GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV to learn (establish beliefs about the environment) from the user, with a reasonable expectation of success, the combination motivated by the suggestion in HADAD-SEGEV (need to improve interactions), where ABRAMS explains a number of mechanisms for understanding the input provided by the user.
Regarding dependent claim 8, incorporating the rejection of claim 7, GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS, combined at least for the reasons discussed above, further teaches wherein the natural language conversation is generated at least in part by using a conversational artificial intelligence server (e.g. the portion of the conversation by the AI-based NPC is generated by, for example ABRAMS [0062] the character engine 250 models a chatbot (i.e., an AI-based chat application) ... The inference engine 250 could then select the personality engine 280(2) and apply the personality engine 280(2) to the inference to determine a content of a verbal response that is consistent with the personality engine 280(2) ).
Regarding dependent claim 9, incorporating the rejection of claim 8, GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS, combined at least for the reasons discussed above, further teaches wherein the conversational artificial intelligence server saves the natural language conversation in a memory store (ABRAMS [0069] inference engine 250 interfaces with the knowledge subsystem 260 in multiple modes. In a real-time mode (i.e., while interacting with the user), the inference engine 250 may interface with the knowledge subsystem 260 as part of determining the inferences and/or the character responses 285. [0073] the inference engine 250 may transmit data associated with the current interaction to the knowledge subsystem 260 for storage in the knowledge database 266).
Regarding dependent claim 10, incorporating the rejection of claim 9, GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS, combined at least for the reasons discussed above, further teaches wherein the memory store is an individual memory (interpreted as each NPC in system has their own storage for all information, including conversations; taught by the combination, where HADAD SEGEV has [0154] each NPC (agent) having their own ASM, while ABRAMS explains the implementation for at least one agent as discussed above. Note that it would not make sense for multiple chatbots to commonly store all conversations, beliefs, and inferences unless (a) there was a mechanism for an individual conversation to be pulled from the common store and learned from separately or (b) all chatbots are intended to be all-knowing (and therefore not have individual learning or individual interactions) ).
Claims 11-12 are rejected under 35 USC 103 as unpatentable over GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS, further in view of WILKS et al. (Demonstration of a prototype for a Conversational Companion for reminiscing about images. Proceedings of the ACL 2010 System Demonstrations, pages 72-77, Uppsala, Sweden, 13 July 2010. pp. 72-77; previously cited).
Regarding dependent claims 11 and 12, each incorporating the rejection of claim 9, while the combination of GRAEPEL, HADAD SEGEV and ABRAMS, combined for the reasons discussed above, teach individual knowledge bases (memory stores) for storing conversational information, the combination may not be relied upon to expressly disclose the memory store is a triple database or the natural language conversation is saved using a triple format (as described in the instant application, [00108-00109] 
For example, WILKS is broadly directed to a conversational system to elicit facts from a person and make inferences based on previously-learned information. Per the abstract:
a platform for novel approaches to the following: 1) The use of Information Extraction (IE) techniques to extract the content of incoming dialogue utterances after an Automatic Speech Recognition (ASR) phase, 2) The conversion of the input to Resource Descriptor Format (RDF) to allow the generation of new facts from existing ones, under the control of a Dialogue Manager (DM), ... 3) A DM implemented as a stack and network virtual machine that models mixed initiative in dialogue control, and 4) A tuned dialogue act detector based on corpus evidence
WILKS §3 describes the architecture of the system, while §4 explains dialog (conversation) understanding and inference including storing information in a knowledge base using triples (see particularly page 71 col 1 ¶ 3).
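For illustration only, a minimal sketch of storing conversational facts as subject-predicate-object triples and generating a new fact from existing ones. It is not drawn from WILKS or from the instant application; the class TripleStore, its methods, the predicate names, and the example facts are all hypothetical, and an in-memory set stands in for a triple database.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Hypothetical in-memory stand-in for a triple database."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def infer_grandparents(store: TripleStore) -> int:
    """Add (a, grandparent_of, c) whenever a parent_of b and b parent_of c."""
    added = 0
    for a, _, b in store.match(predicate="parent_of"):
        for _, _, c in store.match(subject=b, predicate="parent_of"):
            if (a, "grandparent_of", c) not in store.triples:
                store.add(a, "grandparent_of", c)
                added += 1
    return added


# Facts as they might be extracted from a conversation (hypothetical).
store = TripleStore()
store.add("Ann", "parent_of", "Bob")
store.add("Bob", "parent_of", "Cara")
new_facts = infer_grandparents(store)
```

The pattern-matching query and the derived triple show why a triple layout is convenient for inference: both stored and inferred knowledge share one uniform shape.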
Accordingly, it would have been obvious to one having ordinary skill in computer art before the effective filing date of the claimed invention, having the teachings of GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS and WILKS before them, to have combined GRAEPEL in view of ZHAO, further in view of HADAD-SEGEV further in view of ABRAMS (storing conversation information in a knowledge base for use by an AI-based NPC) and WILKS (storing conversation information in a knowledge base, where the knowledge is organized using triples of subject-predicate-object) and arrived at the memory store is a triple database or the natural language conversation is saved using a triple format with a reasonable expectation of success, the combination motivated by using a known technology (storing learned conversation information using RDF triples in a knowledge base) for a known reason (in order to make inferences about the stored information), which is a goal of each of GRAEPEL (learn about environment in order to take an action), HADAD SEGEV (to learn/make inferences from social interactions with other NPC agents) and ABRAMS (to learn/make inferences from communication with a user).
Claims 16 and 21 are rejected under 35 USC 103 as unpatentable over GRAEPEL in view of ZHAO, further in view of GUO et al. (Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree search planning. Part of Advances in Neural Information Processing Systems 27 (NIPS 2014). 9 pages; newly cited).
Regarding dependent claim 16 (21), incorporating the rejection of claim 1 (19), GRAEPEL does not appear to explicitly disclose wherein the machine learning model is trained using an automated planner.
GUO discloses wherein the machine learning model is trained using an automated planner (Pg.2, particularly the second paragraph; EN: this denotes using Monte Carlo tree search planning methods (i.e. a type of AI planning/Automated planning) to generate training data for a deep learning classifier).
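For illustration only, a minimal sketch of the general idea of using a planner to generate training data for a learned policy. It is not GUO's method: a breadth-first search on a hypothetical one-dimensional world stands in for Monte Carlo tree search, and a lookup table stands in for a deep learning classifier. All names (plan_first_action, build_training_set, train_policy) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

ACTIONS = {"left": -1, "right": +1}  # hypothetical action set


def plan_first_action(start: int, goal: int, size: int) -> Optional[str]:
    """Planner: breadth-first search returning the first action on a
    shortest path from start to goal, or None if already at the goal."""
    if start == goal:
        return None
    queue = deque([(start, None)])
    seen = {start}
    while queue:
        pos, first = queue.popleft()
        for name, step in ACTIONS.items():
            nxt = pos + step
            if not 0 <= nxt < size or nxt in seen:
                continue
            first_action = first if first is not None else name
            if nxt == goal:
                return first_action
            seen.add(nxt)
            queue.append((nxt, first_action))
    return None


def build_training_set(goal: int, size: int) -> List[Tuple[int, str]]:
    """Label every non-goal state with the planner's chosen action."""
    data = []
    for state in range(size):
        action = plan_first_action(state, goal, size)
        if action is not None:
            data.append((state, action))
    return data


def train_policy(data: List[Tuple[int, str]]) -> Dict[int, str]:
    """'Training' here only memorizes planner labels; a real system
    would fit a classifier to these (state, action) pairs."""
    return dict(data)


policy = train_policy(build_training_set(goal=3, size=6))
```

Once trained, the policy answers by lookup without running the planner, which mirrors the motivation of planning offline and acting in real time.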
GUO and GRAEPEL modified by ZHAO are analogous art because both involve deep learning.
At the time of invention it would have been obvious to one skilled in the art of deep learning to combine the work of GUO and GRAEPEL modified by ZHAO in order to make use of planning when training a deep neural network for video gameplay. The motivation for doing so would be to combine reinforcement learning and “provide the best machine-agent real-time game play to date (in some games close to or better than human-level play)” (GUO, Pg.2, second paragraph) or in the case of GRAEPEL modified by ZHAO, allow this type of learning to be used for the neural network of the GRAEPEL reference.
Therefore at the time of invention it would have been obvious to one skilled in the art of deep learning to combine the work of GUO and GRAEPEL modified by ZHAO in order to make use of planning when training a deep neural network for video gameplay.



It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).


CONCLUSION
The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
US 20070260567 A1 Providing dynamic learning for software agents in a simulation. Software agents with learners are capable of learning from examples. When a non-player character queries the learner, it can provide a next action similar to the player character.
US 10607134 B1 devices, systems, methods, and/or applications for learning an avatar's or an application's operation in various circumstances, storing this knowledge in a knowledgebase (i.e. neural network, graph, sequences, etc.), and/or enabling autonomous operation of the avatar or the application.
US 9443352 B1 In addition to avatars of the participants, the virtual world may include virtual characters or objects that are computer generated and controlled using artificial intelligence algorithms to interact with other virtual characters, avatars, or objects. In one example, the virtual characters may not have human counterparts that drive the virtual characters, rather, the virtual characters may be driven independent of human counterparts and by artificial intelligence algorithms as discussed above. In other words, a virtual character is one that does not have a living human counterpart that is active in the simulation during the simulation. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RENEE CHAVEZ, can be reached on 571-270-1104.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Amy M Levy/Primary Examiner, Art Unit 2179