DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 21 January 2021 has been entered.

Response to Arguments
Applicant's arguments filed 21 January 2021 have been fully considered but they are not persuasive.

Applicant alleges:
Bennett using context specific grammars and dictionaries may correspond to analyzing a user input (which the Applicant does not acknowledge given the context of the claim) but Bennett does not disclose that an answer is generated based on the analyzed user input and one or more features included in a visual output provided by the executing content when the audio signal is received.
The Office Action alleges that Bennett discloses generating an answer based on the analyzed user input and a visual output provided by the executing content because Bennett generates an answer based upon the user interaction and Bennett using context-specific grammars and 

Examiner respectfully disagrees.  Bennett teaches configuring the agent for a particular application; for example, in a learning application the agent has the appearance of a professor, and other visual props such as a blackboard and a textbook are presented to the user; para 204.  Bennett further details that various options gleaned from the user’s status within a particular application are received; for instance, in the context of a remote learning system, “chapter” data items are communicated; para 233; and the grammar and dictionary files are loaded dynamically according to the given course, chapter and/or section, determined automatically by an application program executed by the user; para 234.  Bennett details an example in Fig. 12 of a chapter selection through a “displayed” user interface.  The character [professor] directs the student to make selections… and then once the selections are made [e.g. features in the visual output], the student is prompted to ask a query; see para 311 for a full description.
Considering these examples, Examiner submits that Bennett includes a visual element that depicts a classroom in the learning environment, in which visual elements such as textbooks, courses, chapters, and professors/instructors are present and interactive; the user interacts and makes selections, for example of chapters, through the interface, and then the system automatically and dynamically loads grammar and dictionaries
Examiner maintains that these elements anticipate the claimed “generating an answer based on the analyzed user input and one or more features included in the visual output provided by the executing content when the audio signal is received using the voice recognition apparatus.”

Applicant further alleges:
As discussed above, Bennett discloses that context specific grammars and dictionaries are used to analyze user questions lexically. However, Bennett does not disclose that one or more features included the visual output provided by the executing content when an audio signal is received is used in selection of the specific grammars and dictionaries. As an example, Bennett does not disclose that the specific grammars and dictionaries are selected or limited based on an agent having a visual form and visual props in Bennett when an audio signal is received. The various elements of display including an agent having a visual form, visual props in Bennett are used to present a generated answer but are not used to generate the answer, as recited in claim 1.
For example, Bennett teaches that the answer is presented to the user, as in the case of a live teacher, in an articulated manner by an agent that mimics the mentor or teacher, and in the language of choice—English, French, German, Japanese or other natural spoken language (see paragraphs [0096], [0128] and [0129]). See also paragraphs [0194]-[0204] explaining features of the MS Agent 220B which is responsible for coordinating and handling the actions of the animated agent 157. Thus, the agent is used to present a generated answer. However, Bennett does not disclose that the answer is generated based on one or more features included in the agent when the audio signal is received.
As pointed out in the Office Action, paragraphs [0233] and [0234] of Bennett also teaches that specific grammars are dynamically loaded or actively configured as the current grammar according to the user's context, i.e., as in the case of a remote learning system, the Course, Chapter and/or Section selected. However, Bennett does not disclose that the specific grammars used in generating the answer are selected based on one or more features included in the visual output provided by the executing content when the audio signal is received. Generating an answer based on the analyzed user input and one or more features

Examiner respectfully disagrees.  As noted above, Bennett details that, in the context of a remote learning system, “chapter” data items are communicated; para 233; that the grammar and dictionary files are loaded dynamically according to the given course, chapter and/or section, determined automatically by an application program executed by the user; para 234; and an example where the character [professor] directs the student to make selections… and then once the selections are made [e.g. features in the visual output], the student is prompted to ask a query; see para 311 for a full description.
In these teachings, the one or more features included in the visual output are met by the displayed textbook/chapter/course/section elements with which the user interacts, and the system loads dictionaries based on these contexts to assist in answering the queries; see also Figs. 7A-D.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 – 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bennett (hereinafter Ben, U.S. Patent Application Publication 2010/0235341).

Regarding Claim 1, Ben discloses:
An operation method of a voice recognition apparatus, the method comprising:
executing content providing a visual output by the voice recognition apparatus (e.g. a remote learning application, which includes various elements of display including an agent having a visual form, visual props; para 204; see an example in Fig. 12 and para 311 where a character [professor] directs the student to make selections through a displayed interface [course/chapter/section etc]… and then once the selections are made, the student is prompted to ask a query);
receiving an audio signal during execution of the content (e.g. receive user’s speech routine; para 207, note query consists of a question directed to a particular topic, such as “what is a network” in the context of a remote learning application; para 207; real-time performance of a query/response; para 210; note speech input is provided in the form of a question or query articulated by the speaker at the client’s machine; para 128; and further note in learning systems, selected options are based on the context experienced by a user during an interactive process; para 233; and determined automatically by an application program executed by the user; para 234);
performing, using the voice recognition apparatus, voice recognition on the audio signal (e.g. see operation of speech recognition components; paras 230+; and further 
acquiring content information including context of the content being executed by the voice recognition apparatus (e.g. gleaned from the user’s status within a particular application; para 233; the current grammar according to the user’s context is loaded dynamically, for example the grammar and dictionary files are loaded according to the given course, chapter and/or section as dictated by the users or determined automatically; para 234; note data options are communicated and loaded based on the context experienced by the user during an interactive process; para 233);
analyzing a user input based on the content information from a voice recognized by performing the voice recognition (e.g. text-to-query converter formulates a suitable query; para 128; and to make the speech recognition process more reliable, context-specific grammars and dictionaries are used to analyze user questions lexically; para 129);
generating an answer (e.g. database processor locates and retrieves an appropriate answer; para 128) based on the analyzed user input (e.g. “query”) and one or more features included in the visual output provided by the executing content (e.g. character [professor] directs the student to make selections… and then once the selections are made, the student is prompted to ask a query; see para 311; note further the “context” of a remote learning application [visual output]; para 207, note in particular visual props used by the agent, for example, a text book, blackboard, professor etc; para 204) when the audio signal is received using the voice recognition apparatus  (e.g. answer retrieved based on the query; para 128; a query [user input] may consist of a 
outputting the answer (e.g. answer provided to a user; para 129; answer is converted into speech by text and expressed as oral feedback by animated character agent 157; para 128).

Regarding Claim 2, in addition to the elements stated above regarding claim 1, Ben further discloses:
wherein the analyzing the user input based on the content information from the voice recognized by performing the voice recognition comprises performing natural language understanding of the recognized voice based on the content information (e.g. natural language engine facilitates structuring the query to database 188; para 128 and

Regarding Claim 3, in addition to the elements stated above regarding claim 2, Ben further discloses:
wherein the performing the natural language understanding of the recognized voice based on the content information comprises:
performing the natural language understanding with respect to the recognized voice (e.g. the NLP system is charged with parsing, understanding and indexing the linguistic unit; para 132); and
correcting the natural language understanding with respect to the recognized voice based on the content information (e.g. verifying the disambiguated set of questions from the random questions at step 2050 using the WordNet semantic decoding method above; para 360; note WordNet derived metrics are used to enhance the accuracy of the NLQS [natural language query system] statistical algorithms; para 331).

Regarding Claim 4, in addition to the elements stated above regarding claim 1, Ben further discloses:


Regarding Claim 5, in addition to the elements stated above regarding claim 1, Ben further discloses:
wherein the generating the answer based on the analyzed user input and the visual output comprises:
determining a relevance of the answer (e.g. looking for a reasonably close question/answer pair, “if” it reaches a certain confidence level or not; para 306); and
correcting the answer when it is determined that the answer is not appropriate (e.g. if it doesn’t reach a certain confidence level, increasing the scope so that the query would be presented to one or more different NLEs across a number of servers to improve the likelihood of finding an appropriate matching question/answer pair; para 306; also consider verifying the disambiguated set of questions from the random questions at step 2050 using the WordNet semantic decoding method above; para 360; 

Regarding Claim 6, in addition to the elements stated above regarding claim 5, Ben further discloses:
wherein the determining of the relevance of the answer comprises determining universal relevance and/or user relevance of the answer (e.g. note the search for a reasonably close question/answer pair is based on a user query; para 306; it is done to “match” the user’s question and “corresponds to the user’s query”; para 306; thus the match corresponds to a user, in other words is “relevant” to a user).

Regarding Claim 7, in addition to the elements stated above regarding claim 1, Ben further discloses:
acquiring context information of a situation in which the voice recognition apparatus operates (e.g. various options gleaned from the user’s status within a particular application are received… options are based on the context experienced by the user during an interactive process; para 233),
wherein the generating the answer based on the analyzed user input and the visual output comprises generating the answer based on the context information (e.g. context-specific grammars and dictionaries are used to analyze user questions lexically; para 129).

Regarding Claim 8, in addition to the elements stated above regarding claim 7, Ben further discloses:
wherein the analyzing the user input based on the content information from the voice recognized by performing the voice recognition comprises analyzing the user input based on the context information (e.g. helping the natural language processing using a lexical dictionary; para 313).

Regarding Claim 9, in addition to the elements stated above regarding claim 7, Ben further discloses:
wherein the outputting the answer comprises determining an output form of the answer based on the content information and/or the context information (e.g. context-specific grammars and dictionaries are used… so that a unique and responsive answer can be provided to a user; para 129).

Regarding Claim 10, in addition to the elements stated above regarding claim 7, Ben further discloses:
wherein the context information comprises at least one piece of information of: a location of the voice recognition apparatus (e.g. note process of obtaining an answer to the elements in Fig. 18A; para 310, which includes country specific information, sales tax, tracking your package; see elements 1890, 1880 and 1850; see also shipping information as well in 1950 and para 304), a motion (e.g. how the output character moves; para 199), a peripheral environment (e.g. characteristics of the agent may be configured at the client side, note visual props such as textbook; para 204), whether the

Claim 11 is rejected under the same grounds as claim 1 stated above.

Claim 12 is rejected under the same grounds as claim 2 stated above.

Claim 13 is rejected under the same grounds as claim 3 stated above.

Claim 14 is rejected under the same grounds as claim 4 stated above.

Claim 15 is rejected under the same grounds as claim 5 stated above.

Claim 16 is rejected under the same grounds as claim 6 stated above.

Claim 17 is rejected under the same grounds as claim 7 stated above.

Claim 18 is rejected under the same grounds as claim 8 stated above.

Claim 19 is rejected under the same grounds as claim 9 stated above.

Claim 20 is rejected under the same grounds as claim 1 stated above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew C Flanders whose telephone number is (571)272-7516.  The examiner can normally be reached on M-F 8:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached on (571) 272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/ANDREW C FLANDERS/           Primary Examiner, Art Unit 2654