DETAILED ACTION
This action is responsive to the Request for Continued Examination filed on 6 September 2022. Claims 1-3, 5-11, 13-18, and 20 are pending in the case. Claims 1, 9, and 16 are independent claims.
This action is non-final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.
Applicant's submission filed on 6 September 2022 has been entered.
Applicant’s Response
In Applicant’s response dated 6 September 2022 (hereinafter Response), Applicant amended Claims 1, 9, and 16; and argued against the objections and/or rejections previously set forth in the Office Action dated 3 June 2022 (hereinafter Previous Action).
Response to Amendment/Arguments
Applicant’s amendments to claims 1, 9, and 16 to further clarify the metes and bounds of the invention are acknowledged. In particular, Applicant has amended the independent claims to require aggregating the modified user input with a plurality of historical modified user inputs based on a dynamic threshold, wherein the prior inputs are not a portion of the received user input and are not a portion of a current interaction with the virtual assistant.
During the after-final interview held on 16 August 2022 (see PTO-413 mailed 22 August 2022), Examiner noted that support for the interpretation of “are not a portion of a current interaction with the virtual assistant” as argued by Applicant’s representative (e.g. across multiple interactive sessions, potentially from more than one user) was not found in the disclosure as originally filed. At best, “current interaction with the virtual assistant”, in view of the disclosure as filed, may be interpreted as the most recent speech input from the user which is presently being processed and is distinguished from an earlier speech input (e.g. as shown in FIG 3, (305) receive input and, after processing and determining that input is ongoing (360), looping back to (305) to receive the next input). This is the mechanism that is taught in the rejection of record; that is, ROY repeatedly receives and stores input from the user until a complete command has been received, as made clear in the Previous Action and as reiterated below.
Any argument regarding any other interpretation of the limitation must be provided with citations of support from the disclosure as originally filed which shows explicit support for aggregating modified inputs from multiple users and/or multiple sessions prior to creating an enhanced API call.
During the after-final interview, Examiner agreed that ROY and ACERO were silent with respect to the term “dynamic threshold”. No agreement was reached regarding the patentability of any claim.
Applicant’s prior art arguments with respect to the pending claims have been fully considered but are moot in view of the new grounds of rejection presented below, which are required in response to Applicant’s amendments.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 5-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over ROY et al. (Patent No.: US 8,219,407 B1, previously cited) in view of ACERO et al. (Pub. No.: US 2004/0148170 A1, previously cited) further in view of ROZYCKI et al. (Patent No.: US 10,923,122 B1, filed 12/03/2018, newly cited).
Regarding claim 1, ROY teaches the method for enhancing virtual assistant interactions (intended use), the method comprising (relying primarily on FIGs 5 and 6 (which are further adaptations of FIGs 1, 2, and 4 (see col 23 line 39)); broadly (col 2 lines 12-25, emphasis added) determining that a recognized command by a speech recognizer requires additional processing; storing a representation of the output of the speech recognizer in a command structure; iteratively determining if the command is sufficiently complete and ready for processing, and if so executing the command in a respective application or process and exiting said iteratively determining step; if the command is insufficiently complete or not ready for processing, prompting a user for further input; receiving, processing and storing in the command structure prompted user command-related input; and determining an abort condition, and if the abort condition exists, exiting the iterative determining, else continuing said iteratively determining step):
receiving a user input (an initial input is received at FIG 5 (S105); note that additional user input is received at FIG 6 (S664) and (S213) when command is incomplete; note (col 16 lines 10-15) Complex command input is the input of multiple commands and data in a single string of speech input. Examples of complex command input are two commands and a data element in a single string of speech input. Other examples are multiple commands only and a single command and a data element together; interpreting the received user input as the last input received in order to complete the command, where previous inputs were received including the initial input and some other input which was not sufficient to complete the user’s command);
classifying, the user input (interpreting “classifying” as determining how the user input should be interpreted; assuming the most recent user input is not a request for termination (S106) process with speech recognizer (S107) to determine whether speech is a command (S109), or input data for an application or process (S119); further interpreting “command” as an “intent” of the user, e.g. what the user wishes the system to accomplish using the voice system; additionally, once system has determined it is trying to collect information in order to execute a command; generally, input is processed with speech recognizer in at least FIG 5 (S107), FIG 6 (S665) and (S214); as noted above, the input could include one or more commands which need to be individually processed; input could be the response (data) for a prompt; input could be termination request; and so on);
determining, based on the classification, a set of contextual information for the user input (after resolving any ambiguity with the command (S113, S114, S115) and when the command requires additional processing (S121) store representation of input so far (S123); perform any additional processing needed (S124); starting at (S600) in FIG 6; this includes obtaining further input from the user (S662, S664); processing the input (S667); repeating until complete command or abort; as noted above there may be multiple commands which need to be individually processed; thus “determining” what information is needed in order to complete the command);
modifying the user input, based on the classification and the set of contextual information (when ambiguity needs to be resolved, do so at (S115); when additional processing is required, in FIG 6, after receiving any additional information from the user, a second interpretation includes extracting, when there are multiple commands within the first input, a first command to be processed and then a second command to be processed);
aggregating (collecting together) the modified user input with a plurality of historical modified user inputs based on a threshold (per instant application [0030, 0043-0045] perform a threshold check and if the threshold check passes (is met), aggregate the inputs in order to generate an enhanced API call, otherwise keep processing inputs; ROY teaches determining (S201a) whether memory stores a complete command by gathering and aggregating (collecting) all necessary information (S206) to execute command (S207); interpreting “threshold” in this case as “the command is complete”), wherein the historical modified user inputs are user inputs from prior inputs, wherein the prior inputs are not a portion of the received user input and are not a portion of a current interaction with the virtual assistant (as noted above, user input is collected (aggregated) until sufficient information has been received to process a command of the initial input; for example a first command may need two different pieces of information which the user provides in two separate subsequent inputs.
The first command will be aggregated with the first subsequent input (after it has been processed) and then a test is performed to see if additional input is needed; thus interpreting the “received user input” as the last input in a sequence of inputs, sufficient to complete the command, while a “plurality of historical modified user inputs” are those inputs in FIGs 5 and 6 which have been received and processed, but which have not yet resulted in a complete command; interpreting “current interaction” as the most recent speech input provided by the user which is contrasted with any previously-received, processed, and stored input, particularly as ROY determines which “virtual assistant” will be appropriate to perform the user’s command (S207) execute command in appropriate application process, where the user’s input may include complex commands (col 5 line 15) which may be processed by multiple applications (col 5 line 22));
generating an aggregate modified user input, based on the classification, the set of contextual information, and the aggregated plurality of historical modified user inputs (as noted above, user input is collected (aggregated) until sufficient information has been received to process the command; FIG 6: collecting all necessary information (S206) to execute command by appropriate application (S207)); and
passing the aggregated modified user input to the virtual assistant (once all ambiguity has been resolved; when the command does not need additional processing (S121), execute the command (S116); when the command does need additional processing, after collecting all needed information (S206), execute command (S207) executing a command once all necessary input has been received. Interpreting “virtual assistant” as the process which is executing the complete command).
Note one of the problems that ROY is specifically addressing (col 7 lines 35-43, emphasis added) Current state of the art speech recognition systems analyze speech input for a command first, and if a command is found the input is processed based on that command. Such systems are limited to the input of one command in a single speech input, and it is not possible to have multiple commands and data (such as dictation) present in the same input stream. (col 7 line 47, emphasis added) If one or more multiple commands require additional processing by the logical command processor, then each command and data element are processed accordingly by the logical command processor.
Note example use cases include (a) (col 8 lines 1-20) controlling various device functions such as operating system, automobile and control systems, embedded in VCR or DVD recorder; or other devices; and (b) (col 11 line 60 to col 12 line 17); (col 15 line 1 to col 16 line 8) making a calendar appointment in a calendar program by collecting the necessary data elements before creating the appointment. 
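For illustration only, the iterative collection attributed to ROY above (store the input so far, test whether the command is complete, prompt for what is missing, exit on abort) may be sketched as follows using the calendar appointment use case. This is a minimal sketch and not code from ROY; the names CommandStructure, REQUIRED_ELEMENTS, collect_until_complete, and the max_prompts limit are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical list of data elements a command needs before it can execute.
REQUIRED_ELEMENTS = {"create_appointment": ("date", "time", "subject")}


@dataclass
class CommandStructure:
    """Stores the command and the elements collected so far."""
    command: str
    elements: dict = field(default_factory=dict)

    def missing(self):
        required = REQUIRED_ELEMENTS[self.command]
        return [name for name in required if name not in self.elements]

    def is_complete(self):
        return not self.missing()


def collect_until_complete(structure, prompt_user, max_prompts=5):
    """Prompt for missing elements until the command is complete or aborted.

    prompt_user(element_name) returns the user's reply, or None to abort.
    Returns True when the command is ready to execute, False otherwise.
    """
    for _ in range(max_prompts):
        if structure.is_complete():
            return True
        element = structure.missing()[0]
        reply = prompt_user(element)
        if reply is None:  # abort condition
            return False
        structure.elements[element] = reply  # aggregate with earlier inputs
    return structure.is_complete()


# Usage: the initial input supplied only the date; later inputs fill the rest.
replies = {"time": "3 pm", "subject": "budget review"}
cmd = CommandStructure("create_appointment", {"date": "Friday"})
done = collect_until_complete(cmd, replies.get)
```

In this sketch the completeness test plays the role of the threshold check, and each stored reply corresponds to an earlier, already processed input that is aggregated before the command is executed.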
While ROY clearly analyzes the user input, regardless of when it is provided, in order to determine what the user intends with the input (e.g. to terminate/abort, to execute a command, to provide additional contextual information for a command, to resolve any ambiguity), and ROY describes using several different speech recognizers, as needed (see e.g. (col 25 lines 40-57)) in order to increase the overall accuracy of the speech recognition process, including different biasing algorithms, HMM ((col 18 lines 15-35) statistically-biased model), context-free-grammars, etc.; ROY does not expressly disclose the processing of speech input using a speech recognizer comprises using a neural network. Further, while ROY does analyze the inputs received so far to determine whether sufficient information has been received to execute the command (an example of threshold check as described in the instant application), ROY does not describe this as a dynamic threshold (e.g. personalized to the user).
ACERO is similarly directed to using statistical classifiers in order to perform task identification on natural language inputs, e.g. in order to ascertain if an input is a search query or a natural-language input (see abstract). ACERO describes using a number of different statistical classifiers 204 and a selector 221 (see e.g. FIG 4) to determine the task (or classification) identification which will be passed to an application for execution (see broadly [0047]). Of particular note, ACERO states that [0085] the selector 221 which ultimately selects the task or class ID could be other components as well, such as a neural network or a component other than the voting component 222; where the selector is [0086-0087] trained using training data and (optionally) with confidence measures and biasing [0122]. Thus, ACERO clearly teaches classifying, using a neural network, the user input in order to determine what the user input was so that it may be properly executed.
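For illustration only, the arrangement described for ACERO above (several statistical classifiers feeding a selector that chooses one task or class ID) may be sketched in its voting form as follows. This is a minimal sketch and not code from ACERO; the function name select_task, the use of summed confidence to break ties, and the example labels are assumptions made for this example, and the neural network selector alternative is not shown.

```python
from collections import Counter


def select_task(classifier_outputs):
    """Pick one task ID from (task_id, confidence) pairs by majority vote.

    Ties in vote count are broken by the summed confidence (an assumption
    of this sketch).
    """
    votes = Counter()
    confidence = Counter()
    for task_id, score in classifier_outputs:
        votes[task_id] += 1
        confidence[task_id] += score
    return max(votes, key=lambda task: (votes[task], confidence[task]))


# Usage: three hypothetical classifiers label the same input.
outputs = [
    ("search_query", 0.6),
    ("natural_language", 0.9),
    ("natural_language", 0.7),
]
chosen = select_task(outputs)
```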
As ACERO teaches it was known to use a neural network for classifying natural language input as part of a statistical classifier; and ROY uses statistical classification for determining how to interpret natural language input, it would have been obvious for one having ordinary skill in the art at the time the invention was effectively filed to have used a neural network (such as is taught in ACERO) in order to classify the user input (as is required in ROY) with a reasonable expectation of success, merely by substituting (or using) the known neural network implementation of a statistical classifier in the “speech recognizer” in ROY which is used to determine the user’s intent (e.g. command, target, any additional data input, abort request) of the speech input.
ROZYCKI is similarly directed to (abstract) processing user speech. Note (col 5 lines 31-45), which makes clear that remote system 120 in FIG 1, which performs at least part of the operations for speech analysis, is configured to provide particular functionality to large numbers of local (e.g., in-home, in-car, etc.) devices of different users.
See e.g. method 300 in FIG 3 for the operations which receive the user speech, process the speech, buffer (for later use) audio samples (phrases), perform a criterion (threshold) check, choose to keep the audio samples in the buffer, generate text data, process with NLU, and determine whether the interaction is complete so that it may be executed.
Of particular interest is the use of NLU for identifying the “slots” which are needed for command processing so that the system may generate a particular response (see e.g. (col 11 lines 32-49)) where these slots (and the terms to complete them) are based on different grammars which can be personalized to each user and/or device (thus dynamic). A specific example is described (col 11 line 61 to col 12 line 8) which explains how the NLU component attempts to match words and phrases in the personalized lexicon in order to determine local intent data. Based on the results of this matching (e.g. ReadyToExecute or failure), a hybrid request selector will determine whether to provide a local result or use data received from the remote system (col 12 lines 9-30).
Restated, the determination of whether the system has all necessary intent data is based on matching dynamic (personalized) data sufficient to determine the intent data is ready to be executed or is not ready to be executed (a failure to find all data). 
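For illustration only, the determination restated above (matching the input against personalized data to decide whether the intent data is ready to execute) may be sketched as follows. This is a minimal sketch and not code from ROZYCKI; the function resolve_intent, the slot and lexicon layout, and the "Failure" status string are assumptions made for this example, and matching is simplified to single words.

```python
def resolve_intent(utterance, intent, required_slots, personal_lexicon):
    """Fill the slots of an intent by matching words against a per-user lexicon.

    Returns (status, slots), where status is "ReadyToExecute" only when every
    required slot was matched, and "Failure" otherwise.
    """
    words = utterance.lower().split()
    slots = {}
    for slot in required_slots[intent]:
        for term in personal_lexicon.get(slot, ()):
            if term in words:
                slots[slot] = term
                break
    complete = len(slots) == len(required_slots[intent])
    return ("ReadyToExecute" if complete else "Failure"), slots


# Usage: the same utterance is complete for one user and not for another,
# because each user has a different (personalized) lexicon.
required = {"play_music": ("artist",)}
lexicon_user_a = {"artist": ("adele",)}
lexicon_user_b = {"artist": ("beyonce",)}

result_a = resolve_intent("play adele", "play_music", required, lexicon_user_a)
result_b = resolve_intent("play adele", "play_music", required, lexicon_user_b)
```

The point of the sketch is that the completeness test itself is fixed, while the data it is tested against varies per user, so the outcome of the check varies with the user.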
Thus, when applying the determination of whether the command is complete or not based on dynamic (personalized) data as taught in ROZYCKI to the “is the command complete” threshold taught in ROY in view of ACERO, one having ordinary skill in the art of natural language processing can immediately see how ROY in view of ACERO can be further improved by having dynamic thresholds (comparing the grammar-based slots to be filled for commands with the personalized information in order to determine whether the intent data is complete, thus the command is ready to execute or not) with a reasonable expectation of success, the combination motivated by the desire in ROZYCKI (col 11 line 60) to make slot resolution more flexible, as well as the overall improvement of ROZYCKI to (col 16 line 64) avoid processing audio data that has no impact on the result of the speech recognition.
Regarding dependent claim 2, incorporating the rejection of claim 1, ROY in view of ACERO, further in view of ROZYCKI, combined at least for the reasons discussed above, further teaches wherein the neural network is a classifier neural network (as taught in ACERO; discussed in rejection of claim 1), and wherein the user input is classified as a question, a statement, or an answer (e.g. a query, a command to be executed, additional information provided in response to a system request; request to abort).
Regarding dependent claim 3, incorporating the rejection of claim 1, ROY further teaches wherein the set of contextual information includes an intent (e.g. the command), an entity (e.g. target of command), and a context (e.g. any other required command elements) (see e.g. ROY (col 31 line 20) At S201a the system determines if the representation of the speech input contains a command or data input, the context of the input (for example command or dictation and the target for the command or data input), and if it contains a command, the completeness, including required command elements and which elements are present and missing; see for example the use case (col 11 line 60 to col 12 line 17); (col 15 line 1 to col 16 line 8) for making a calendar appointment).
Regarding dependent claim 5, incorporating the rejection of claim 1, ROY further teaches the method further comprising:
receiving, from the virtual assistant, a response to the {modified} user input (FIG 5 (S117) command executed successfully; see also FIG 6 (S208) command executed successfully); and
notifying the user of the response (FIG 5 (S118) notify user; see also FIG 6 (S126) notify user).
Regarding dependent claim 6, incorporating the rejection of claim 5, ROY further teaches wherein modifying the user input is predicated upon the set of contextual information meeting a context threshold (when additional processing of input is required, FIG 6 (S201a) determining completeness of a command (S206) is command sufficiently complete for execution, does it include a command (intent), target (entity), and other necessary parameters; note this is analogous to “context threshold” as described in the instant application as originally filed [0030] (e.g., at least 1 intent, 1 entity, and 1 context) on a received input).
Regarding dependent claim 7, incorporating the rejection of claim 5, ROY in view of ACERO, further in view of ROZYCKI, combined at least for the reasons discussed above, further teaches wherein the classified user input is cached and aggregated with a plurality of classified user inputs (interpreting “cached” as stored in a memory location; see ROY FIG 5 (S123) store representation of speech input to memory location; FIG 6 (S216) parse representation of applicable speech input into memory location (S216) and (S667)), and wherein passing the modified user input is predicated upon the modified user input meeting a context threshold (when additional processing of input is required, ROY FIG 6 (S201a) determining completeness of a command (S206) is command sufficiently complete for execution, does it include a command (intent), target (entity), and other necessary parameters; note this is analogous to “context threshold” as described in the instant application as originally filed [0030] (e.g., at least 1 intent, 1 entity, and 1 context) on a received input; note that the improvement taught in ROZYCKI as discussed in claim 1 allows for a more flexible (dynamic, personalized) determination of what comprises a complete command).
Regarding dependent claim 8, incorporating the rejection of claim 5, ROY in view of ACERO, further in view of ROZYCKI, combined at least for the reasons discussed above, further teaches wherein the neural network is trained prior to receiving the user input (ACERO [0086-0087] selector 221 is trained) and wherein the training includes adjusting a weight or a bias of the neural network (ACERO [0087] Selector 221 can receive the confidence measure both during training, and during run time, in order to improve the accuracy with which it identifies the task or class corresponding to feature vector 212; note also discussion of biasing based on training data [0122]).
Regarding claims 9-11, 13-15, ROY in view of ACERO, further in view of ROZYCKI, combined at least for the reasons discussed above, similarly teaches the computer program product for enhancing virtual assistant interactions, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device (ROY (col 2 line 49) environment having speech recognizer software process and logical command processor software process; (col 5 line 20) System functions as an interface enabling humans to exercise command and control over computer software applications and to input complex commands and information into multiple software applications by speech; inherently software executed by a computer in order to effect control must be stored in some memory; example use cases may be found in (col 8 lines 1-20) and include controlling various device functions such as operating system, automobile and control systems; embedded in VCR or DVD recorder; other devices; see use case (col 11 line 60 to col 12 line 17); (col 15 line 1 to col 16 line 8) for making a calendar appointment) to cause the device to: execute the methods of claims 1-3, 5-7; thus rejected under similar rationale.
Regarding claims 16-18, 20, ROY in view of ACERO, further in view of ROZYCKI, combined at least for the reasons discussed above, similarly teaches a system (e.g. a computer; embedded control system for a vehicle, embedded control system for other devices; see use cases (col 8 lines 1-20)) for enhancing virtual assistant interactions, comprising: a memory with program instructions included thereon; and a processor in communication with the memory (inherent components of a computer or other embedded control system), wherein the program instructions cause the processor to: execute the methods of claims 1-3, 5; thus rejected under similar rationale, noting that claim 16 does not require “wherein the historical modified user inputs are user input from prior inputs” which is recited in claim 1.

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).
CONCLUSION
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and/or argued interpretation of claim elements. 
20190114321 (LAM) generation and control of conversations
20150340033 (DI FABBRIZIO) context interpretation using previous dialog acts
20200301660 (KELLY) maintaining context for voice processes
20180330723 (ACERO) low-latency digital assistant with dialog flow control and task-flow processing (see FIGs 8, 11, 12)
20160259656 (SUMNER) virtual assistant continuity across device sessions

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is (571)270-3771. The examiner can normally be reached Mon-Fri 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Amy M Levy/
Primary Examiner, Art Unit 2173