Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 2 includes the limitation “…the intent probability is especially high when…” However, this limitation is unclear with regard to “especially high.” The word “especially” appears to modify the degree of the word “high”; however, “especially” provides no measurable amount or degree by which “high” is changed. Further, the word “high” is comparative and has meaning only in light of a comparison to something else. Applicant provides no such comparison, making it ambiguous and unclear when a probability may be considered high. Therefore, the limitation “…the intent probability is especially high when…” is unclear, and claim 2 is rejected under 35 U.S.C. 112(b).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Y.-P. Chen, R. Price and S. Bangalore, "Spoken Language Understanding without Speech Recognition," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6189-6193, doi: 10.1109/ICASSP.2018.8461718 (Year: 2018) (hereinafter “Chen”), in view of Moreno (US-20210074295-A1).

With respect to claim 1, Chen teaches a method comprising: obtaining audio (Chen: p3 Col 1 last ¶: Log-spectrum features are extracted from the 8kHz speech signal.);
inferring, without transcription, from the audio, a plurality of intent probabilities (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…).
Chen does not explicitly disclose, but Moreno teaches, in response to an intent probability exceeding an intent threshold, invoking a virtual assistant action (Moreno: ¶ [0034] In some implementations, the candidate speech recognition generated by that model that has the highest confidence score may be processed by automated assistant 104 in order to determine which responsive action to perform.),
wherein the virtual assistant action is conditional based on which intent probability is the highest (Moreno: ¶ [0034] In some implementations, the candidate speech recognition generated by that model that has the highest confidence score may be processed by automated assistant 104 in order to determine which responsive action to perform.).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen in view of Moreno so that, in response to an intent probability exceeding an intent threshold, a virtual assistant action is invoked, in order to be responsive to natural language inputs from the user ([0032], Moreno).
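For illustration only, and not as a characterization of the implementation of the applicant, Chen, or Moreno, the combined limitation discussed above (invoke an action when an intent probability exceeds an intent threshold, with the action depending on which intent probability is highest) can be sketched as follows. The function name, intent labels, and threshold value are hypothetical.

```python
from typing import Dict, Optional


def select_intent(intent_probs: Dict[str, float],
                  intent_threshold: float) -> Optional[str]:
    """Return the highest-probability intent if it exceeds the threshold.

    Returns None when no intent qualifies, meaning no action is invoked.
    """
    if not intent_probs:
        return None
    # Pick the intent whose probability is the highest.
    best_intent = max(intent_probs, key=intent_probs.get)
    # Invoke only if that probability exceeds the (hypothetical) threshold.
    if intent_probs[best_intent] > intent_threshold:
        return best_intent
    return None
```

Under these assumptions, probabilities of 0.7 and 0.2 with a threshold of 0.5 would select the first intent, while probabilities of 0.4 and 0.3 would select none.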

With respect to claim 5, Chen teaches inferring, without transcription, from the audio, a plurality of variable value probabilities (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…).
Chen does not explicitly disclose, but Moreno teaches, wherein the virtual assistant action comprises an argument indicating which variable value probability is the highest (Moreno: ¶ [0034] In some implementations, the candidate speech recognition generated by that model that has the highest confidence score may be processed by automated assistant 104 in order to determine which responsive action to perform.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen in view of Moreno so that, in response to an intent probability exceeding an intent threshold, a virtual assistant action is invoked, in order to be responsive to natural language inputs from the user ([0032], Moreno).

Claim 11 is rejected based on the same combination of references as in claim 1 (Chen, Moreno). 
Chen does not explicitly disclose, but Moreno further teaches, a device (Moreno: ¶ [0051] FIG. 5 is a block diagram of an example computer system 510. Computer system 510 typically includes at least one processor 514 which communicates with a number of peripheral devices via bus subsystem 512. These peripheral devices may include a storage subsystem 524, including, for example, a memory 525 and a file storage subsystem 526, user interface output devices 520, user interface input devices 522, and a network interface subsystem 516. The input and output devices allow user interaction with computer system 510. Network interface subsystem 516 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen in view of Moreno so that, in response to an intent probability exceeding an intent threshold, a virtual assistant action is invoked, in order to be responsive to natural language inputs from the user ([0032], Moreno).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Moreno as applied to claim 1 in further view of Huang (US-20170083586-A1).
With respect to claim 2, Chen and Moreno do not explicitly disclose, but Huang teaches, wherein the intent probability is especially high (Huang: ¶ [0077] In one embodiment, the intent matcher 606 may determine a likelihood score of intent match based on probabilistic methods and/or machine learning for each match) when the audio (Huang: ¶ [0116] Computing platform 1000 exchanges data representing inputs and outputs via input-and-output devices 1002, including, but not limited to, keyboards, mice, audio inputs (e.g., speech-to-text devices)) includes speech of words with the same meaning in a plurality of natural languages (Huang: ¶ [0048] For example, a dictionary of terms [under the broadest reasonable interpretation (BRI), a dictionary can contain words with the same meaning in multiple languages] may be used, in multiple languages, to determine whether text may be determined to have positive, negative, or neutral connotations [determines intent]).

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to substitute Chen’s audio-to-intent teaching and Moreno’s intent probability teaching with Huang’s intent probability maximization for words common between multiple languages, which is known in the art to take advantage of common words for maximizing intent, in order to yield predictable results (see KSR v. Teleflex).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Moreno as applied to claim 1, in further view of Kelley (US-20200103963-A1).
With respect to claim 3, Chen and Moreno do not explicitly disclose, but Kelley teaches, wherein invoking the virtual assistant action is conditional based on having not previously invoked an action within a specific amount of time (Kelley: ¶ [0218] In FIG. 6A, user 620 starts speaking a portion 624A “Turn . . . ” of command 624A-624B “Turn on the table lamp” before the digital assistant is activated. In some embodiments, portion 624A of command 624A-624B is not processed, is cancelled, and/or is ignored by the digital assistant if the digital assistant is not subsequently activated (e.g., within a set (non-zero) duration of time) [no action was invoked a fixed duration before], as further discussed below.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen and Moreno in view of Kelley so that invoking the virtual assistant action is conditional based on having not previously invoked an action within a specific amount of time, in order to provide electronic devices with faster, more efficient methods and interfaces for controlling electronic devices ([0006], Kelley).
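As a purely illustrative sketch of the claim 3 condition (an action is invoked only if no action was invoked within a specific amount of time), and not a description of Kelley or of the applicant's implementation, a simple time gate might look like the following. The class name, parameter names, and the choice of seconds as the time unit are assumptions.

```python
from typing import Optional


class InvocationGate:
    """Allows an invocation only if none occurred within min_interval_s."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self.last_invoked_at: Optional[float] = None

    def try_invoke(self, now_s: float) -> bool:
        """Return True and record the time if the invocation is allowed."""
        recently_invoked = (
            self.last_invoked_at is not None
            and now_s - self.last_invoked_at < self.min_interval_s
        )
        if recently_invoked:
            return False  # suppressed: an action was invoked too recently
        self.last_invoked_at = now_s
        return True
```

A suppressed attempt does not reset the timer in this sketch; whether it should is a design choice the claim language does not settle.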

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Moreno as applied to claim 1, in further view of Gruber (US-20130275138-A1).
With respect to claim 4, Chen and Moreno do not explicitly disclose, but Gruber teaches, wherein invoking the virtual assistant action is conditional based on end-of-utterance detection on the audio (Gruber: ¶ [0302] For example, in a listening mode, a user may say "Hey Assistant--find me a nearby gas station . . . ." In this case, the assistant 1002 is configured to detect the phrase "hey assistant" as a wake-up to signal the beginning of an utterance that is directed to the assistant 1002. The assistant 1002 then processes the received audio to determine what should be sent to a remote service for further processing. In this case, the pause following the word "station" is detected by the assistant 1002 as an end of the utterance. The phrase "find me a nearby gas station" is thus sent to the remote service for further analysis (e.g., intent deduction, natural language processing, etc.). The assistant then proceeds to execute one or more steps, such as those described with reference to FIG. 7, in order to satisfy the user's request.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen and Moreno in view of Gruber so that invoking the virtual assistant action is conditional based on end-of-utterance detection on the audio, in order to satisfy the user's request ([0302], Gruber).
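For illustration only, a pause-based end-of-utterance check of the general kind described in the cited Gruber passage can be sketched as below. Gruber states only that a pause is detected as the end of the utterance; the use of per-frame energy values, the silence level, and the frame count are assumptions made for this sketch, and all names are hypothetical.

```python
from typing import Sequence


def end_of_utterance(frame_energies: Sequence[float],
                     silence_level: float,
                     min_silent_frames: int) -> bool:
    """True if speech was heard and the latest frames form a long enough pause."""
    if min_silent_frames <= 0 or len(frame_energies) <= min_silent_frames:
        return False
    head = frame_energies[:-min_silent_frames]   # frames before the pause
    tail = frame_energies[-min_silent_frames:]   # most recent frames
    heard_speech = any(e >= silence_level for e in head)
    pause = all(e < silence_level for e in tail)
    return heard_speech and pause
```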

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Moreno as applied to claim 1, in further view of Tomar (US-20180358005-A1).
With respect to claim 6, Chen further teaches the method of claim 1 further comprising: inferring, without transcription, from the audio, a variable value probability (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…).
Chen and Moreno do not explicitly disclose, but Tomar teaches, wherein the virtual assistant action is conditional based on the variable value probability exceeding a variable value threshold (Tomar: ¶ [0060] FIG. 4 provides an example flowchart for another decision fusion module 400 implementation. The decision fusion module 400 in this example receives the STI system outputs 401, which contains both the predicted action by the STI system 107 and a confidence score for the prediction. The STI system outputs 401 are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information. The confidence score of the improved outputs 404 is then compared to a threshold value in a comparator 405. The threshold may be a fixed pre-computed value or variable that can be determined at run-time and may adaptively change throughout system usage. If the predicted confidence in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen and Moreno in view of Tomar so that the virtual assistant action is conditional based on the variable value probability exceeding a variable value threshold, in order to increase recognition performance ([0026], Tomar).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, Moreno and Tomar as applied to claim 6 in further view of Kanaan (US 20190132451 A1).

With respect to claim 7, Chen, Moreno and Tomar do not explicitly disclose, but Kanaan teaches, wherein the virtual assistant action is further conditional based on the variable value probability exceeding the variable value threshold within a specific time period of the intent probability exceeding the intent threshold (Kanaan: ¶ [0099] The processor is configured to deflect the conversation from the human agent to the VA for the subsequent input if the respective confidence score of the at least one intent for the subsequent input is greater than or equal to the predefined threshold score and if it is determined that the response time of the VA is less than the response time of the human agent [under the broadest reasonable interpretation (BRI), the response time of the human agent qualifies as the specific time period] in relation to the processing of the subsequent input).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen, Moreno and Tomar in view of Kanaan so that the virtual assistant action is further conditional based on the variable value probability exceeding the variable value threshold within a specific time period of the intent probability exceeding the intent threshold, in order to predict one or more intents ([0008], Kanaan).
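The timing condition recited in claim 7 can be illustrated, without characterizing Kanaan or the applicant's implementation, by comparing the times at which the two thresholds were crossed. The function and parameter names are hypothetical, and treating "within a specific time period of" as a symmetric window (either crossing may come first) is an assumption of this sketch.

```python
from typing import Optional


def crossings_within_window(intent_cross_s: Optional[float],
                            value_cross_s: Optional[float],
                            window_s: float) -> bool:
    """True if both thresholds were crossed and the crossings are close in time.

    Each *_cross_s argument is the time (in seconds) at which the respective
    probability first exceeded its threshold, or None if it never did.
    """
    if intent_cross_s is None or value_cross_s is None:
        return False
    return abs(value_cross_s - intent_cross_s) <= window_s
```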

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Moreno as applied to claim 1, in further view of Everman (US 20190179890 A1).
With respect to claim 8, Chen further teaches inferring, without transcription, from the audio (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…).
Chen and Moreno do not explicitly disclose, but Everman teaches, inferring, [[without transcription]], from the audio, a domain probability (Everman: ¶ [0008] In some implementations, a text string corresponding to a speech input is analyzed in light of the ontology to determine a domain that the text string most likely implicates; ¶ [0011] The natural language processor also determines a confidence score representing how well or to what extent a user input matches a particular domain or actionable intent.),
wherein the virtual assistant action is conditional based on the domain probability exceeding a domain threshold (Everman: ¶ [0011] The natural language processor also determines a confidence score representing how well or to what extent a user input matches a particular domain or actionable intent. The confidence score can be used, for example, to help determine which of two candidate domains is most likely to accurately reflect or represent the intent of the input. The confidence score can also be used to determine whether any of the domains or actionable intents are sufficiently relevant to the user input to justify selection of that domain or actionable intent. For example, in some implementations, the natural language processor is configured so that if no candidate domain satisfies a predetermined confidence threshold, the digital assistant will not provide a response. [Note: The score for a recognizer's confidence that the particular item is in the correct domain is compared against “a predetermined confidence threshold” where “if no candidate domain satisfies a predetermined confidence threshold, the digital assistant will not provide a response”]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen and Moreno in view of Everman so that the virtual assistant action is conditional based on the domain probability exceeding a domain threshold, in order to better identify what the user may have said ([0013], Everman).
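As a hedged illustration of the domain-threshold condition discussed above (no response when no candidate domain satisfies a predetermined confidence threshold), the following sketch filters candidate domains by score. It is not Everman's implementation; the function names, domain labels, and the strict "greater than" comparison are assumptions.

```python
from typing import Dict


def qualifying_domains(domain_probs: Dict[str, float],
                       domain_threshold: float) -> Dict[str, float]:
    """Keep only the candidate domains whose score exceeds the threshold."""
    return {d: p for d, p in domain_probs.items() if p > domain_threshold}


def should_respond(domain_probs: Dict[str, float],
                   domain_threshold: float) -> bool:
    """Respond only if at least one candidate domain qualifies."""
    return bool(qualifying_domains(domain_probs, domain_threshold))
```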

With respect to claim 9, Chen further teaches inferring, without transcription, from the audio, a plurality of variable value probabilities (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…).
Chen does not explicitly recite, but Moreno teaches, the virtual assistant action comprising an argument indicating which variable value probability is the highest (Moreno: ¶ [0034] In some implementations, the candidate speech recognition generated by that model that has the highest confidence score may be processed by automated assistant 104 in order to determine which responsive action to perform.).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen in view of Moreno so that, in response to an intent probability exceeding an intent threshold, a virtual assistant action is invoked, in order to be responsive to natural language inputs from the user ([0032], Moreno).
Chen and Moreno do not explicitly disclose, but Everman teaches, inferring, [[without transcription]], from the audio, a domain probability (Everman: ¶ [0008] In some implementations, a text string corresponding to a speech input is analyzed in light of the ontology to determine a domain that the text string most likely implicates; ¶ [0011] The natural language processor also determines a confidence score representing how well or to what extent a user input matches a particular domain or actionable intent),
wherein the virtual assistant action is conditional based on the domain probability exceeding a domain threshold (Everman: ¶ [0011] The natural language processor also determines a confidence score representing how well or to what extent a user input matches a particular domain or actionable intent. The confidence score can be used, for example, to help determine which of two candidate domains is most likely to accurately reflect or represent the intent of the input. The confidence score can also be used to determine whether any of the domains or actionable intents are sufficiently relevant to the user input to justify selection of that domain or actionable intent. For example, in some implementations, the natural language processor is configured so that if no candidate domain satisfies a predetermined confidence threshold, the digital assistant will not provide a response. [Note: The score for a recognizer's confidence that the particular item is in the correct domain is compared against “a predetermined confidence threshold” where “if no candidate domain satisfies a predetermined confidence threshold, the digital assistant will not provide a response”]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Chen and Moreno in view of Everman so that the virtual assistant action is conditional based on the domain probability exceeding a domain threshold, in order to better identify what the user may have said ([0013], Everman).

REASONS FOR ALLOWANCE
Claim 10 is allowed over the prior art of record. The following is an examiner’s statement of reasons for allowance: Claim 10 recites “inferring, without transcription, from the audio: (a) a domain probability; (b) a plurality of intent probabilities; and (c) a plurality of variable value probabilities; in response to: (A) the domain probability exceeding a domain threshold; (B) an intent probability exceeding an intent threshold when the audio includes speech of words in one of a plurality of recognized natural languages; (C) a variable value probability exceeding a variable value threshold within a certain period of time of the intent probability exceeding the intent threshold; (D) an end of an utterance detection signal; and (E) a specific amount of time having elapsed, invoking a virtual assistant action comprising an argument indicating which variable value probability is the highest.” The closest art to these limitations is the cited art of Chen, who teaches (Chen: p1 Col 2 ll 4-6: In this work, we present a novel end-to-end approach to extract semantics directly from the speech signal without the need for a speech recognition system; Chen: p3 top, caption: Fig. 1. Diagram of the proposed audio-to-intent architecture for semantic classification; Chen: p2, Sec 3.1 para 3: With the parameters of the acoustic model component fixed, a deep text classifier is trained with a set of intent labeled data to predict posterior probabilities over the set of intents…), and Everman, who teaches (Everman: ¶ [0008] In some implementations, a text string corresponding to a speech input is analyzed in light of the ontology to determine a domain that the text string most likely implicates; ¶ [0011] The natural language processor also determines a confidence score representing how well or to what extent a user input matches a particular domain or actionable intent.).
However, neither Chen nor Everman, nor any other cited reference, teaches or discloses the combination of limitations a), b), and c) which, in response to A), B), and C) as recited in the claim, results in invoking a virtual assistant action comprising an argument indicating which variable value probability is the highest. Therefore, claim 10 is allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ATHAR N PASHA/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657