Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 5/9/2019 and 1/20/2021 are being considered by the examiner.
Response to Arguments and Amendments

Rejections under 35 U.S.C. 101
Applicant’s amendments/arguments filed on 5/17/2021 are being considered herewith.
Based on the amendments, the Examiner has withdrawn the 35 U.S.C. 101 rejections of claims 15-20 for being directed to a signal per se. 
With respect to claims 1, 2, 8, 9, 15, and 16, the Examiner maintains the 35 U.S.C. 101 rejections because simply implementing mental steps on a processor uses the processor as a tool to carry out the mental process steps. Therefore, the addition of “electronic devices” to the claim language does not amount to significantly more than the abstract idea. Accordingly, the additional elements “electronic” and “electronically” do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Rejections under 35 U.S.C. 102
With respect to claims 1, 8, and 15 in the rejections under 35 U.S.C. 102, the arguments are moot because the Examiner is introducing a new reference and the claims are rejected based on the new reference.
With respect to claims 2, 9, and 16 in the rejections under 35 U.S.C. 102, the Examiner has not found convincing the arguments regarding the “quantized time stamp” that is relative to a “previously detected phrase.”
With respect to claims 4, 11, and 18 in the rejections under 35 U.S.C. 102, the arguments are moot because the Examiner is introducing a new reference and the claims are rejected based on the new reference.

Rejections under 35 U.S.C. 103
With respect to claims 5, 12, and 19 in the rejections under 35 U.S.C. 103 on page 12, the Examiner has not found the arguments convincing. Although Dua describes time parameters as “contextual” parameters, the cited “time of day” provides timestamp information. Dua also clearly states that the models are used to predict a probability for each of “one or more intents.” The addition of “respectively associated with the detected phrases” does not add significantly to the claim.
With respect to claims 7, 14, and 21 in the rejections under 35 U.S.C. 103 on page 12, the Examiner has not found the arguments convincing. The citation states: “For example, the system can select a current intent based on processing the annotated output of block 302 over one or more trained machine learning models to generate a confidence level for each of multiple intents, and can select the current intent based on it satisfying a threshold.” Thus, the processing begins asynchronously when it is triggered by satisfying the threshold.
Applicant’s arguments for dependent claims 6, 13, and 20 are based on their dependency from claims 5, 12, and 19, respectively; thus, these rejections are maintained for the reasons stated above.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 8, 9, 15, and 16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The independent claims 1, 8, and 15 recite: “a system/method/computer readable medium of detecting spoken intent, comprising: detecting a phrase in an electronic representation of an audio stream based on a pre-defined vocabulary; associating a time stamp with the detected phrase; and classifying a spoken intent based on a sequence of detected phrases and the respective associated time stamp.”
As drafted, the limitations of “detecting,” “associating,” and “classifying” cover the organizing of human activity, in which a human listens to another human speaking, detects phrases based on a pre-defined vocabulary, associates a time stamp with those phrases, and classifies a spoken intent based on the phrases and the associated time stamps.
This judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional elements of a “processor coupled to the memory” and “logic coupled to processor,” which are generic computer equipment. The as-filed Specification states: “A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals.” The elements “ROM,” “RAM,” “disk storage,” and “computing device” are all general-purpose computer devices.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an abstract idea using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed toward insignificant extra-solution activity. The claim is not patent eligible.
With respect to claims 2, 9, and 16, the claims relate to monitoring a continuous audio stream, detecting the phrases, and computing a time stamp. This amounts to a human listening for phrases in a stream and then computing the timestamps. The additional elements “circuitry” and “electronically” do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. No additional limitations are present.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Hardie (US 10482904) in view of Lee (US 20180075847 A1).

With respect to claims 1, 8, and 15, Hardie teaches a system/method/computer readable medium/electronic memory (C29 3rd para “As described herein, memory 210 and/or 404 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program component, or other data. Such memory 210 and/or 404 includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium which can be used to store the desired information and which can be accessed by a computing device. The memory 210 and/or 404 may be implemented as computer-readable storage media (“CRSM”), which may be any available physical media accessible by the processor(s) 204 and/or 400 to execute instructions stored on the memory 210 and/or 404. In one basic implementation, CRSM may include random access memory (“RAM”) and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory (“ROM”), electrically erasable programmable read-only memory (“EEPROM”), or any other tangible medium which can be used to store the desired information and which can be accessed by the processor(s)”) to:
electronically detect a phrase in the audio stream based on a pre-defined vocabulary (C18, para 1 “The wakeword may produce a wakeword confidence level”),
electronically associate a time stamp (C18, para1 “The wakeword detection 308 may also produce a timestamp indicating the time at which the wakeword was detected.”) with the detected phrase,
and electronically classify a spoken intent (C4 last para, “In various examples, the weighted confidence scores may not be higher than a threshold confidence score after performing ASR, and a third stage of analysis must be performed. In such examples, the remote speech processing service may perform natural language understanding (NLU) on the textual data determined using ASR on the audio signals to determine an intent expressed by the user in the speech utterance”) [[based on a sequence of detected phrases and the respective associated time stamps.]]
Hardie does not teach classifying a spoken intent based on a sequence of detected phrases and the respective associated time stamps.
Lee teaches based on a sequence of detected phrases and the respective associated time stamps ([0062] At the next phase of timestamp 2, the user provides an input utterance “I want to go to thai” 630. The web-based conversational agent 140 may then determine two possible tasks: a local restaurant task with a probability 0.6 and a travel task with a probability 0.4, because there is ambiguity about the intent of the user. The user may want to go to the country Thailand or may want to go to a restaurant of Thailand food. The local restaurant task has a constraint (food, thai, 0.8) and three results (restaurant 1), (restaurant 2), (restaurant 3) in the database that are matching the constraint. The travel task has a constraint (to, thai, 0.7) and no result matching this constraint in the database. The constraints may be determined based on parsing of the input utterance 630 and some features listed in FIG. 5, as well as the previous tasks in the task lineage, i.e. the task information obtained at timestamp 0 and timestamp 1. For example, based on the previous transit tasks “from edgewater to new york” at timestamp 0 and “from leonia to new york” at timestamp 1, the web-based conversational agent 140 may estimate with a low probability 0.4 that the user's intent at timestamp 2 is a travel task “to Thailand”. This probability may be even lower when timestamp 1 and timestamp 2 are very close in time, because it is unlikely for the user to change mind about a transit task so fast.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Hardie to include the teachings of Lee, the motivation being that timestamps keep track of the task lineages (Lee, [0064]).

With respect to claims 2, 9, and 16, Hardie teaches wherein the logic circuitry is further to: electronically monitor a continuous audio stream (C9 last para, “In some examples, the speech interface devices 108A and 108B may continuously collect or monitor, using various sensors, the environment 102 and the device states, to collect and determine the metadata 116. In other examples, responsive to a wakeword, the speech interface devices 108A and 108B may use the various sensors to collect and determine the metadata while streaming the audio signals 114A and 114B to the remote speech processing service 110.”);
electronically detect the phrase in the continuous audio stream (C18, para 1 “The wakeword may produce a wakeword confidence level”); 
and electronically compute a quantized time stamp for the detected phrase which is relative to previously detected phrase (C18 2nd para “The wakeword detection 308 may also produce a timestamp indicating the time at which the wakeword was detected.”).

With respect to claims 3, 10, and 17, Hardie teaches wherein the logic circuitry comprises: a first neural network with an acoustic model (C20 second to last para: “The device or devices performing the ASR processing may include an acoustic front end (AFE) 424 and a speech recognition engine 426. … the AFE 424 determines a number of values, called features … A number of approaches may be used by the AFE 424 to process the audio data, such … neural network feature vector techniques…”) and a hidden Markov model to detect the phrase in the audio stream (C17 4th para: “For example, wakeword detection may use a Hidden Markov Model (HMM) recognizer that performs acoustic modeling of an audio signals and compares the HMM model to one or more reference HMM models that have been created by training for a specific trigger expression.”).

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hardie in view of Lee and further in view of Chen (US 20140146644 A1).
With respect to claims 4, 11, and 18, Hardie and Lee do not teach wherein the acoustic model is further configured to: automatically add time stamp information to text data for the detected phrase.
Chen teaches wherein the acoustic model is further configured to: automatically add time stamp information to text data for the detected phrase ([0054] “Computing device 206, such as a server, can be a remote computer connected to network 204 for communicating with other system components, such as receiving user commands, transmitting activation signals, and transmitting device instructions. FIG. 3 provides further detail regarding the functionality of computing device 206. A speech recognition module 301 can accept audio input and can output annotated text that has been recognized. The speech recognition module 301 can be constructed as one or more pieces of automated speech recognition (ASR) software 302, configured with an acoustic model 303 containing analysis data, parameters and characterizations that determine various phonemes or other audio elementary blocks, and also a language model 304 containing a dictionary of distinguishable phoneme combinations to form words and/or phrases, and optionally a grammar or statistical model that stipulates the acceptance probability for various combinations of them. These annotations may comprise one or more of, but are not limited to: amplitude and/or phase; pitch/frequency center and/or range (or a parametric representation of the distribution across multiple); timestamp and/or time zone information; physical location and/or network address where applicable; sensor device; associated service account; associated user; current service authentication/entitlement state where applicable. A natural language processing (NLP) engine 305 can accept text, optionally with these annotations, and can output a data representation of recognized action commands for subsequent dispatch and execution.”).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Hardie and Lee to include the teachings of Chen, the motivation being to provide ambient voice control in multi-device scenarios without undue disruption to battery life (Chen, [0037]).



Claims 5, 7, 12, 14, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Hardie in view of Lee and further in view of Dua (US 20200143114).
With respect to claims 5, 12, and 19, Hardie does not teach wherein the logic circuitry comprises: a second neural network trained to return a probability for each of two or more intent classifications based on detected phrases and time stamps respectively associated with the detected phrases as input features to the second neural network.
Dua teaches wherein the logic circuitry comprises: a second neural network trained to return a probability for each of two or more intent classifications based on detected phrases and time stamps respectively associated with the detected phrases as input features to the second neural network ([0052] “In many implementations, the intent models 154 can include machine learning models, such as deep neural network models. In some of those implementations, each of the machine learning models can be trained to predict a probability that each of one or more intents is currently present in the communication session. A prediction can be generated based on a machine learning model by processing, using trained parameters of the machine learning model, of one or more inputs for the machine learning model, such as: received inputs in the communication session, annotations of those inputs, parameter(s) of an agent that is involved in the communications session, contextual parameters (e.g., location, time of day, day of week), etc.”).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Hardie to include the teachings of Dua, the motivation being to optimize performance by minimizing dialog turns when intent is based on the highest probability of occurring (Dua, [0003]).
With respect to claims 7, 14, and 21, Hardie does not teach wherein the logic circuitry is further to: asynchronously trigger the second neural network when a sequence of detected phrases is ready for classification.
Dua teaches wherein the logic circuitry is further to: asynchronously trigger the second neural network when a sequence of detected phrases is ready for classification (Figure 3A shows the flowchart for processing, and [0067]: “At block 302, the system processes the input(s) to generate annotated output”, and [0069]: “For example, the system can select a current intent based on processing the annotated output of block 302 over one or more trained machine learning models to generate a confidence level for each of multiple intents, and can select the current intent based on it satisfying a threshold.”).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Hardie to include the teachings of Dua, the motivation being to optimize performance by minimizing dialog turns when intent is based on the highest probability of occurring (Dua, [0003]).
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hardie in view of Dua and further in view of McGann (US 20180338041).
With respect to claims 6, 13, and 20, Hardie in view of Dua does not teach wherein the logic circuitry is further to: classify the spoken intent in accordance with a highest probability of the two or more intent classifications.
McGann teaches wherein the logic circuitry is further to: classify the spoken intent in accordance with a highest probability of the two or more intent classifications ([0015] “According to one embodiment of the invention, the selected intent has a highest probability of the computed probabilities”).  
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Hardie in view of Dua to include the teachings of McGann, the motivation being to increase the flexibility of automated speech-enabled systems to allow users to traverse a finite number of execution strategies (McGann, [0003]-[0004]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571)272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/A.N.P./Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657