Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Priority
Acknowledgment is made of applicant's claim for domestic priority based on US provisional application 61/983025, filed on 04/23/2014, and US non-provisional applications 14/681203, 15/156478, 16/154875, and 16/540795, filed on 04/08/2015, 05/17/2016, 10/09/2018, and 08/14/2019, respectively.
Claim Rejections - 35 USC § 101
35 U.S.C. §101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-7, 9, 11-17, and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over the claims of U.S. Patent No. 11004441 B2 in view of Ramaswamy et al. (US 2001/0056344 A1).
Instant application 17245019, claim 1:
1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving audio data corresponding to an utterance;
processing, using a speech recognizer, the audio data to generate a transcription of the utterance by recognizing each term in the utterance;
recording an amount of time after each term is recognized by the speech recognizer;
identifying an endpoint after one of the terms recognized by the speech recognizer when the recorded amount of time after the term is recognized by the speech recognizer satisfies a first threshold before a subsequent term is recognized by the speech recognizer;
based on the endpoint identified after the one of the terms recognized by the speech recognizer, determining whether the utterance is one of likely complete or likely incomplete based on the transcription of the utterance; and
overriding the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete.

US 11004441 B2, claim 1:
1. A computer-implemented method comprising:
receiving, by one or more computing devices, audio data of an utterance;
executing, by the one or more computing devices, automated speech recognition (ASR) software to generate a transcription of the utterance by recognizing each term in the utterance;
recording, by the one or more computing devices, an amount of time between each pair of adjacent terms recognized by the ASR software;
determining, by the one or more computing devices, a first likelihood that the utterance is a complete utterance based on the recorded amount of time between each pair of adjacent terms;
determining, by the one or more computing devices, whether the first likelihood that the utterance is a complete utterance satisfies a threshold level; in response to determining that the first likelihood that the utterance is a complete utterance satisfies the threshold level and based on the transcription of the utterance, determining, by the one or more computing devices, a second likelihood that the utterance is a complete utterance; and
based on the second likelihood that the utterance is a complete utterance, determining, by the one or more computing devices, whether to designate an endpoint of the utterance at an end of the audio data of the utterance.

Instant application 17245019, claim 11:
11. A system comprising:
data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising:
receiving audio data corresponding to an utterance;
processing, using a speech recognizer, the audio data to generate a transcription of the utterance by recognizing each term in the utterance;
recording an amount of time after each term is recognized by the speech recognizer;
identifying an endpoint after one of the terms recognized by the speech recognizer when the recorded amount of time after the term is recognized by the speech recognizer satisfies a first threshold before a subsequent term is recognized by the speech recognizer;
based on the endpoint identified after the one of the terms recognized by the speech recognizer, determining whether the utterance is one of likely complete or likely incomplete based on the transcription of the utterance; and
overriding the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete.

US 11004441 B2, claim 9:
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by one or more computing devices, audio data of an utterance;
executing, by the one or more computing devices, automated speech recognition (ASR) software to generate a transcription of the utterance by recognizing each term in the utterance;
recording, by the one or more computing devices, an amount of time between each pair of adjacent terms recognized by the ASR software;
determining, by the one or more computing devices, a first likelihood that the utterance is a complete utterance based on the recorded amount of time between each pair of adjacent terms;
determining, by the one or more computing devices, whether the first likelihood that the utterance is a complete utterance satisfies a threshold level; in response to determining that the first likelihood that the utterance is a complete utterance satisfies the threshold level and based on the transcription of the utterance, determining, by the one or more computing devices, a second likelihood that the utterance is a complete utterance; and
based on the second likelihood that the utterance is a complete utterance, determining, by the one or more computing devices, whether to designate an endpoint of the utterance at an end of the audio data of the utterance.

The claims of US 11004441 B2 do not recite overriding the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete.
Ramaswamy teaches a system using a speech recognizer to process audio data to generate a transcription of an utterance corresponding to the audio data by recognizing each term in the utterance (¶25), determining whether the utterance is one of likely complete or likely incomplete based on the transcription of the utterance and on an endpoint identified after one of the terms recognized by the speech recognizer (¶27, given one or more recognized text 30 denoted as S, boundary identifier decides if S is a complete command, with T = 1 if S is a complete command and T = 0 otherwise; ¶51, decision maker 45 makes a final decision as to whether or not the given processed utterance is a complete command according to equation 3), and overriding the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete (¶52, S is declared as a complete command if and only if P(T=1 | S) > P (T=0 | S); compare ¶¶74-77 and ¶¶80-81, e.g., override decision in ¶76 showing “check new mail show me” as complete command T = 1 with final decision “check new mail” as a first command and “show me the first one” as a second command).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to override the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete in order to recognize the presence of multiple commands in the same utterance (Ramaswamy, ¶79).  
Limitations of claims 2 and 12 in the instant application correspond to the combination of limitations set forth in claims 2 and 10 of US 11004441 B2. 
Limitations of claims 3 and 13 in the instant application correspond to the combination of limitations set forth in claims 3 and 11 of US 11004441 B2. 
Limitations of claims 4 and 14 in the instant application correspond to the combination of limitations set forth in claims 4 and 12 of US 11004441 B2.
Limitations of claims 5 and 15 in the instant application correspond to the combination of limitations set forth in claims 5 and 13 of US 11004441 B2. 
Limitations of claims 6 and 16 in the instant application correspond to the combination of limitations set forth in claims 6 and 14 of US 11004441 B2. 
Limitations of claims 7 and 17 in the instant application correspond to the combination of limitations set forth in claims 8 and 16 of US 11004441 B2. 
Regarding Claims 9 and 19, the claims of US 11004441 B2 do not recite wherein the speech recognizer resides on a user device. 
Ramaswamy teaches a speech recognizer residing on a user device (¶6 and ¶25, a conversational language system implemented by a digital computer comprising an electronic mail application).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement the speech recognizer on a user device to provide a more user friendly interface which permits a user to speak more naturally and continually (Ramaswamy, ¶20).
Claims 8 and 18 are rejected on the ground of nonstatutory double patenting as being unpatentable over the claims of U.S. Patent No. 11004441 B2 in view of Ramaswamy et al. (US 2001/0056344 A1) as applied to claims 1 and 11, and further in view of Park et al. (US 2007/0201639 A1).
The claims of US 11004441 B2 do not recite wherein maintaining the microphone that detected the utterance in the active state permits the speech recognizer to process additional audio data. 
Park discloses a network terminal / robot (¶22) that performs voice detection by opening a microphone until a voice detection end time according to a single trigger (¶25). When multiple triggers are generated, the robot performs voice detection several times by repeatedly opening the microphone for a predetermined time and closing the same (¶25). Specifically, based on determination to designate an endpoint of an utterance at the end of audio data of the utterance, the network terminal maintains the microphone that detected the utterance in an active state according to multi-triggers (¶52 and Figs. 6A-6B, repeatedly maintaining the microphone in active state according to durations A until endpoint detection time indicated by durations B) to permit a voice recognizer to process additional audio data (¶31, voice recognition unit 202 receives voice signal detected by voice detector 104 of robot 100 and recognizes the received voice).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to maintain the microphone that detected the utterance in the active state, thereby permitting the speech recognizer to process additional audio data, in order to optimize voice detection (Park, ¶9).
Claims 10 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over the claims of U.S. Patent No. 11004441 B2 in view of Ramaswamy et al. (US 2001/0056344 A1) as applied to claims 1 and 11, and further in view of Carter (US 9311932 B2).
The claims of US 11004441 B2 do not recite wherein the speech recognizer resides on a server. 
Carter teaches a speech recognition system for detecting pauses in speech stream and marking the speech stream with an endpoint (Abstract) wherein the speech recognition system resides on a server (Col 5, Rows 59-61).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement the speech recognizer on a server to perform adaptive pause detection in speech recognition (Carter, Col 5, Rows 60-61).
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. See In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent is shown to be commonly owned with this application. See 37 CFR 1.131(c). A registered attorney or agent of record may sign a terminal disclaimer.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application will determine what form should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.   
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5, 9, 11-15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (US 2001/0056344 A1) in view of Dobler (US 2005/0038652 A1).
Regarding Claims 1 and 11, Ramaswamy discloses a system (Fig. 1) comprising: 
data processing hardware (¶25, digital computers having processor implemented in various forms of hardware and software combination); and 
memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware (¶25, digital computer having processor and memory implementing software) to perform operations comprising: 
receiving audio data corresponding to an utterance (¶25, user issues audio input 10 comprising spoken command to system 8); 
processing, using a speech recognizer, the audio data to generate a transcription of the utterance by recognizing each term in the utterance (¶25, speech recognizer 25 converts audio input 10 into recognized text 30);  
identifying an endpoint after one of the terms recognized by the speech recognizer (¶25 and ¶27, boundary identifier 40 takes recognized text 30 as input S, decides if S is a complete sentence, denoted by T = 1, and sets T = 0 otherwise);
based on the endpoint identified after the one of the terms recognized by the speech recognizer, determining whether the utterance is one of likely complete or likely incomplete based on the transcription of the utterance (¶27, boundary identifier 40 evaluates the conditional probability P(T|S) for T = 0 and T = 1 and selects as the decision that T which maximizes P(T|S); see e.g., ¶¶54-57, complete commands are marked T = 1 while incomplete commands are marked T = 0); and
overriding the endpoint identified after the one of the terms recognized by the speech recognizer when the utterance is likely incomplete (¶51, decision maker 45 makes a final decision as to whether or not the given processed utterance is a complete command according to equation 3; ¶52, S is declared as a complete command if and only if P(T=1 | S) > P (T=0 | S); compare ¶¶74-77 and ¶¶80-81, e.g., override decision in ¶76 showing “check new mail show me” as complete command T = 1 with final decision “check new mail” as a first command and “show me the first one” as a second command).
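The decision rule cited from Ramaswamy ¶52 can be illustrated with a minimal Python sketch. This action relies only on the comparison P(T=1|S) > P(T=0|S); the log-linear scoring, the feature names, and the weights below are hypothetical stand-ins chosen for illustration and are not taken from Ramaswamy's equation 3.

```python
import math


def boundary_posteriors(active_features, weights):
    """Return (P(T=0|S), P(T=1|S)) from a toy log-linear model.

    `weights` maps (feature, T) pairs to hypothetical real-valued weights;
    pairs missing from the map contribute nothing to the score.
    """
    scores = []
    for t in (0, 1):
        scores.append(sum(weights.get((f, t), 0.0) for f in active_features))
    # Normalize the two exponentiated scores so that they sum to 1.
    z = sum(math.exp(s) for s in scores)
    return math.exp(scores[0]) / z, math.exp(scores[1]) / z


def is_complete_command(active_features, weights):
    """Declare S complete if and only if P(T=1|S) > P(T=0|S)."""
    p_incomplete, p_complete = boundary_posteriors(active_features, weights)
    return p_complete > p_incomplete
```

With equal scores the two posteriors tie at 0.5, so the strict inequality leaves the utterance marked as not complete.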
Ramaswamy does not disclose recording an amount of time after each term is recognized by the speech recognizer and identifying the endpoint after one of the terms recognized by the speech recognizer when the recorded amount of time after the term is recognized by the speech recognizer satisfies a first threshold before a subsequent term is recognized by the speech recognizer.
Dobler teaches a speech recognition device (¶17 and Fig. 1) for the recognition of words and pauses from voice signals (¶18) by recording an amount of time after each term is recognized by the speech recognizer (¶18, converting voice signals into feature vectors and comparing said feature vectors with stored word references R1, R2, and R3 and pause reference Rp in a time-related segmentation of the voice signal into voice ranges comprising words Wi = {W1, W2, …} and non-voice ranges Ti = {T1, T2, …}; ¶19, measuring the time interval (WEi-WAi) between word beginning i and word end i) and identifying the endpoint after one of the terms recognized by the speech recognizer when the recorded amount of time after the term is recognized by the speech recognizer satisfies a first threshold before a subsequent term is recognized by the speech recognizer (Abstract, words Wi spoken in a row and pauses Ti are thereby combined as to be appertaining to a word group as soon as one of the pauses Ti exceeds a limit value TG; ¶10, dynamically adjusts TG before the user speaks the next word group to recognize subsequent terms).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement boundary identifier 40 to select the T which maximizes P(T|S), and thereby designate / identify the endpoint after one of the terms is recognized, by recording an amount of time after each term is recognized and determining that the recorded amount of time satisfies a first threshold before a subsequent term is recognized, in order to combine words spoken in a row as appertaining to a word group (Dobler, Abstract).
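The combination discussed for claims 1 and 11 can be pictured with a short Python sketch: a pause after a recognized term that meets a threshold proposes an endpoint, and a transcription-based completeness check either keeps that endpoint or overrides it. The function name, the tuple layout, the use of "greater than or equal to" for satisfying the threshold, and the `looks_complete` callback are all assumptions made for illustration; neither reference nor the claims prescribe this form.

```python
from typing import Callable, List, Optional, Tuple

Term = Tuple[str, float, float]  # (text, start time, end time), times in seconds


def find_endpoint(
    terms: List[Term],
    audio_end: float,
    pause_threshold: float,
    looks_complete: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the term after which an endpoint is kept, or None."""
    for i, (_, _, end) in enumerate(terms):
        # Time recorded after term i: until the next term starts,
        # or until the audio ends when no term follows.
        next_start = terms[i + 1][1] if i + 1 < len(terms) else audio_end
        pause = next_start - end
        if pause < pause_threshold:
            continue  # a subsequent term arrived before the threshold was met
        transcript = " ".join(text for text, _, _ in terms[: i + 1])
        if looks_complete(transcript):
            return i  # likely complete: keep the endpoint
        # Likely incomplete: override the endpoint and keep scanning.
    return None
```

For example, with a 0.8 second threshold and a long pause after "check new mail show me", the candidate endpoint is discarded when that transcription is judged incomplete, and scanning continues to the end of "check new mail show me the first one".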
Regarding Claims 2 and 12, Ramaswamy discloses wherein determining whether the utterance is one of likely complete or likely incomplete comprises: 
comparing the transcription of the utterance to a first collection of text samples identified as complete utterances (¶60-72, ¶74, and ¶76 in view of ¶46-47 and ¶51, input processor processes training data to produce feature functions comprising word features from i = 1 to i = n, where n represents the quantity of words in a training utterance. Then, for every utterance in recognized text, input processor determines which feature functions are present in a given processed utterance and whether the given processed utterance is likely complete and therefore a likely command boundary. Complete command Feature Functions corresponding to T = 1 describe training data that match the utterance and do not include any additional terms and therefore describe quantities of text that match the utterance);
comparing the transcription of the utterance to a second collection of text samples identified as incomplete utterances (¶60-72, ¶74, and ¶76 in view of ¶46-47 and ¶51, incomplete command Feature Functions corresponding to T = 0 describe training data that match the utterance and include additional terms and therefore describe quantities of text that match the utterance as well as additional text); and 
determining whether the utterance is one of likely complete or likely incomplete based on comparing the transcription of the utterance to the first collection of text samples identified as complete utterances and comparing the transcription of the utterance to the second collection of text samples identified as incomplete utterances (¶49-52, using feature functions corresponding to T = 0 and T = 1 to calculate P (T=0|S) and P (T=1|S) and determine if the utterance is a complete command or incomplete command based on P (T=0|S) > P (T=1|S); see ¶74 and ¶76, “check new mail” T = 1 and “check new mail show me” T = 1).
Regarding Claims 3 and 13, Ramaswamy discloses wherein comparing the transcription of the utterance to the first collection of text samples identified as complete utterances comprises: 
determining a first quantity of text samples in the first collection that match the transcription of the utterance (¶60-72, ¶74, and ¶76 in view of ¶46-47 and ¶51, input processor processes training data to produce feature functions comprising word features from i = 1 to i = n, where n represents the quantity of words in a training utterance; complete command Feature Functions corresponding to T = 1 describe training data that match the utterance and do not include any additional terms and therefore describe quantities of text that match the utterance); and
determining a second quantity of text samples in the second collection that match the transcription of the utterance (¶60-72, ¶74, and ¶76 in view of ¶46-47 and ¶51, incomplete command Feature Functions corresponding to T = 0 describe training data that match the utterance and include additional terms and therefore describe quantities of text that match the utterance as well as additional text).
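The limitations of claims 2-3 and 12-13 (comparing the transcription to two labeled collections and determining a quantity of matches in each) can be sketched in Python as follows. Matching by lowercased, whitespace-split term equality and deciding "likely complete" when the first quantity exceeds the second are assumptions made for illustration only; the claims and the cited paragraphs do not fix either choice.

```python
from typing import Iterable, List


def _terms(text: str) -> List[str]:
    """Hypothetical normalization: lowercase and split on whitespace."""
    return text.lower().split()


def count_matches(transcription: str, samples: Iterable[str]) -> int:
    """Count the text samples whose terms equal the transcription's terms."""
    target = _terms(transcription)
    return sum(1 for sample in samples if _terms(sample) == target)


def likely_complete(
    transcription: str,
    complete_samples: Iterable[str],
    incomplete_samples: Iterable[str],
) -> bool:
    """Compare the match quantities from the two collections."""
    first_quantity = count_matches(transcription, complete_samples)
    second_quantity = count_matches(transcription, incomplete_samples)
    return first_quantity > second_quantity
```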
Regarding Claims 4 and 14, Ramaswamy discloses wherein comparing the transcription of the utterance to the first collection of text samples identified as complete utterances comprises: 
determining whether terms in each text sample in the first collection occur in a same order as terms of the transcription of the utterance; and determining whether terms in each text sample in the second collection occur in a same order as terms of the transcription of the utterance (¶51 in view of ¶47-48, the feature functions describe the order in which the words are positioned relative to each other in the training data / utterance such that determining which feature functions are present in a given processed utterance means determining whether words in the processed utterance are in the same order as words in the training data / utterances).
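One possible reading of the same-order determination in claims 4 and 14 is an order-preserving subsequence test, sketched below in Python. The function name and the choice of a subsequence test (rather than, say, strict positional equality) are assumptions for illustration and are not drawn from Ramaswamy.

```python
def occurs_in_same_order(transcription: str, sample: str) -> bool:
    """True if every term of the transcription appears in the sample,
    in the same relative order (other terms may be interleaved)."""
    sample_terms = iter(sample.lower().split())
    # `term in sample_terms` consumes the iterator up to the first match,
    # so each later term must be found after the earlier ones.
    return all(term in sample_terms for term in transcription.lower().split())
```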
Regarding Claims 5 and 15, Ramaswamy discloses wherein the operations further comprise, when the utterance is likely complete, determining to designate the endpoint identified after the one of the terms recognized by the speech recognizer at an end of the audio data of the utterance (¶52, utterance S is declared as a complete command if and only if P(T=1|S) > P(T=0|S); ¶79, place a command boundary after each portion of the utterance corresponding to a complete command; e.g., T=1 at the end of “Show me the first one”).
Regarding Claims 9 and 19, Ramaswamy discloses wherein the speech recognizer resides on a user device (¶6 and ¶25, a conversational language system implemented by a digital computer comprising an electronic mail application).
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (US 2001/0056344 A1) in view of Dobler (US 2005/0038652 A1) as applied to claims 1 and 11, and further in view of Hung et al. (US 2014/0012573 A1).
Ramaswamy does not disclose based on determining to designate the endpoint identified after the one of the terms recognized by the speech recognizer at the end of the audio data of the utterance, deactivating a microphone that detected the utterance.
Hung discloses a computing system comprising a microphone for receiving an utterance (¶5 and ¶22, signal collection unit 102 or microphone), a speech recognition unit for obtaining a transcription of an utterance corresponding to received audio (¶22, speech recognition system 104), and a voice activity detection unit configured to designate an end of voice activity in response to determining that an energy level of received audio falls below a threshold energy level after a user has started speaking (¶11-12, ¶52, ¶58, and see Figs. 3-4, step 402, determining whether the audio energy of a current audio frame is smaller than threshold TH1; if yes, VAD = 0).
Further, Hung discloses, based on the transcription of the utterance, designating, by the computing device, an endpoint of the utterance, which comprises deactivating the microphone (¶24, signal collection unit 102 and speech recognition system 104 will enter power saving mode when it is determined that there is no longer any voice signal).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Ramaswamy to designate an endpoint of the utterance by deactivating the microphone as taught by Hung in order to save power efficiently (Hung, ¶24).
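The energy comparison cited from Hung (step 402, VAD = 0 when frame energy is below TH1) can be sketched in Python as below. Mean squared amplitude as the energy measure, marking every other frame as 1, and the `hangover_frames` rule for entering power saving are simplifying assumptions; this action does not describe Hung's remaining steps, and the sketch does not attempt to reproduce them.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def vad_flags(frames: Sequence[Sequence[float]], th1: float) -> List[int]:
    """VAD = 0 when the frame energy is smaller than th1, else 1 (simplified)."""
    return [0 if frame_energy(frame) < th1 else 1 for frame in frames]


def should_enter_power_saving(flags: Sequence[int], hangover_frames: int) -> bool:
    """True once the most recent `hangover_frames` flags are all 0."""
    if hangover_frames <= 0:
        raise ValueError("hangover_frames must be positive")
    recent = flags[-hangover_frames:]
    return len(flags) >= hangover_frames and not any(recent)
```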
Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (US 2001/0056344 A1) in view of Dobler (US 2005/0038652 A1) as applied to claims 1 and 11, and further in view of Park et al. (US 2007/0201639 A1).
Ramaswamy does not disclose wherein overriding the endpoint identified after the one of the terms recognized by the speech recognizer comprises maintaining a microphone that detected the utterance in an active state.
Park discloses a network terminal / robot (¶22) that performs voice detection by opening a microphone until a voice detection end time according to a single trigger (¶25). When multiple triggers are generated, the robot performs voice detection several times by repeatedly opening the microphone for a predetermined time and closing the same (¶25). Specifically, based on the multiple triggers, the network terminal maintains the microphone that detected the utterance in an active state according to multi-triggers (¶52 and Figs. 6A-6B, repeatedly maintaining the microphone in active state according to durations A until endpoint detection time indicated by durations B) to permit a voice recognizer to process additional audio data (¶31, voice recognition unit 202 receives voice signal detected by voice detector 104 of robot 100 and recognizes the received voice).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to, when overriding the endpoint identified after the one of the terms recognized by the speech recognizer, maintain the microphone that detected the utterance in the active state, thereby permitting the speech recognizer to process additional audio data, in order to optimize voice detection (Park, ¶9); i.e., to receive additional audio data corresponding to “the first one” in “show me the first one” (Ramaswamy, ¶76, override T=1 designation for “check new mail show me” to receive audio data corresponding to “check new mail show me the first one” in ¶81).
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (US 2001/0056344 A1) in view of Dobler (US 2005/0038652 A1) as applied to claims 1 and 11, and further in view of Carter (US 9311932 B2).
Ramaswamy does not disclose wherein the speech recognizer resides on a server. 
Carter teaches a speech recognition system for detecting pauses in speech stream and marking the speech stream with an endpoint (Abstract) wherein the speech recognition system resides on a server (Col 5, Rows 59-61).
It would’ve been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement the speech recognizer on a server to perform adaptive pause detection in speech recognition (Carter, Col 5, Rows 60-61).
Conclusion

Prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
US 8099277 B2 discloses a speech duration detector detecting a starting end of a first duration and a trailing end candidate detecting unit that detects a candidate point as trailing end of the speech duration, where when the detected duration is possibly extemporaneous noise, the detected starting end and trailing end positions are thereby canceled. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or to the examiner’s supervisor, King Poon, whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
08/18/2022