DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction
This office action is in response to communications filed on 05/17/2021. Claims 1-18 are pending and have been examined.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/07/2022 and 08/29/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 6, 7, 12, 13 and 18 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Aleksic et al. (US 20170069309 A1).
Regarding Claim 1:
Aleksic teaches a voice processing method, comprising: collecting a voice signal by a microphone of an electronic device, and signal-processing the collected voice signal to obtain a first voice frame segment(Para [0034], Ln 4-16, receive the audio input 202 through a microphone…… The computing device 210 receives the audio input 202 and may sample the audio input 202 at a pre-designated frequency…The computing device 210 may record and store the audio input 202 corresponding to the utterance); 
performing voice recognition on the first voice frame segment to obtain a first recognition result(Para [0004], Ln 7-8, Intermediate speech recognition results may be returned by a speech recognizer);
in response to the first recognition result not matching a target content(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data)
and a plurality of tokens in the first recognition result meeting a preset condition(Para [0044] Ln 6-12, recognition result…“Cha-”, then based on the context data indicating that the expected….one of “Bill”, “Charlie”, and “Hillary”, it can be determined that the intermediate speech recognition result corresponds to the expected speech recognition result of “Charlie”. Para [0044],  Ln 12-20, the intermediate speech recognition result may have a particular confidence level… may contribute to determining whether the intermediate speech recognition result corresponds to the expected speech recognition result. Para [0032], Ln 8-13, using client provided information as context data, the audio is expected to correspond to a phrase “from [Origin_Airport] to [Destination_Airport]” and the EOS timeout may be extended if only a partial match to the expected phrase is received)
performing frame compensation on the first voice frame segment to obtain a second voice frame segment(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data, the speech endpointer may... allow for additional audio data to be input. Para [0029], Ln 4-9, user says “19” followed by a pause….return the recognition result of “19” …determine that it is a partial match to context data. In response…EOS timeout may be extended for the user to speak the remaining portion of the year); 
and performing voice recognition on the second voice frame segment to obtain a second recognition result, wherein a matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content(Para [0032], Ln 15-23, receiving only “From JFK” as an example of an intermediate result, the system waits for the user to finish saying the expected phrase….When the result in the form of the expected phrase “from [Origin_Airport] to [Destination_Airport]” is received, the speech endpointer 110 will terminate the recognition session. Matching expected phrase has a higher degree of matching than a partial match).
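
For illustration only, the sequence of limitations mapped above can be sketched as follows. This is a simplified sketch under stated assumptions, not code from Aleksic or from the application: frames are modeled as words, the recognizer is a caller-supplied function, and the names `matching_degree` and `recognize_with_compensation` are hypothetical. The "preset condition" is assumed here to be a partial (non-zero) match.

```python
from typing import Callable, List, Tuple


def matching_degree(result: str, target: str) -> float:
    """Fraction of the target's words matched, in order, by the leading words of result."""
    target_words = target.split()
    hits = 0
    for got, expected in zip(result.split(), target_words):
        if got != expected:
            break
        hits += 1
    return hits / len(target_words) if target_words else 0.0


def recognize_with_compensation(
    frames: List[str],
    first_len: int,
    extra_len: int,
    recognize: Callable[[List[str]], str],
    target: str,
) -> Tuple[str, float]:
    """Recognize a first segment; on a partial match, extend it and recognize again."""
    first_segment = frames[:first_len]
    first_result = recognize(first_segment)
    first_degree = matching_degree(first_result, target)

    # Full match: nothing to compensate. No match at all: the assumed
    # preset condition (a partial match) is not met, so stop here.
    if first_degree == 1.0 or first_degree == 0.0:
        return first_result, first_degree

    # Frame compensation (assumed form): append the next extra_len frames.
    second_segment = frames[: first_len + extra_len]
    second_result = recognize(second_segment)
    return second_result, matching_degree(second_result, target)
```

With a toy recognizer that joins the words, the first segment "from JFK" is a partial match to the expected phrase and the extended segment matches it fully, so the matching degree rises from 0.5 to 1.0.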

Regarding Claim 6:
Aleksic teaches the method of claim 1, wherein a unit of each token of the plurality of tokens comprises at least one of a word, a phone, a monophone and a triphone(Para [0042], Ln 9-11, intermediate speech recognition result may include multiple syllables of a word, multiple phonetic sounds, or one or more words of a phrase).

Regarding Claim 7:
Aleksic teaches an electronic device, comprising: a microphone configured to collect a voice signal(Para [0034], Ln 4-16, receive the audio input 202 through a microphone…… The computing device 210 receives the audio input 202 and may sample the audio input 202 at a pre-designated frequency…The computing device 210 may record and store the audio input 202 corresponding to the utterance); 
a processor; and a memory storing instructions that when executed by the processor(Para [0058], Ln 1-3, processor for performing actions in accordance with instructions. Para [0059], Ln 6-9, storing computer program instructions and data include all forms of non-volatile memory), 
control the processor to signal-process the collected voice signal to obtain a first voice frame segment(Para [0034], Ln 4-16, receive the audio input 202 through a microphone…… The computing device 210 receives the audio input 202 and may sample the audio input 202 at a pre-designated frequency…The computing device 210 may record and store the audio input 202 corresponding to the utterance); 
perform voice recognition on the first voice frame segment to obtain a first recognition result(Para [0004], Ln 7-8, Intermediate speech recognition results may be returned by a speech recognizer); 
in response to the first recognition result not matching a target content(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data) 
and a plurality of tokens in the first recognition result meeting a preset condition(Para [0044] Ln 6-12, recognition result…“Cha-”, then based on the context data indicating that the expected….one of “Bill”, “Charlie”, and “Hillary”, it can be determined that the intermediate speech recognition result corresponds to the expected speech recognition result of “Charlie”. Para [0044],  Ln 12-20, the intermediate speech recognition result may have a particular confidence level… may contribute to determining whether the intermediate speech recognition result corresponds to the expected speech recognition result. Para [0032], Ln 8-13, using client provided information as context data, the audio is expected to correspond to a phrase “from [Origin_Airport] to [Destination_Airport]” and the EOS timeout may be extended if only a partial match to the expected phrase is received), 
perform frame compensation on the first voice frame segment to obtain a second voice frame segment(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data, the speech endpointer may... allow for additional audio data to be input. Para [0029], Ln 4-9, user says “19” followed by a pause….return the recognition result of “19” …determine that it is a partial match to context data. In response…EOS timeout may be extended for the user to speak the remaining portion of the year); 
and perform voice recognition on the second voice frame segment to obtain a second recognition result, wherein a matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content(Para [0032], Ln 15-23, receiving only “From JFK” as an example of an intermediate result, the system waits for the user to finish saying the expected phrase….When the result in the form of the expected phrase “from [Origin_Airport] to [Destination_Airport]” is received, the speech endpointer 110 will terminate the recognition session. Matching expected phrase has a higher degree of matching than a partial match).

Regarding Claim 12:
Claim 12 contains similar limitations as Claim 6, and is therefore rejected for the same reasons.

Regarding Claim 13:
Aleksic teaches a non-transitory computer-readable storage medium having instructions stored thereon, when the instructions are executed by a processor in an electronic device, a voice processing method is implemented, the method comprising(Para [0058], Ln 1-3, processor for performing actions in accordance with instructions. Para [0059], Ln 6-12, storing computer program instructions and data include all forms of non-volatile memory media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM): 
collecting a voice signal by a microphone of an electronic device, and signal-processing the collected voice signal to obtain a first voice frame segment(Para [0034], Ln 4-16, receive the audio input 202 through a microphone…… The computing device 210 receives the audio input 202 and may sample the audio input 202 at a pre-designated frequency…The computing device 210 may record and store the audio input 202 corresponding to the utterance); 
performing voice recognition on the first voice frame segment to obtain a first recognition result(Para [0004], Ln 7-8, Intermediate speech recognition results may be returned by a speech recognizer);
in response to the first recognition result not matching a target content(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data)
and a plurality of tokens in the first recognition result meeting a preset condition(Para [0044] Ln 6-12, recognition result…“Cha-”, then based on the context data indicating that the expected….one of “Bill”, “Charlie”, and “Hillary”, it can be determined that the intermediate speech recognition result corresponds to the expected speech recognition result of “Charlie”. Para [0044],  Ln 12-20, the intermediate speech recognition result may have a particular confidence level… may contribute to determining whether the intermediate speech recognition result corresponds to the expected speech recognition result. Para [0032], Ln 8-13, using client provided information as context data, the audio is expected to correspond to a phrase “from [Origin_Airport] to [Destination_Airport]” and the EOS timeout may be extended if only a partial match to the expected phrase is received)
performing frame compensation on the first voice frame segment to obtain a second voice frame segment(Para [0005], Ln 5-10, If a match is not determined … or a partial match is determined between an intermediate speech recognition result and context data, the speech endpointer may... allow for additional audio data to be input. Para [0029], Ln 4-9, user says “19” followed by a pause….return the recognition result of “19” …determine that it is a partial match to context data. In response…EOS timeout may be extended for the user to speak the remaining portion of the year); 
and performing voice recognition on the second voice frame segment to obtain a second recognition result, wherein a matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content(Para [0032], Ln 15-23, receiving only “From JFK” as an example of an intermediate result, the system waits for the user to finish saying the expected phrase….When the result in the form of the expected phrase “from [Origin_Airport] to [Destination_Airport]” is received, the speech endpointer 110 will terminate the recognition session. Matching expected phrase has a higher degree of matching than a partial match).

Regarding Claim 18:
Claim 18 contains similar limitations as Claim 6, and is therefore rejected for the same reasons.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2, 8 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aleksic as applied to claim 1 above, and further in view of Hoffmeister et al. (US 20160379632 A1).

Regarding Claim 2:
Aleksic teaches the method of claim 1, but does not teach wherein the plurality of tokens in the first recognition result meeting the preset condition comprises: at least one token in the first recognition result whose matching probability is greater than a first set threshold or a number of tokens in the first recognition result whose matching probability to the target content is greater than a second set threshold exceeding a preset number.
In the same field of Endpointing, Hoffmeister teaches wherein the plurality of tokens in the first recognition result meeting the preset condition comprises: at least one token in the first recognition result whose matching probability is greater than a first set threshold(Hoffmeister, Para [0024], Ln 5-17, The confidence score may be based on…and the likelihood that a particular word which matches the sounds would be included in the sentence at the specific location…..Based on … confidence score, the ASR process 250 outputs the most likely text recognized in the audio data. Para [0066], Ln 8-10, Thus, active hypotheses may have a confidence score that is above a certain threshold. Hoffmeister specifically considers active hypotheses in endpoint detection(Abstract, Ln 1-3)); 
or a number of tokens in the first recognition result whose matching probability to the target content is greater than a second set threshold exceeding a preset number(optional limitation).
It would have been obvious for one skilled in the art, at the effective time of filing, to modify Aleksic with the ASR confidence threshold of Hoffmeister, as it improves efficiency(Para [0066], Ln 1-4).
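
For illustration only, the alternative ("or") structure of the preset condition recited in Claim 2 can be sketched as below. This is a hypothetical sketch, not taken from Hoffmeister or from the application; the function name `meets_preset_condition` and the representation of tokens as a list of matching probabilities are assumptions.

```python
from typing import Sequence


def meets_preset_condition(
    token_probs: Sequence[float],
    first_threshold: float,
    second_threshold: float,
    preset_number: int,
) -> bool:
    """Return True if either alternative of the recited condition holds."""
    # Alternative 1: at least one token exceeds the first set threshold.
    any_above_first = any(p > first_threshold for p in token_probs)

    # Alternative 2: the number of tokens exceeding the second set
    # threshold is itself greater than the preset number.
    count_above_second = sum(1 for p in token_probs if p > second_threshold)

    return any_above_first or count_above_second > preset_number
```

Because the limitation is recited in the alternative, satisfying the first branch alone is sufficient; the second branch is shown only to make the structure of the claim language explicit.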

Regarding Claim 8:
Claim 8 contains similar limitations as Claim 2 and is therefore rejected for the same reasons.

Regarding Claim 14:
Claim 14 contains similar limitations as Claim 2 and is therefore rejected for the same reasons.


Claim(s) 3, 4, 9, 10, 15 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aleksic as applied to claim 1 above, and further in view of Minkin et al. (US 11145305 B2).

Regarding Claim 3:
Aleksic teaches the method of claim 1, wherein performing frame compensation on the first voice frame segment to obtain the second voice frame segment comprises: estimating a frame length compensated for the first voice frame segment according to a length of historical target content counted(Para [0025], Ln 2-7, partial match is determined …. For example, EOS timeout may be extended by a predetermined amount of time in association with an expected speech recognition result);
and determining a next voice frame segment in the collected voice signal adjacent to the first voice frame segment(Para [0008], Ln 3-6, determining a time period for the end of speech condition to receive additional audio data in response to determining the intermediate speech recognition result includes a partial match or no match), 
obtaining the frame length of a third voice frame segment from a start position of the next voice frame segment(Para [0046], Ln 12-17, expected speech recognition result is one of several three word phrases and the intermediate speech recognition result contains only two words, then EOS timeout may be extended by a predetermined amount of time to allow additional time for the input of audio of the third word).
Aleksic does not specifically teach and splicing the third voice frame segment behind the first voice frame segment to form the second voice frame segment.
In the same field of Endpointing, Minkin teaches and splicing the third voice frame segment behind the first voice frame segment to form the second voice frame segment(Col 3, Ln 17-34, responsive to the first value being below the pre-determined threshold, determining, ….features for a second segment of the in-use digital audio signal. The second segment is sequential to the first segment in the in-use digital audio signal…. features comprises at least acoustic-type features derivable from the second segment of the digital audio signal…determine based on the first set of features and the second set of features a second value indicative of a probability of the in-use user utterance having ended during the second segment of the in-use digital audio signal. First and second acoustic features are used together, second is sequential to the first).
It would have been obvious for one skilled in the art, at the effective time of filing, to modify Aleksic with the endpointing of Minkin, as it increases responsiveness, improving user experience(Col 8, Ln 5-10).

Regarding Claim 4:
Aleksic teaches the method of claim 1, wherein performing frame compensation on the first voice frame segment to obtain the second voice frame segment comprises: determining a next voice frame segment in the collected voice signal adjacent to the first voice frame segment(Para [0008], Ln 3-6, determining a time period for the end of speech condition to receive additional audio data in response to determining the intermediate speech recognition result includes a partial match or no match), 
obtaining a set frame length of a fourth voice frame segment from a start position of the next voice frame segment(Para [0046], Ln 12-17, expected speech recognition result is one of several three word phrases and the intermediate speech recognition result contains only two words, then EOS timeout may be extended by a predetermined amount of time to allow additional time for the input of audio of the third word).
Aleksic does not specifically teach and splicing the fourth voice frame segment behind the first voice frame segment to form the second voice frame segment.
In the same field of Endpointing, Minkin teaches and splicing the fourth voice frame segment behind the first voice frame segment to form the second voice frame segment(Col 3, Ln 17-34, responsive to the first value being below the pre-determined threshold, determining, ….features for a second segment of the in-use digital audio signal. The second segment is sequential to the first segment in the in-use digital audio signal…. features comprises at least acoustic-type features derivable from the second segment of the digital audio signal…determine based on the first set of features and the second set of features a second value indicative of a probability of the in-use user utterance having ended during the second segment of the in-use digital audio signal. First and second acoustic features are used together, second is sequential to the first).
It would have been obvious for one skilled in the art, at the effective time of filing, to modify Aleksic with the endpointing of Minkin, as it increases responsiveness, improving user experience(Col 8, Ln 5-10).
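
For illustration only, the splicing limitation of Claims 3 and 4 can be sketched as below. This is a hypothetical sketch, not code from Aleksic, Minkin, or the application. The signal is modeled as a list of frames, and the estimate for Claim 3 is assumed to be the mean historical target length minus the frames already captured; the claim language does not specify that formula. For Claim 4, `comp_len` would simply be a fixed, preset value.

```python
from statistics import mean
from typing import List, Sequence


def estimate_compensation_length(historical_lengths: Sequence[int], first_len: int) -> int:
    """Assumed estimate: mean historical target length minus frames already held."""
    return max(0, round(mean(historical_lengths)) - first_len)


def splice_compensated_segment(
    signal: List[int], first_start: int, first_end: int, comp_len: int
) -> List[int]:
    """Append comp_len frames, taken from the start of the adjacent next
    segment, behind the first segment to form the second segment."""
    first_segment = signal[first_start:first_end]
    # The next segment begins where the first one ends.
    extra_segment = signal[first_end : first_end + comp_len]
    return first_segment + extra_segment
```

In this sketch the second voice frame segment is always a contiguous stretch of the collected signal, since the appended frames are taken from the position immediately following the first segment.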

Regarding Claim 9:
Claim 9 contains similar limitations as Claim 3, and is therefore rejected for the same reasons.

Regarding Claim 10:
Claim 10 contains similar limitations as Claim 4, and is therefore rejected for the same reasons.

Regarding Claim 15:
Claim 15 contains similar limitations as Claim 3, and is therefore rejected for the same reasons.

Regarding Claim 16:
Claim 16 contains similar limitations as Claim 4, and is therefore rejected for the same reasons.

Claim(s) 5, 11 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aleksic as applied to claim 1 above, and further in view of Muchlinski et al. (US 10714122 B2).

Regarding Claim 5:
Aleksic teaches the method of claim 1, but does not teach wherein performing voice recognition on the first voice frame segment comprises: detecting whether a phoneme is a filler phoneme and/or a silent phoneme starting from the first phoneme of the first voice frame segment, skipping the phoneme and performing voice recognition on the first voice frame segment when it is determined that a probability of the phoneme being the filler phoneme and/or the silent phoneme is greater than a probability of the phoneme being neither a filler phoneme nor a silent phoneme.
In the same field of endpointing, Muchlinski teaches wherein performing voice recognition on the first voice frame segment comprises: detecting whether a phoneme is a filler phoneme and/or a silent phoneme starting from the first phoneme of the first voice frame segment(Col 3, Ln 27-32, For each feature vector or audio frame, the probability scores are provided as an output of the acoustic model. Each probability score indicates a probability (e.g., a probability density function score) that the feature vector or audio frame corresponds to a particular audio unit that may correspond to silence, noise, a unit of speech. Col 4, Ln 41-46, For example, when the first speech state exceeds the first non-speech state, a begin of speech is detected. The begin of speech time stamp may also be provided, when a subsequent speech detection is made as discussed above, to a subsequent audio processing unit), 
skipping the phoneme and performing voice recognition on the first voice frame segment(Col 5, Ln 17-19, performing speech detection only on the temporal portion of received audio corresponding to the detected speech signal)
when it is determined that a probability of the phoneme being the filler phoneme and/or the silent phoneme is greater than a probability of the phoneme being neither a filler phoneme nor a silent phoneme(Col 5, Ln 6-7, Furthermore, device 102 may provide other capabilities in response to speech being detected. Col 5, Ln 17-19, performing speech detection only on the temporal portion of received audio corresponding to the detected speech signal. Col 4, Ln 41-46, For example, when the first speech state exceeds the first non-speech state, a begin of speech is detected. The begin of speech time stamp may also be provided, when a subsequent speech detection is made as discussed above, to a subsequent audio processing unit).
It would have been obvious for one skilled in the art, at the effective time of filing, to modify Aleksic with the speech start point detection of Muchlinski, in order to provide always listening capability, improving user convenience(Col 4, Ln 50-52, & Col 4, Ln 58-65).
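
For illustration only, the probability comparison recited in Claim 5 can be sketched as below. This is a hypothetical sketch, not code from Muchlinski or from the application. It assumes each phoneme carries three scores (`filler`, `silence`, `speech`), that the filler and silence scores are summed before comparison, and that only the leading run of non-speech phonemes is skipped; none of these choices is dictated by the claim language.

```python
from typing import Dict, List


def first_speech_index(phoneme_probs: List[Dict[str, float]]) -> int:
    """Return the index of the first phoneme that is kept for recognition."""
    index = 0
    for probs in phoneme_probs:
        # Assumed comparison: filler-or-silence mass versus the mass of
        # the phoneme being neither a filler nor a silent phoneme.
        non_speech = probs["filler"] + probs["silence"]
        if non_speech > probs["speech"]:
            index += 1  # skip this phoneme
        else:
            break  # first speech-like phoneme: stop skipping
    return index
```

Recognition would then run on the phonemes from the returned index onward, which corresponds to the begin-of-speech comparison between speech and non-speech states that the rejection relies on.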

Regarding Claim 11:
Claim 11 contains similar limitations as Claim 5, and is therefore rejected for the same reasons.

Regarding Claim 17:
Claim 17 contains similar limitations as Claim 5, and is therefore rejected for the same reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Liu et al. (US 9437186 B1)
Endpoint detection using natural language understanding.
Tadpatrikar et al. (US 11062696 B2)
Endpoint detection and timeout length determination.
Gejji (US 11056098 B1)
Endpoint detection.
Gian (US 11024332 B2)
Endpoint detection using natural language understanding.
Maas (US 10854192 B1)
Endpoint detection using natural language understanding.
Chang et al. (US 20200335091 A1)
Endpoint detection using natural language understanding.
Hofer et al. (US 10657952 B2)
Endpoint detection using natural language understanding.
Chang et al. (US 20200117996 A1)
Endpoint detection using natural language understanding.
Doshi et al. (US 20190318759 A1)
Endpoint detection using natural language understanding.
Williams et al. (US 20160358598 A1)
Endpoint detection using natural language understanding.
Buchanan et al. (US 20150310879 A1)
Endpoint detection using word matching.
Hwang et al. “End-to-End Speech Endpoint Detection Utilizing Acoustic and Language Modeling Knowledge for Online Low-Latency Speech Recognition”
Endpoint detection using natural language understanding.
Liu et al. “Accurate Endpointing with Expected Pause Duration”
Endpoint detection using natural language understanding.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 10:00 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER G MARLOW/Assistant Examiner, Art Unit 2658                                                                                                                                                                                                        

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658