DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority Acknowledgment
2.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in Application 107109671 on 03/21/2018 in the Taiwan Patent Office.

Claim Objections
3.	Claim 9 is objected to because of the following informalities: typographical error. Claim 9 recites the limitation of “9. A computer program product, is stored in a computer readable medium to be read and executed to achieve the method as claimed in claim 1.” Claim 9 should be changed to “9. A computer program product, ...” Appropriate correction is required.

Claim Rejections - 35 USC § 101
4.	35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. 

5.	Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 9 is directed to “9. A computer program product, is stored in a computer readable medium to be read and executed to achieve the method as claimed in claim 1.” However, the recitation of the medium in the specification is not exclusory with respect to non-statutory medium types (Specification [0020] a computer program product may be stored in a computer-readable medium to be read and executed to achieve the functions of the present invention, but the present invention is not limited to the above manner.) Additionally, variations of the term “stored” are not necessarily considered to limit a media claim to non-transitory embodiments because content may be considered to be stored on a signal during propagation and because many disclosures conflate storage media and signals.  
Thus, under the broadest reasonable interpretation, the claim(s) as a whole would include non-statutory mediums such as carrier waves.
	As per the USPTO notice signed by director David Kappos on 1/26/2010: “The United States Patent and Trademark Office (USPTO) is obliged to give claims their broadest reasonable interpretation consistent with the specification during proceedings before the USPTO.” See In re Zletz, 893 F.2d 319 (Fed. Cir. 1989) (during patent examination the pending claims must be interpreted as broadly as their terms reasonably allow). The broadest reasonable interpretation of a claim drawn to a computer readable medium (also called machine readable medium and other such variations) typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. 101, Aug. 24, 2009; p. 2.
	The claims as a whole therefore include(s) signal-based mediums. A signal does not fall within one of the four statutory categories of invention (i.e., process, machine, manufacture, or composition of matter) because it is an ephemeral, transient signal and thus is non-statutory. Since the claims as a whole include these non-statutory instances, Claim 9 is directed to non-statutory subject matter.

Claim Rejections - 35 USC § 112
6.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

7.	Claims 5-7 and 14-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
	Claim 5 recites the limitation of “the sound energy values before and after”. There is insufficient antecedent basis for this limitation in the claim. In addition, it is not clear what the values are before and after.
	Claim 5 recites the limitation of “if the sound energy comparison value is less than the end threshold, determining whether the sound energy comparison value is not greater than the start threshold within a time interval and is not less than the end threshold;” In this conditional sentence, the “if” clause establishes that “the sound energy comparison value is less than the end threshold”, yet the clause that follows includes “determining whether the sound energy comparison value ... is not less than the end threshold”. Because the “if” clause has already confirmed that the value is less than the end threshold, it is unclear why the following clause determines whether the same value is not less than the end threshold. The two parts of this conditional sentence are inconsistent with each other.
	Claim 5 recites “if both are not greater than the start threshold and not less than the end threshold, it is determined that the voice has ended.” It is not clear what “both” refers back to. Moreover, only one value, “the sound energy comparison value”, is compared with the start threshold and the end threshold, so it is unclear why “both” is used in the claim.
	Claim 14 has issues similar to those of claim 5. Claim 14 recites the limitations of “if it is less than the end threshold, the near-end processing module determining whether the sound energy comparison value is not greater than the start threshold within a time interval and is not less than the end threshold; if both are not greater than the start threshold and not less than the end threshold, then the near-end processing module determining that the voice has ended.”
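	For illustration only, the inconsistency in the conditional limitation of claims 5 and 14 can be sketched as follows. The sketch assumes that the same sound energy comparison value is tested in both clauses, as the claims are read above; the function name, parameter names, and threshold values are hypothetical and do not appear in the claims or the specification.

```python
def inner_test_can_succeed(value: float, start_threshold: float, end_threshold: float) -> bool:
    """Return True if the inner test of the conditional limitation could be met.

    ASSUMPTION: the same comparison value is used in the outer condition
    and in the inner test (hypothetical reading for illustration).
    """
    # Outer condition: the value is less than the end threshold.
    if value < end_threshold:
        # Inner test: not greater than the start threshold
        # AND not less than the end threshold.
        return value <= start_threshold and value >= end_threshold
    return False


# Sweep illustrative values 0.0 .. 1.0 with hypothetical thresholds.
results = [inner_test_can_succeed(v / 10, 0.6, 0.3) for v in range(11)]
any_reachable = any(results)
print(any_reachable)  # False: value < end and value >= end cannot both hold
```

	Under this reading, no value can satisfy both the outer condition and the inner test, which is the incompatibility noted above.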



Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

9.	Claims 1, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A1).

	With respect to Claim 1, Secker-Walker et al. disclose
 	An artificial intelligence voice interaction method for a user to employ a near-end electronic device and can be fulfilled by a remote artificial intelligence server (Secker-Walker et al. Fig. 1), the method comprising the following steps: 
	receiving a voice input by the user (Secker-Walker et al. Fig. 3 element 304 Receive audio input); 
 	transmitting the voice to the remote artificial intelligence server (Secker-Walker et al. Fig. 3 element 316 transmit audio, col. 7 lines 8-14 If the voice activity is directed to the device, or wakeword is detected, in block 312 (and optionally the speech is associated with the particular user), the illustrative routine 300 may proceed to block 314 and trigger the network interface module 212. With the network interface module 212 triggered, the audio input recorded to the memory buffer module 208 may be transmitted 316 over the network 106); 
 	determining whether the voice has ended (Secker-Walker et al. col. 1 lines 56-61 It may be desirable for devices to indicate to each other when a particular audio stream starts to contain speech for processing and when speech for processing ends (a mechanism sometimes referred to as “endpointing”), col. 6 lines 25-28 The local device 102 may use speech recognition concepts running as computer program instructions on a processing unit in the local device 102 to implement endpointing in the device-side processing, Fig. 2 element 204 Endpointing/Audio Detection); 
 	before determining that the voice has ended, and has received the stop recording signal from the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence server (Secker-Walker et al. col. 2 lines 24-31 The server process 104 receives the speech audio stream 114 and monitors the audio, implementing endpointing in the server process 104, to determine when to tell the client process 100 to close the connection and stop streaming audio, col. 2 lines 51-55 If one or a combination of disconnect criteria are satisfied then the server process 104 tells the client process 100 to close 124 the connection and the speech audio stream from the local device 102 is stopped); and 
 	receiving a response signal sent back from the remote artificial intelligence server (Secker-Walker et al. col. 5 lines 55-57 A remote server 105 may return recognition results (e.g., a transcription or response to an intelligent agent query) to the local device 102.)
	Secker-Walker et al. further teach that if the server does not tell the local device to close the connection, the local device may close the connection if some “failsafe” criteria have been met (Secker-Walker et al. col. 2 lines 42-48 If the server does not tell the local device to close the connection, the local device 102 may close the connection if some “failsafe” criteria has been met 123 such as expiration of a selected period of time, incomprehensibility of speech signal, or physical user interface on the local device (e.g. button). It should be appreciated that other disconnect and/or failsafe criteria may be defined, col. 11 lines 36-38 The client process 100 continues streaming audio and the server 103 continues receiving the buffered audio 508 until disconnect criteria or failsafe condition are met.) The method in Secker-Walker et al. lets the local device close the connection based on “failsafe” criteria determined at the local device if the server does not tell the local device to close the connection. These “failsafe” criteria do not include a determination that the voice has ended. 
	Secker-Walker et al. fail to explicitly teach
 	when determining that the voice has ended and has not received a stop recording signal transmitted by the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence server;
	However, Koya teaches 
 	when determining that the voice has ended and has not received a stop recording signal transmitted by the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence server (Koya [0042] When a mute period lasts for a prescribed threshold or longer, speech recognition processing unit 80 deems the utterance to be terminated, and outputs an end-of-utterance detection signal. Receiving the end-of-utterance detection signal, determining unit 82 issues an instruction towards communication control unit 86 to end transmission of data to speech recognition server 36. The Examiner notes that in this reference, transmission of data to the speech recognition server will be ended if the end-of-utterance is detected. It does not need another condition to end the transmission of data to the server);
 	Secker-Walker et al. and Koya are analogous art because they are from a similar field of endeavor in the Signal Recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining some “failsafe” criteria at local device if the server does not tell the local device to close the connection as taught by Secker-Walker et al., using teaching of detecting the end-of-utterance at the local device as taught by Koya for the benefit of ending the transmission of data to the speech recognition server (Koya [0042] When a mute period lasts for a prescribed threshold or longer, speech recognition processing unit 80 deems the utterance to be terminated, and outputs an end-of-utterance detection signal. Receiving the end-of-utterance detection signal, determining unit 82 issues an instruction towards communication control unit 86 to end transmission of data to speech recognition server 36.)
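	For illustration only, the combination discussed above can be sketched as a client-side loop that stops transmitting either when a stop signal arrives from the server (as in Secker-Walker et al.) or when a locally detected mute period reaches a prescribed length (as in Koya [0042]). This is a minimal sketch under stated assumptions, not the implementation of either reference: the `Frame` type, the per-frame `is_silent` flag, the frame-count threshold, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Frame:
    is_silent: bool     # ASSUMPTION: result of some local silence check
    server_stop: bool   # ASSUMPTION: True once a server stop signal has arrived


def stream_until_stop(frames: Iterable[Frame], mute_frames_threshold: int) -> Tuple[int, str]:
    """Return (number of frames transmitted, reason transmission stopped)."""
    sent = 0
    silent_run = 0
    for frame in frames:
        # Server-side endpointing: stop as soon as the server says so.
        if frame.server_stop:
            return sent, "server_stop"
        # Local end-of-utterance: mute period lasting the threshold or longer.
        silent_run = silent_run + 1 if frame.is_silent else 0
        if silent_run >= mute_frames_threshold:
            return sent, "local_end_of_utterance"
        sent += 1  # stands in for transmitting this frame to the server
    return sent, "input_exhausted"


voiced = Frame(is_silent=False, server_stop=False)
silent = Frame(is_silent=True, server_stop=False)
stop = Frame(is_silent=False, server_stop=True)

local_result = stream_until_stop([voiced, voiced, voiced, silent, silent], 2)
server_result = stream_until_stop([voiced, stop, voiced], 2)
print(local_result)   # (4, 'local_end_of_utterance')
print(server_result)  # (1, 'server_stop')
```

	In the first call no server signal is received and the local mute period alone ends transmission; in the second call the server signal ends transmission before any local end-of-utterance is detected.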

With respect to Claim 9, claim 9 recites the limitations of “A computer program product, is stored in a computer readable medium to be read and executed to achieve the method as claimed in claim 1.” Thus, claim 9 is rejected on the same ground as claim 1. 

	With respect to Claim 10, Secker-Walker et al. disclose
 	A near-end electronic device, used by a user and connected to a remote artificial intelligence server via a network, the near-end electronic device comprising: 
(Secker-Walker et al. The audio signal from the microphone are received by a Digital Signal Processing (DSP) 202 for processing, Fig. 3 element 304 Receive audio input); 
 	a transmission module, which is electrically connected to the microphone for transmitting the voice to the remote artificial intelligence server (Secker-Walker et al. Fig. 3 element 316 transmit audio, col. 7 lines 8-14 If the voice activity is directed to the device, or wakeword is detected, in block 312 (and optionally the speech is associated with the particular user), the illustrative routine 300 may proceed to block 314 and trigger the network interface module 212. With the network interface module 212 triggered, the audio input recorded to the memory buffer module 208 may be transmitted 316 over the network 106);  
 	a near-end processing module, which is electrically connected to the transmission module for determining whether the voice has ended (Secker-Walker et al. col. 1 lines 56-61 It may be desirable for devices to indicate to each other when a particular audio stream starts to contain speech for processing and when speech for processing ends (a mechanism sometimes referred to as “endpointing”), col. 6 lines 25-28 The local device 102 may use speech recognition concepts running as computer program instructions on a processing unit in the local device 102 to implement endpointing in the device-side processing, Fig. 2 element 204 Endpointing/Audio Detection);  
 	when before the near-end processing module determining that the voice has ended, and has received the stop recording signal from the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence server (Secker-Walker et al. col. 2 lines 24-31 The server process 104 receives the speech audio stream 114 and monitors the audio, implementing endpointing in the server process 104, to determine when to tell the client process 100 to close the connection and stop streaming audio, col. 2 lines 51-55 If one or a combination of disconnect criteria are satisfied then the server process 104 tells the client process 100 to close 124 the connection and the speech audio stream from the local device 102 is stopped); and 
 	a voice module, which is electrically connected to the near-end processing module for emitting a response signal sent back from the remote artificial intelligence server (Secker-Walker et al. col. 5 lines 55-57 A remote server 105 may return recognition results (e.g., a transcription or response to an intelligent agent query) to the local device 102.)
	Secker-Walker et al. further teach that if the server does not tell the local device to close the connection, the local device may close the connection if some “failsafe” criteria have been met (Secker-Walker et al. col. 2 lines 42-48 If the server does not tell the local device to close the connection, the local device 102 may close the connection if some “failsafe” criteria has been met 123 such as expiration of a selected period of time, incomprehensibility of speech signal, or physical user interface on the local device (e.g. button). It should be appreciated that other disconnect and/or failsafe criteria may be defined, col. 11 lines 36-38 The client process 100 continues streaming audio and the server 103 continues receiving the buffered audio 508 until disconnect criteria or failsafe condition are met.) The method in Secker-Walker et al. lets the local device close the connection based on “failsafe” criteria determined at the local device if the server does not tell the local device to close the connection. These “failsafe” criteria do not include a determination that the voice has ended. 
	Secker-Walker et al. fail to explicitly teach
 	wherein when the near-end processing module determining that the voice has ended and has not received a stop recording signal transmitted by the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence; and 
 	However, Koya teaches 
 	wherein when the near-end processing module determining that the voice has ended and has not received a stop recording signal transmitted by the remote artificial intelligence server, it stops transmitting the voice to the remote artificial intelligence; and (Koya [0042] When a mute period lasts for a prescribed threshold or longer, speech recognition processing unit 80 deems the utterance to be terminated, and outputs an end-of-utterance detection signal. Receiving the end-of-utterance detection signal, determining unit 82 issues an instruction towards communication control unit 86 to end transmission of data to speech recognition server 36. The Examiner notes that in this reference, transmission of data to the speech recognition server will be ended if the end-of-utterance is detected. It does not need another condition to end the transmission of data to the server);
 	Secker-Walker et al. and Koya are analogous art because they are from a similar field of endeavor in the Signal Recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining some “failsafe” criteria at local device if the server does not tell the local device to close the connection as taught by Secker-Walker et al., using teaching of detecting the end-of-utterance at the local device as taught by Koya for the benefit of ending the transmission of data to the speech recognition server (Koya [0042] When a mute period lasts for a prescribed threshold or longer, speech recognition processing unit 80 deems the utterance to be terminated, and outputs an end-of-utterance detection signal. Receiving the end-of-utterance detection signal, determining unit 82 issues an instruction towards communication control unit 86 to end transmission of data to speech recognition server 36.)

10.	Claims 2, 4, 5, 11, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A1) and Fanty et al. (US 2013/0132089 A1). 

	With respect to Claim 2, Secker-Walker et al. in view of Koya teach
 	further comprising the following steps:  
recording the voice first before transmitting the voice to the remote artificial intelligence server (Secker-Walker et al. col. 5 lines 50-54 Upon its activation, the network interface module 212 may transmit the received audio input recorded to the memory buffer module 208 over the network 106 to the remote server 105); and 
Secker-Walker et al. in view of Koya fail to explicitly teach 
stopping recording the voice before stopping transmitting the voice to the remote artificial intelligence server.  
However, Fanty et al. teach 
stopping recording the voice before stopping transmitting the voice to the remote artificial intelligence server (Fanty et al. Fig. 3 elements 330 Halt audio recording, 332 Transmit remaining compressed audio to network ASR, [0054] after audio is received by the client device in act 312, the process ...both the input and output buffers of a DSP included in the device...the compressed audio may be stored in an output buffer of the DSP.)
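Secker-Walker et al., Koya and Fanty et al. are analogous art because they are from a similar field of endeavor in the Signal Recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining some “failsafe” criteria at local device if the server does not tell the local device to close the connection as taught by Secker-Walker et al., using teaching of detecting the end-of-utterance at the local device as taught by Koya for the benefit of ending the transmission of data to the speech recognition server, using teaching of halting the audio recording as taught by Fanty et al. for the benefit of avoiding sending unnecessary audio data that includes only silence over the network (Fanty et al. [0064] if it is determined in act 328 that the end of speech has been detected, process control proceeds to act 330 where encoding of input audio by the client device is halted. Halting the encoding of input audio may avoid sending unnecessary audio data that includes only silence over the network.)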

With respect to Claim 4, Secker-Walker et al. in view of Koya and Fanty et al. teach 
 	wherein when a sound energy comparison value is greater than a start threshold, the step of recording the voice is performed (Secker-Walker et al. col. 3 lines 34-37 the endpointing/audio detection module 204 may be further configured to determine that the audio input has an energy level satisfying a threshold for at least a threshold duration of time, col. 4 lines 65-67 and col. 5 lines 1-3 If the endpointing/audio detection module 204 determines a confidence level whose value corresponds to a likelihood that speech is actually present in the audio input, the audio stream is input from the DSP 202 to the buffer 208 in the first stage of speech detection.)
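	For illustration only, the cited determination that audio input “has an energy level satisfying a threshold for at least a threshold duration of time” can be sketched as a check for a sustained run of frames at or above an energy threshold. This is a hedged sketch, not the method of Secker-Walker et al.: the function name, the use of per-frame energy values, and the expression of the duration as a frame count are assumptions made for the example.

```python
from typing import Sequence


def energy_sustained(energies: Sequence[float], energy_threshold: float, min_frames: int) -> bool:
    """Return True if energy stays at or above the threshold for min_frames in a row.

    ASSUMPTION: `energies` holds one energy value per fixed-length frame,
    so a frame count stands in for the threshold duration of time.
    """
    run = 0
    for energy in energies:
        # Extend the run while the threshold is satisfied, otherwise reset it.
        run = run + 1 if energy >= energy_threshold else 0
        if run >= min_frames:
            return True
    return False


sustained = energy_sustained([0.1, 0.5, 0.6, 0.7, 0.1], 0.4, 3)
interrupted = energy_sustained([0.5, 0.1, 0.5, 0.1], 0.4, 2)
print(sustained, interrupted)  # True False
```

	In such a sketch, recording (buffering) of the audio would begin only once the function returns True.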

	With respect to Claim 5, Secker-Walker et al. in view of Koya and Fanty et al. teach
 	wherein the step of determining whether the voice has ended includes: 
 	calculating a sound energy value (Secker-Walker et al. col. 3 lines 25-26 Audio detection processing may be performed to determine an energy value of the audio input); 
 	comparing the sound energy values before and after to obtain the sound energy comparison value (Secker-Walker et al. col. 3 lines 34-37 The endpointing/audio detection module 204 may be further configured to determine that the audio input has an energy level satisfying a threshold for at least a threshold duration of time. Secker-Walker et al. keep tracking the energy value over the duration of time. This implies that more than one energy value is calculated);  
(Secker-Walker et al. col. 6 lines 42-53 At block 308, the endpointing /audio detection module 204 may determine whether voice activity is detected, such as by determining whether the audio input has an energy level that satisfies an energy level threshold (and, optionally, whether the audio input has an energy level that satisfies an energy level threshold for at least a threshold duration). If the audio input’s energy level does not satisfy the energy level threshold, the audio input module 208 may continue to monitor for speech audio input in block 310 until another audio input is received); 
 	if the sound energy comparison value is less than the end threshold, determining whether the sound energy comparison value is not greater than the start threshold within a time interval and is not less than the end threshold (Secker-Walker et al. col. 2 lines 38 Such disconnect criteria may include a determination that speech in the audio stream has stopped 116, an end of interaction indication (EOII) 118 or “sleepword,” a reduction in the energy level of the speech 120, or expiration of a period of time); and 
if both are not greater than the start threshold and not less than the end threshold, it is determined that the voice has ended (Secker-Walker et al. col. 4 lines 13-16 if the confidence level does not satisfy the confidence level the endpointing/ audio detection module 204 may determine that there is no speech in the audio input, col. 6 lines 42-53 At block 308, the endpointing /audio detection module 204 may determine whether voice activity is detected, such as by determining whether the audio input has an energy level that satisfies an energy level threshold (and, optionally, whether the audio input has an energy level that satisfies an energy level threshold for at least a threshold duration). If the audio input’s energy level does not satisfy the energy level threshold, the audio input module 208 may continue to monitor for speech audio input in block 310 until another audio input is received.) As indicated above in the 112(b) rejection, the conditional limitation is indefinite. For prosecution purposes, the Examiner interprets the limitation in claim 5 in light of paragraph [0057] of the present specification. Paragraph [0057] discloses “When the near-end processing module 13 determines that the sound energy comparison value is less than the end threshold, the near-end processing module 13 determines that the voice may have ended.”  

 	With respect to Claim 11, Secker-Walker et al. in view of Koya teach

 	before transmitting the voice to the remote artificial intelligence server, the near-end processing module first records the voice (Secker-Walker et al. col. 5 lines 50-54 Upon its activation, the network interface module 212 may transmit the received audio input recorded to the memory buffer module 208 over the network 106 to the remote server 105);
Secker-Walker et al. in view of Koya fail to explicitly teach 
before stopping transmitting the voice to the remote artificial intelligence server, the near-end processing module stops recording the voice first.  
However, Fanty et al. teach 
before stopping transmitting the voice to the remote artificial intelligence server, the near-end processing module stops recording the voice first (Fanty et al. Fig. 3 elements 330 Halt audio recording, 332 Transmit remaining compressed audio to network ASR, [0054] after audio is received by the client device in act 312, the process ...both the input and output buffers of a DSP included in the device...the compressed audio may be stored in an output buffer of the DSP.)
Secker-Walker et al., Koya and Fanty et al. are analogous art because they are from a similar field of endeavor in the Signal Recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining some “failsafe” criteria at local device if the server does not tell the local device to close the connection as taught by Secker-Walker et al., using teaching of detecting the end-of-utterance at the local device as taught by Koya for the benefit of ending the transmission of data to the speech recognition server, using teaching of halting the audio recording as taught by Fanty et al. for the benefit of avoiding sending unnecessary audio data that includes only silence over the network (Fanty et al. [0064] if it is determined in act 328 that the end of speech has been detected, process control proceeds to act 330 where encoding of input audio by the client device is halted. Halting the encoding of input audio may avoid sending unnecessary audio data that includes only silence over the network.)

With respect to Claim 13, Secker-Walker et al. in view of Koya and Fanty et al. teach 
(Secker-Walker et al. col. 3 lines 34-37 the endpointing/audio detection module 204 may be further configured to determine that the audio input has an energy level satisfying a threshold for at least a threshold duration of time, col. 4 lines 65-67 and col. 5 lines 1-3 If the endpointing/audio detection module 204 determines a confidence level whose value corresponds to a likelihood that speech is actually present in the audio input, the audio stream is input from the DSP 202 to the buffer 208 in the first stage of speech detection.)

 	With respect to Claim 14, Secker-Walker et al. in view of Koya and Fanty et al. teach
 	further comprising a sound energy calculation module for calculating a sound energy and a previous sound energy to obtain a sound energy comparison value (Secker-Walker et al. col. 3 lines 34-37 The endpointing/audio detection module 204 may be further configured to determine that the audio input has an energy level satisfying a threshold for at least a threshold duration of time.), the near-end processing module determining whether the sound energy comparison value is less than an end threshold (Secker-Walker et al. col. 6 lines 42-53 At block 308, the endpointing /audio detection module 204 may determine whether voice activity is detected, such as by determining whether the audio input has an energy level that satisfies an energy level threshold (and, optionally, whether the audio input has an energy level that satisfies an energy level threshold for at least a threshold duration). If the audio input’s energy level does not satisfy the energy level threshold, the audio input module 208 may continue to monitor for speech audio input in block 310 until another audio input is received. Secker-Walker et al. keep tracking the energy value over the duration of time. This implies that more than one energy value is calculated); 
 	if it is less than the end threshold, the near-end processing module determining whether the sound energy comparison value is not greater than the start threshold within a time interval and is not less than the end threshold (Secker-Walker et al. col. 2 lines 38 Such disconnect criteria may include a determination that speech in the audio stream has stopped 116, an end of interaction indication (EOII) 118 or “sleepword,” a reduction in the energy level of the speech 120, or expiration of a period of time); 
if both are not greater than the start threshold and not less than the end threshold, then the near-end processing module determining that the voice has ended (Secker-Walker et al. col. 4 lines 13-16 if the confidence level does not satisfy the confidence level the endpointing/ audio detection module 204 may determine that there is no speech in the audio input, col. 6 lines 42-53 At block 308, the endpointing /audio detection module 204 may determine whether voice activity is detected, such as by determining whether the audio input has an energy level that satisfies an energy level threshold (and, optionally, whether the audio input has an energy level that satisfies an energy level threshold for at least a threshold duration). If the audio input’s energy level does not satisfy the energy level threshold, the audio input module 208 may continue to monitor for speech audio input in block 310 until another audio input is received.) As indicated above in the 112(b) rejection, the conditional limitation is indefinite. For prosecution purposes, the Examiner interprets the limitation in claim 5 in light of paragraph [0057] of the present specification. Paragraph [0057] discloses “When the near-end processing module 13 determines that the sound energy comparison value is less than the end threshold, the near-end processing module 13 determines that the voice may have ended.”  

11.	Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A1), Fanty et al. (US 2013/0132089 A1) and Federighi et al. (US 2013/0332159 A1). 

With respect to Claim 3, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 2 upon which Claim 3 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach 
wherein the step of determining whether the voice has ended includes determining whether the voice is a complete sentence. 
However, Federighi et al. teach 
 	wherein the step of determining whether the voice has ended includes determining whether the voice is a complete sentence (Federighi et al. [0038] an end of speech may be detected as a complete sentence followed by silence.)
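 	Secker-Walker et al., Koya, Fanty et al. and Federighi et al. are analogous art because they are from a similar field of endeavor in the Signal Recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining some “failsafe” criteria at local device if the server does not tell the local device to close the connection as taught by Secker-Walker et al., using teaching of detecting the end-of-utterance at the local device as taught by Koya for the benefit of ending the transmission of data to the speech recognition server, using teaching of halting the audio recording as taught by Fanty et al. for the benefit of avoiding sending unnecessary audio data that includes only silence over the network, using teaching of detecting a complete sentence as taught by Federighi et al. for the benefit of detecting an end of speech (Federighi et al. [0038] an end of speech may be detected as a complete sentence followed by silence.)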
 
With respect to Claim 12, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 11 upon which Claim 12 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach 
 	wherein the near-end processing module determines whether the voice is a complete sentence to know if the voice has ended.  
However, Federighi et al. teach 
 	wherein the near-end processing module determines whether the voice is a complete sentence to know if the voice has ended (Federighi et al. [0038] an end of speech may be detected as a complete sentence followed by silence.)
 	Secker-Walker et al., Koya, Fanty et al. and Federighi et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of detecting a complete sentence, as taught by Federighi et al., for the benefit of detecting an end of speech as a complete sentence followed by silence (Federighi et al. [0038]).

12.	Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A), Fanty et al. (US 2013/0132089 A1) and Matsumoto (US 2008/0065381 A1). 

With respect to Claim 6, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 5 upon which Claim 6 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach 
wherein the sound energy value is calculated every 0.2 sec. 
However, Matsumoto teaches 
 	wherein the sound energy value is calculated every 0.2 sec (Matsumoto [0044] From data which is normalized using a maximum power value per time frame (for example, 0.2 seconds), the voiced/unvoiced determining unit 103 determines as unvoiced, the portions that are less than or equal to the threshold value and determines as voiced, the portions that are greater than or equal to the threshold value.)
 	Secker-Walker et al., Koya, Fanty et al. and Matsumoto are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of the time frame, as taught by Matsumoto, for the benefit of determining whether a portion is voiced or unvoiced (Matsumoto [0044] From data which is normalized using a maximum power value per time frame (for example, 0.2 seconds), the voiced/unvoiced determining unit 103 determines as unvoiced, the portions that are less than or equal to the threshold value and determines as voiced, the portions that are greater than or equal to the threshold value.)

With respect to Claim 15, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 14 upon which Claim 15 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach 
 	wherein the sound energy calculation module calculates the sound energy comparison value every 0.2 sec. 
However, Matsumoto teaches 
 	wherein the sound energy calculation module calculates the sound energy comparison value every 0.2 sec (Matsumoto [0044] From data which is normalized using a maximum power value per time frame (for example, 0.2 seconds), the voiced/unvoiced determining unit 103 determines as unvoiced, the portions that are less than or equal to the threshold value and determines as voiced, the portions that are greater than or equal to the threshold value.)
 	Secker-Walker et al., Koya, Fanty et al. and Matsumoto are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of the time frame, as taught by Matsumoto, for the benefit of determining whether a portion is voiced or unvoiced (Matsumoto [0044] From data which is normalized using a maximum power value per time frame (for example, 0.2 seconds), the voiced/unvoiced determining unit 103 determines as unvoiced, the portions that are less than or equal to the threshold value and determines as voiced, the portions that are greater than or equal to the threshold value.)
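For illustration only, the frame-based voiced/unvoiced determination quoted above from Matsumoto [0044] may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from the reference: the function names, the use of mean squared amplitude as the power measure, normalization by the largest frame power in the input, and the threshold value of 0.1 are all hypothetical. Because the quoted passage assigns a value exactly equal to the threshold to both classes, the sketch arbitrarily treats equality as unvoiced.

```python
from typing import List, Sequence


def frame_powers(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each consecutive frame of frame_len samples."""
    powers = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / len(frame))
    return powers


def classify_frames(samples: Sequence[float],
                    sample_rate: int,
                    frame_seconds: float = 0.2,
                    threshold: float = 0.1) -> List[bool]:
    """Return one flag per frame: True for voiced, False for unvoiced.

    Assumptions (hypothetical): powers are normalized by the maximum frame
    power, and a normalized power equal to the threshold counts as unvoiced.
    """
    frame_len = max(1, int(round(sample_rate * frame_seconds)))
    powers = frame_powers(samples, frame_len)
    peak = max(powers, default=0.0)
    if peak == 0.0:
        # All-silent (or empty) input: nothing can be classified as voiced.
        return [False] * len(powers)
    return [power / peak > threshold for power in powers]
```

At a hypothetical sample rate of 16000 Hz, the default `frame_seconds` of 0.2 yields frames of 3200 samples, so one voiced/unvoiced flag is produced for every 0.2 seconds of audio.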

12.	Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A), Fanty et al. (US 2013/0132089 A1) and Gopalan et al. (US 10,388,298 B1). 

 	With respect to Claim 7, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 5 upon which Claim 7 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach
 	wherein the time interval is 0.6 sec. 
	However, Gopalan et al. teach 
 	wherein the time interval is 0.6 sec (Gopalan et al. col. 5 lines 34-38 Around 0.6 seconds, the high band energy 316 jumps up and the difference between the low band energy 314 and the high band energy 316 decreases below the second threshold, indicating that near end speech s (t) is present.)
 	Secker-Walker et al., Koya, Fanty et al. and Gopalan et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of the time interval, as taught by Gopalan et al., for the benefit of detecting the presence of near-end speech (Gopalan et al. col. 5 lines 34-38 Around 0.6 seconds, the high band energy 316 jumps up and the difference between the low band energy 314 and the high band energy 316 decreases below the second threshold, indicating that near end speech s (t) is present.)

 	With respect to Claim 16, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 14 upon which Claim 16 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach
 	wherein the time interval is 0.6 sec. 
	However, Gopalan et al. teach 
 	wherein the time interval is 0.6 sec (Gopalan et al. col. 5 lines 34-38 Around 0.6 seconds, the high band energy 316 jumps up and the difference between the low band energy 314 and the high band energy 316 decreases below the second threshold, indicating that near end speech s (t) is present.)
 	Secker-Walker et al., Koya, Fanty et al. and Gopalan et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of the time interval, as taught by Gopalan et al., for the benefit of detecting the presence of near-end speech (Gopalan et al. col. 5 lines 34-38 Around 0.6 seconds, the high band energy 316 jumps up and the difference between the low band energy 314 and the high band energy 316 decreases below the second threshold, indicating that near end speech s (t) is present.)

13.	Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Secker-Walker et al. (US 9818407 B1) in view of Koya (US 2016/0125883 A), Fanty et al. (US 2013/0132089 A1) and Albinson et al. (US 8,880,043 B1).  

 	With respect to Claim 8, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 2 upon which Claim 8 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach
 	stops recording the voice after recording for more than 10 sec. 
	However, Albinson et al. teach 
 	stops recording the voice after recording for more than 10 sec (Albinson et al. col. 6 line 67 to col. 7 line 2 the recording of the audio automatically terminates once the recording has reached a maximum duration of time (e.g., one minute).)
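 	Secker-Walker et al., Koya, Fanty et al. and Albinson et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of a maximum recording duration, as taught by Albinson et al., for the benefit of automatically terminating the recording of the audio once the recording has reached a maximum duration of time (Albinson et al. col. 6 line 67 to col. 7 line 2 the recording of the audio automatically terminates once the recording has reached a maximum duration of time (e.g., one minute).)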

 	With respect to Claim 17, Secker-Walker et al. in view of Koya and Fanty et al. teach all the limitations of claim 11 upon which Claim 17 depends. Secker-Walker et al. in view of Koya and Fanty et al. fail to explicitly teach
 	wherein the near-end processing module stops recording the voice after the recording time exceeds 10 sec.
	However, Albinson et al. teach
 	wherein the near-end processing module stops recording the voice after the recording time exceeds 10 sec (Albinson et al. col. 6 line 67 to col. 7 line 2 the recording of the audio automatically terminates once the recording has reached a maximum duration of time (e.g., one minute).)
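 	Secker-Walker et al., Koya, Fanty et al. and Albinson et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of determining “failsafe” criteria at the local device if the server does not tell the local device to close the connection, as taught by Secker-Walker et al., using the teaching of detecting the end of utterance at the local device, as taught by Koya, for the benefit of ending the transmission of data to the speech recognition server; using the teaching of halting the audio recording, as taught by Fanty et al., for the benefit of avoiding sending unnecessary audio data that includes only silence over the network; and using the teaching of a maximum recording duration, as taught by Albinson et al., for the benefit of automatically terminating the recording of the audio once the recording has reached a maximum duration of time (Albinson et al. col. 6 line 67 to col. 7 line 2 the recording of the audio automatically terminates once the recording has reached a maximum duration of time (e.g., one minute).)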

Conclusion
14.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.