DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the office action of 9/18/2020, the applicant has submitted an amendment, filed 12/9/2020, amending claims 1, 12, and 19-20, while arguing to traverse the prior art rejections. Applicant’s arguments have been fully considered but are moot in view of the new grounds of rejection, which further rely on Sereshki (US 2019/0096398) and were necessitated by the latest amendments.
Response to Arguments
In what follows, applicant’s arguments and comments are addressed in the order presented: each argument is summarized in one ¶, followed by one or more ¶’s containing the examiner’s response.
Following a broad overview of the latest amendments and the last office action on page 6, claim objections are discussed.
Due to the latest amendments the objection to claim 20 is withdrawn. As regards claims 1 and 19, applicant argues: “the gerund “stopping” in claim language is appropriate” “in claim drafting, each of the steps in a method or process claim may be introduced with gerunds, which are a form of a verb that ends in “ing”…”

Respectfully, claim 12, the apparatus claim corresponding to claims 1 and 19, recites in the same limitation “stop playing the first audio” and not “stopping playing ….”, which makes it plausible that a typo occurred in claims 1 and 19.
Page 7 the second ¶ discusses the previous 112(b) rejections.
Due to the latest amendments the said rejections are withdrawn.
From the last ¶ on page 7 to the last ¶ on page 9 it is argued why the prior art of record fails to teach the latest amendments.
Since a new reference is applied to those amendments, the applicant is respectfully directed to the rejections set forth below in this office action for further details.
From the last ¶ on page 9 to the end of page 11, it is argued that dependent claims should be allowed based on their presumed allowed parent claims.


Claim Objections
Claims 1 and 19 are objected to because of the following informalities: “stopping playing” appears to be a misspelling of “stop playing”. Appropriate correction is required.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-7, 9-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Teng et al. (US Patent 8,521,539), and further in view of Sereshki (US 2019/0096398).


Regarding claim 1, Teng et al. do teach a method (Title, Abstract teach a method of using “ASR” “for voice destination entry” also called “POI” (“point of interest”) and receiving “searching” “destinations” guidance),
comprising:

matching the external input sound message with a receiving message to obtain a matching result, wherein the receiving message is associated with the first audio message in content (the “N-best results” (the audio message) in the example in Col. 6 lines 42-47 comprises 5 entries (receiving messages), “speak[ing] the correct POI result” (Col. 11 line 3) requires matching what is spoken (the speech recognized external input sound message) with one of the “N-best results” (receiving messages); i.e., Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy” (matching the “speaking” “correct POI” (the external input sound message) with an entry in the “N-best result” (receiving message) corresponding to the “correct POI” (a matching result)));
determining whether the matching result meets a threshold (Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy from about 64% to about 94%” (a threshold for recognition (or matching) required to judge the “correct POI” recognition (matching))); and

Teng et al. do not specifically disclose determining that the sound message is a speech interruption instruction, and stopping playing the first audio message.
Sereshki does teach determining that the sound message is a speech interruption instruction, and stopping playing the first audio message (¶ 0020 sentence 1: “upon detecting a wake word” (an input message) to activate a “third party device”; ¶ 0021 sentence 1: “when” “device detects a wake word” “the networked microphone device may provide an acknowledgement” (i.e., an “acknowledgement tone” (¶ 0022 line 4)); ¶ 0025 lines 6+: “If the AEC already active and stable” “when an acknowledgement tone is outputted” (i.e., upon reception of a “wake word” (a speech interruption instruction)) “because the device is playing back other audio content” (if a first audio message is playing) “such as music” “then the AEC may effectively cancel” (it stops) “the
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “AEC” of Sereshki into the “ASR” device of Teng et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. to perform “Remov[al]” of “audio content” which is “playing back” in response to reception of a “wake word” (e.g. “voice search for a particular point of interest” (Teng et al. Col. 10 lines 35-37)), which is “intended to improve the signal-to-noise ratio of a voice input” “so as to provide a less noisy signal to a voice assistant service” as disclosed in Sereshki ¶ 0023.
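
For illustration only, the sequence of steps recited in claim 1 (receive a sound message during playback, match it against the receiving messages, compare the result with a threshold, and stop playback) can be sketched as follows. This is a minimal sketch under assumptions that are not part of the record: the class name `Playback`, the method `on_sound_message`, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the threshold value 0.8 are all hypothetical, and nothing here is taken from the application, Teng et al., or Sereshki.

```python
from difflib import SequenceMatcher


class Playback:
    """Hypothetical stand-in for a device that is playing a first audio message."""

    def __init__(self, receiving_messages):
        # Texts associated in content with the audio being played (assumed input).
        self.receiving_messages = receiving_messages
        self.playing = True

    def on_sound_message(self, recognized_text, threshold=0.8):
        """Stop playback if the recognized text matches any receiving message."""
        # Best similarity ratio (0.0 to 1.0) over all receiving messages.
        best = max(
            (SequenceMatcher(None, recognized_text, m).ratio()
             for m in self.receiving_messages),
            default=0.0,
        )
        if self.playing and best >= threshold:
            self.playing = False  # treated as a speech interruption instruction
            return True
        return False
```

In this sketch an unrelated utterance leaves `playing` unchanged, while an utterance identical to one of the receiving messages yields a ratio of 1.0 and stops playback.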

Regarding claim 2, Teng et al. do teach the method according to claim 1, wherein after receiving the external input sound message during playing the first audio message, the method further comprises:
converting the sound message into a first sequence of characters (Col. 6 lines 13+: “In step 105, a client device receives/records a user utterance” (the external input sound message) according to Col. 6 lines 19+: “In step 110, the server executes automated speech recognition” (the utterance is converted, as shown in Col. 6 lines 42-46, into two sets of characters, the first being Chinese characters (first sequence of characters)));


Regarding claim 3, Teng et al. do teach the method according to claim 2, wherein the second sequence of characters is a sequence of Chinese pinyin characters (according to Col. 6 lines 57+: “speech recognition manager” “converts the result to a pinyin string” (the second sequence of characters is in pinyin as shown in Col. 6 lines 42-46)).

Regarding claim 6, Teng et al. do teach the method according to claim 1, wherein after receiving the external input sound message during playing the first audio message, the method further comprises:
performing voice validity detection on the external input sound message (Col. 6 lines 5+: “hybrid speech recognition technique” (the external input sound message is speech recognized) “can increase POI recognition accuracy from about 64% to about 94%” (with a validity detection step)).

Regarding claim 7, Teng et al. do teach the method according to claim 1, wherein the method further comprises:
generating a receiving message list according to the first audio message (Col. 6 lines 35+: “In step 110, the server returns a list of N best results” (the receiving message associated with the “N-best” (audio message) is a list e.g. see Col. 6 lines 42-46)).

Regarding claim 9, Teng et al. do teach the method according to claim 1, wherein determining whether the matching result meets the threshold comprises:
determining whether a proportion of sound units in the sound message hitting sound units in the receiving message is greater than the threshold (Col. 11 lines 14+: “speech recognition manager records the utterance spoken in Chinese Mandarin that at least partially” (a proportion of sound units in the sound message) “identifies” (identified or hit those) “a point-of-interest location” (in the receiving message and is recognized with its recognition accuracy determined to be as required in Col. 6 line 7 as “about 94%” (the threshold)).
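
For illustration only, the proportion test recited in claim 9 can be sketched as follows. This is a minimal sketch under assumptions that are not part of the record: sound units are taken to be whitespace-separated pinyin syllables, and the function names `hit_proportion` and `meets_threshold` as well as the threshold value 0.6 are hypothetical. Neither the application nor Teng et al. is represented as disclosing this code.

```python
def hit_proportion(sound_units, receiving_units):
    """Fraction of units in the sound message that also occur in the receiving message."""
    if not sound_units:
        return 0.0
    receiving = set(receiving_units)
    hits = sum(1 for unit in sound_units if unit in receiving)
    return hits / len(sound_units)


def meets_threshold(sound_message, receiving_message, threshold=0.6):
    """True when the hit proportion is strictly greater than the threshold."""
    # Assumption: one sound unit per whitespace-separated token.
    sound_units = sound_message.lower().split()
    receiving_units = receiving_message.lower().split()
    return hit_proportion(sound_units, receiving_units) > threshold
```

Under these assumptions, an utterance of three units of which two occur in the receiving message yields a proportion of 2/3, which exceeds the assumed threshold of 0.6.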

Regarding claim 10, Teng et al. do teach the method according to claim 1, wherein the external input sound message is a sound message received by an acoustic sensor (Col. 13 lines 8+: “Input devices” (for receiving the “user utterance” “point-of-

Regarding claim 11, Teng et al. do teach the method according to claim 1, wherein after determining that the matching result meets the threshold, stopping playing the first audio message, the method further comprises:
executing an instruction corresponding to the sound message (Col. 3 lines 25-27: “The geographical navigation system can then” (following matching of the “POI” (recognition exceeding the required threshold)) “continue with calculating a route” (an instruction is executed) “giving directions for travel”).

Regarding claim 12, Teng et al. do teach an apparatus (Title, Abstract teach a method of using “ASR” “for voice destination entry” also called “POI” (“point of interest”) and receiving “searching” “destinations” guidance),
comprising:
one or more processors; and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors (Col. 13 lines 31+: “processor 142 accesses memory system 141 via the use of interconnect 143 in order to launch, run, execute, interpret or otherwise perform the logic instructions of the speech recognition manager 140-1”),

a receiving module, configured to receive an external input sound message during playing a first audio message (Col. 6 lines 16-17: “user utters” “Dong Fang Ming Zhu”; Col. 6 line 35: “the server returns a list of N-best results” (playing a first audio message); Col. 11 lines 2-3: “a user views or hears the list of results” (while playing the said audio message) “and then speaks” (receiving an external sound message) “the correct POI result”; all the steps here and the following steps are carried out, according to Col. 3 line 26, by a “geographical navigation system” (receiving module, matching module and interrupting module));
a matching module, configured to match the external input sound message with a receiving message and determine whether the matching result meets a threshold, wherein the receiving message is associated with the first audio message in content (the “N-best results” (the audio message) in the example in Col. 6 lines 42-47 comprises 5 entries (receiving messages), “speak[ing] the correct POI result” (Col. 11 line 3) requires matching what is spoken (the speech recognized external input sound message) with one of the “N-best results” (receiving messages); i.e., Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy” (matching the “speaking” “correct POI” (the external input sound message) with an entry in the “N-best result” (receiving message) corresponding to the “correct POI” (a matching result))); Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy from
an interrupting module, configured to stop playing the first audio message upon determining that the matching result meets the threshold (Col. 3 lines 22-27: “the speech recognition manager receives input via the user interface (user confirmation) that selects a given point-of-interest from the presented list of N-best point-of-interest results. The geographical navigation system can then continue with calculating a route giving directions for travel” (the “interface” stops providing for a “user [to] view [or] hear the list of results” (Col. 11 line 2 (playing the first audio message)) and begins using the “interface” for providing the “route” to his selected “POI” and these follow recognition (the threshold matching) of the uttered “correct POI”)).
Teng et al. do not specifically disclose determine that the sound message is a speech interruption instruction, and stop playing the first audio message.
Sereshki does teach determine that the sound message is a speech interruption instruction, and stop playing the first audio message (¶ 0020 sentence 1: “upon detecting a wake word” (an input message) to activate a “third party device”; ¶ 0021 sentence 1: “when” “device detects a wake word” “the networked microphone device may provide an acknowledgement” (i.e., an “acknowledgement tone” (¶ 0022 line 4)); ¶ 0025 lines 6+: “If the AEC already active and stable” “when an acknowledgement tone is outputted” (i.e., upon reception of a “wake word” (a speech interruption instruction))
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “AEC” of Sereshki into the “ASR” device of Teng et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. to perform “Remov[al]” of “audio content” which is “playing back” in response to reception of a “wake word” (e.g. “voice search for a particular point of interest” (Teng et al. Col. 10 lines 35-37)), which is “intended to improve the signal-to-noise ratio of a voice input” “so as to provide a less noisy signal to a voice assistant service” as disclosed in Sereshki ¶ 0023.

Regarding claim 13, Teng et al. do teach the apparatus according to claim 12, further comprising a conversion module, configured to convert the sound message into a first sequence of characters (Col. 6 lines 13+: “In step 105, a client device” (i.e., the “geographical navigation system” (conversion module)) “receives/records a user utterance” (the external input sound message) according to Col. 6 lines 19+: “In step 110, the server executes automated speech recognition” (is converted to as shown in 
wherein the receiving message includes a second sequence of characters corresponding to at least one option in the first audio message (the “N-best list” shown in Col. 6 lines 42-46 according to Col. 11 line 2 is provided to the user so he “hears the list of results” (receiving messages), and that list as shown in each row has Chinese characters next to their Pinyin as shown, because according to Col. 6 lines 57+: “speech recognition manager” “converts the result to a Pinyin string” (a second sequence of characters in pinyin as an option for the second sequence)).

Regarding claim 14, Teng et al. do teach the apparatus according to claim 13, wherein the second sequence of characters is a sequence of Chinese pinyin characters (according to Col. 6 lines 57+: “speech recognition manager” “converts the result to a pinyin string” (the second sequence of characters is in pinyin as shown in Col. 6 lines 42-46)).

Regarding claim 17, Teng et al. do teach the apparatus according to claim 12, further comprising a valid speech determining module, configured to perform voice validity detection on the external input sound message (Col. 6 lines 5+: “hybrid speech recognition technique” (the external input sound message is speech recognized) “can increase POI recognition accuracy from about 64% to about 94%”(with a validity 

Regarding claim 18, Teng et al. do teach the apparatus according to claim 12, further comprising a generating module, configured to generate a receiving message list according to the first audio message (Col. 6 lines 35+: “In step 110, the server returns a list of N best results” (the receiving message associated with the “N-best” (audio message) is a list e.g. see Col. 6 lines 42-46); these are all carried out by the “geographical navigation system” (a generating module)).

Regarding claim 19, Teng et al. do teach one or more computer-readable media, stored thereon instructions that, when executed by one or more processors, cause the one or more processors  to perform acts (Col. 3 lines 30+: “One such embodiment comprises a computer program product that has a computer-storage medium (e.g., a non-transitory, tangible, computer-readable media, disparately located or commonly located storage media, computer storage media or medium, etc.) including computer program logic encoded thereon that, when performed in a computerized device” (a terminal device) “having a processor” (one or more processors) “and corresponding memory, programs the processor to perform (or causes the processor to perform) the operations disclosed herein” (perform acts i.e.,  Title, Abstract teach a method of  using 
including:
receiving an external input sound message during playing a first audio message (Col. 6 lines 16-17: “user utters” “Dong Fang Ming Zhu”; Col. 6 line 35: “the server returns a list of N-best results” (playing a first audio message); Col. 11 lines 2-3: “a user views or hears the list of results” (while playing the said audio message) “and then speaks” (receiving an external sound message) “the correct POI result”);
matching the external input sound message with a receiving message to obtain a matching result, wherein the receiving message is associated with the first audio message in content (the “N-best results” (the audio message) in the example in Col. 6 lines 42-47 comprises 5 entries (receiving messages), “speak[ing] the correct POI result” (Col. 11 line 3) requires matching what is spoken (the speech recognized external input sound message) with one of the “N-best results” (receiving messages); i.e., Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy” (matching the “speaking” “correct POI” (the external input sound message) with an entry in the “N-best result” (receiving message) corresponding to the “correct POI” (a matching result)));
determining whether the matching result meets a threshold (Col. 6 lines 5+: “hybrid speech recognition technique can increase POI recognition accuracy from about 
upon determining that the matching result meets the threshold, stopping playing the first audio message (Col. 3 lines 22-27: “the speech recognition manager receives input via the user interface (user confirmation) that selects a given point-of-interest from the presented list of N-best point-of-interest results. The geographical navigation system can then continue with calculating a route giving directions for travel” (the “interface” stops providing for a “user [to] view [or] hear the list of results” (Col. 11 line 2 (playing the first audio message)) and begins using the “interface” for providing the “route” to his selected “POI” and these follow recognition (the threshold matching) of the uttered “correct POI”)).
Teng et al. do not specifically disclose determining that the sound message is a speech interruption instruction, and stopping playing the first audio message.
Sereshki does teach determining that the sound message is a speech interruption instruction, and stopping playing the first audio message (¶ 0020 sentence 1: “upon detecting a wake word” (an input message) to activate a “third party device”; ¶ 0021 sentence 1: “when” “device detects a wake word” “the networked microphone device may provide an acknowledgement” (i.e., an “acknowledgement tone” (¶ 0022 line 4)); ¶ 0025 lines 6+: “If the AEC already active and stable” “when an acknowledgement tone is outputted” (i.e., upon reception of a “wake word” (a speech interruption instruction))
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “AEC” of Sereshki into the “ASR” device of Teng et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. to perform “Remov[al]” of “audio content” which is “playing back” in response to reception of a “wake word” (e.g. “voice search for a particular point of interest” (Teng et al. Col. 10 lines 35-37)), which is “intended to improve the signal-to-noise ratio of a voice input” “so as to provide a less noisy signal to a voice assistant service” as disclosed in Sereshki ¶ 0023.

Regarding claim 20, Teng et al. do teach the one or more computer-readable media of claim 19, wherein after receiving the external input sound message during playing the first audio message, the acts further comprise:
converting the sound message into a first sequence of characters (Col. 6 lines 13+: “In step 105, a client device receives/records a user utterance” (the external input sound message) according to Col. 6 lines 19+: “In step 110, the server executes 
wherein the receiving message includes a second sequence of characters corresponding to at least one option in the first audio message (the “N-best list” shown in Col. 6 lines 42-46 according to Col. 11 line 2 is provided to the user so he “hears the list of results” (receiving messages), and that list as shown in each row has Chinese characters next to their Pinyin as shown, because according to Col. 6 lines 57+: “speech recognition manager” “converts the result to a Pinyin string” (a second sequence of characters in pinyin as an option for the second sequence)).


Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Teng et al. in view of Sereshki, and further in view of Horvitz (US 2007/0136222).
Regarding claim 4, Teng et al. in view of Sereshki do not specifically disclose the method according to claim 2, wherein the second sequence of characters is a sequence of other language characters other than Chinese Pinyin.
Horvitz does teach the method according to claim 2, wherein the second sequence of characters is a sequence of other language characters other than Chinese Pinyin (¶ 0031 lines 8+: “the name of the coordinate sector” (a receiving message) “subsector, etc., can be presented” (“display[ed]” as characters (¶ 0033 line 6)) “to a recipient in a
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functionality of the “GPS” of the “QUESTION AND ANSWER ARCHITECTURE” of Horvitz into the “car navigation system” of Teng et al. (Col. 10 lines 7-8) in Teng et al. in view of Sereshki, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. in view of Sereshki to “present” to a “recipient” the “name of [a] coordinate sector” (i.e., the “POI” in Teng et al.) in “English” as well as “Chinese” when the user is in a Chinese speaking territory as disclosed in Horvitz ¶ 0031 lines 8-10.

Regarding claim 15, Teng et al. in view of Sereshki do not specifically disclose the apparatus according to claim 13, wherein the second sequence of characters is a sequence of other language characters other than Chinese Pinyin.
Horvitz does teach the apparatus according to claim 13, wherein the second sequence of characters is a sequence of other language characters other than Chinese Pinyin (¶ 0031 lines 8+: “the name of the coordinate sector” (a receiving message) “subsector, etc., can be presented” (“display[ed]” as characters (¶ 0033 line 6)) “to a
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functionality of the “GPS” of the “QUESTION AND ANSWER ARCHITECTURE” of Horvitz into the “car navigation system” of Teng et al. (Col. 10 lines 7-8) in Teng et al. in view of Sereshki, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. in view of Sereshki to “present” to a “recipient” the “name of [a] coordinate sector” (i.e., the “POI” in Teng et al.) in “English” as well as “Chinese” when the user is in a Chinese speaking territory as disclosed in Horvitz ¶ 0031 lines 8-10.

Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Teng et al. in view of Sereshki, and further in view of JUNKAR et al. (US 2014/0269193).
Regarding claim 5, Teng et al. in view of Sereshki do not specifically disclose the method according to claim 1, wherein after receiving the external input sound message during playing the first audio message, the method further comprises performing denoising processing on the external input sound message.
JUNKAR et al. do teach:

It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “GPS” of JUNKAR et al. with its “echo” “processing” into the “car navigation system” of Teng et al. (Col. 10 lines 7-8) in Teng et al. in view of Sereshki, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. in view of Sereshki to filter, by its “car navigation system”, any “echo” that may arise in a vehicle, so as to better resolve spoken instructions (e.g. “sound messages”) inputted by users.

Regarding claim 16, Teng et al. in view of Sereshki do not specifically disclose the apparatus according to claim 12, further comprising a denoising module, configured to perform denoising processing on the external input sound message.
JUNKAR et al. do teach:

It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “GPS” of JUNKAR et al. with its “echo” “processing” into the “car navigation system” of Teng et al. (Col. 10 lines 7-8) in Teng et al. in view of Sereshki, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. in view of Sereshki to filter, by its “car navigation system”, any “echo” that may arise in a vehicle, so as to better resolve spoken instructions (e.g. “sound messages”) inputted by users.


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Teng et al. in view of Sereshki, and further in view of Henry (US Patent 10,109,275).

Henry does teach the method according to claim 7, wherein the receiving message list is an inverted index list (Col. 8 lines 37+: “In some implementations, search component 450 may implement an inverted index” (apply an inverted index listing to) “to speed up retrieval” (a receiving message attributed to a search) “of words”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the “implement[ing] an inverted index” to “retriev[ed]” results of Henry to the “N-best” lists of Teng et al. in Teng et al. in view of Sereshki, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Teng et al. in view of Sereshki to “implement an inverted index” for its “N-best” lists so as to “speed up retrieval” as disclosed in Henry Col. 8 lines 37-38.
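
For illustration only, an inverted index over a receiving message list of the kind recited in claim 8 can be sketched as follows. This is a minimal sketch under assumptions that are not part of the record: each receiving message is split on whitespace into units, and the function names `build_inverted_index` and `candidates` are hypothetical. It is not represented as the implementation of Henry, of Teng et al., or of the application.

```python
from collections import defaultdict


def build_inverted_index(receiving_messages):
    """Map each unit to the set of list positions whose message contains it."""
    index = defaultdict(set)
    for position, message in enumerate(receiving_messages):
        for unit in message.lower().split():
            index[unit].add(position)
    return index


def candidates(index, sound_message):
    """Positions of receiving messages that share at least one unit with the input."""
    found = set()
    for unit in sound_message.lower().split():
        # .get avoids creating empty entries for units that were never indexed.
        found |= index.get(unit, set())
    return sorted(found)
```

With such an index, only the receiving messages that share at least one unit with the input need to be compared in full, which is the sense in which an inverted index may “speed up retrieval”.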

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached on 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/Farzad Kazeminezhad/
Art Unit 2657
February 16th 2021.