Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2019-0120567, filed on 09/30/2019.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 08/26/2020 and 01/26/2021 are being considered by the examiner.
Drawings
The drawing submitted on 08/26/2020 has been accepted by the examiner.
Response to Amendment
Claims 1-3, 5-13, and 15-20 are currently pending in the application. Of these, claims 1-2, 5-11, 15, and 20 are amended. Claims 4 and 14 have been cancelled.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 11, and 20 have been considered but are moot because: 1) the amended limitation that applicant argues was not previously rejected; and 2) the limitation underlying applicant’s main argument (page 9 of the Remarks), which applicant states was added by amendment together with other limitations of claims 4 and 14, namely “wherein the first pre-processing to obtain the speech audio signal is performed prior to the second pre-processing to obtain the non-speech audio signal,” is not found in amended claims 1, 11, and 20. Further, contrary to applicant’s assertion, support for this limitation is found neither in the specification nor in Fig. 9.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-9, 11-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Salvador (US 9087520 B1).

Regarding Claims 1, 11, and 20, Salvador teaches: An electronic device (Fig.4, audio-controlled device 106), comprising: a speaker (Fig.1, speaker 110); a plurality of microphones (Fig.4, one or more microphones 108); at least one processor (Fig.1, processor 112) operatively connected with the speaker and the plurality of microphones; and a memory (Fig.1, memory 114) operatively connected with the at least one processor, wherein the memory is configured to store instructions which, when executed, cause the electronic device to (See Fig.1 or Fig.4 and Col 8, lines 58-61, Several modules such as instruction, datastores, and so forth may be stored within the memory 114 and configured to execute on the processor 112 of audio-controlled device 106): perform first pre-processing (filter out the audio both output by a speaker of the device and captured by the microphone of the device) on audio signals received via the plurality of microphones to obtain a speech audio signal (Col 2, lines 15-58, To illustrate, envision that an audio-controlled device is outputting a song on one or more speakers of the device. While outputting the audio, envision that a user wishes to provide a voice command to the device. Col 3, lines 50-65, The microphone 108 of the audio-controlled device 106 detects audio from the environment 102, such as sounds uttered from the user 104, and generates a corresponding audio signal. As illustrated, the audio-controlled device 106 includes a processor 112 and memory 114, which stores or otherwise has access to an audio-recognition engine 116. As used herein, a processor may include multiple processors and/or a processor having multiple cores. The audio-recognition engine 116 performs audio recognition on signals generated by the microphone based on sound within the environment 102, such as utterances spoken by the user 104.
For instance, the engine 116 may identify both speech (i.e., voice commands) of the user and non-speech commands (e.g., a user clapping, tapping a table, etc.). The audio-controlled device 106 may perform certain actions in response to recognizing this audio, such as speech from the user 104. Col 5, lines 16-30, For instance, in response to the audio-recognition engine 116 identifying a predefined non-speech command issued by the user 104, the audio-modification engine 130 may somehow modify the output of the audio to increase the accuracy of speech recognition performed on an audio signal generated from sound captured by the microphone 108. As described above, the audio-modification engine 130 may attenuate the audio, pause the audio, switch output of the audio from stereo to mono, attenuate a particular frequency range of the audio, turn off one or more speakers outputting the audio or may alter the output of the audio in any other way. Col. 6, line 51 to Col 7, line 13 In some instances, the device 106 utilizes acoustic echo cancelation (AEC) techniques to filter out the audio both output by a speaker of the device and captured by the microphone of the device.), perform second pre-processing (determine short pulse having large amplitude and high frequency) on the audio signals received via the plurality of microphones to obtain a non-speech(Clapping or Clapping three times, or whistling with an increased or decreased frequency over time, or tapping sound and then subsequently clapping etc.) audio signal (Col 5, lines 48-55, In each of these instances, the microphone 108 captures sound that includes the non-speech command and generates a corresponding audio signal. The audio-recognition engine 116 then analyzes this audio signal to determine whether the audio signal includes a predefined non-speech command. 
In the example of a clapping sound, the engine 116 may determine whether the audio signal includes a relatively short pulse having a large amplitude and high frequency.), upon obtaining the non-speech audio signal based on the second pre-processing on the audio signals, identify a non-speech audio signal pattern (a predefined pattern, i.e. clapping three times in a row) corresponding to the non-speech audio signal, obtain a non-speech audio signal-based first command based on the identified non-speech audio signal pattern, and perform at least one action corresponding to the obtained non-speech audio signal-based first command (alter the audio in response to a user clapping three times in a row), and wherein the first pre-processing to obtain the speech audio signal is different (noise audio attenuation, echo cancellation etc.) from the second pre-processing (short pulse with high amplitude and frequency determination for non-speech audio) to obtain the non-speech audio signal (Col 2, lines 15-58, To illustrate, envision that an audio-controlled device is outputting a song on one or more speakers of the device. While outputting the audio, envision that a user wishes to provide a voice command to the device. As such, as described below, the user may first issue a non-speech command in order to instruct the device to alter the output of the audio in order to increase the efficacy of speech recognition performed on subsequent voice commands issued by the user. For instance, in one example the user may clap his or her hands together and, in response to identifying this non-speech command, the device may attenuate or lower the volume of the song being output. In addition, the device may be configured to alter the audio in response to identifying a predefined number of non-speech commands and/or a predefined pattern.
For instance, the device may be configured to alter the audio in response to a user clapping three times in a row, issuing a tapping sound and then subsequently clapping, whistling with an increased or decreased frequency over time, or the like.).
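For illustration only, the cited test for "a relatively short pulse having a large amplitude and high frequency" and the "clapping three times in a row" pattern can be sketched as follows. This is a minimal sketch and not Salvador's implementation: the function names, the frame size, the thresholds, and the use of zero-crossing rate as a stand-in for "high frequency" are all assumptions introduced here.

```python
def frame_features(frame):
    """Return (peak amplitude, zero-crossing rate) for one frame of samples."""
    peak = max(abs(s) for s in frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return peak, crossings / max(len(frame) - 1, 1)


def find_pulses(samples, rate, frame_ms=10, amp_thresh=0.5,
                zcr_thresh=0.2, max_pulse_ms=50):
    """Return onset times (seconds) of short, loud, high-frequency pulses.

    All thresholds are hypothetical values chosen for this sketch.
    """
    n = max(int(rate * frame_ms / 1000), 2)   # samples per frame
    max_frames = max_pulse_ms // frame_ms     # longest run still counted as "short"
    onsets, run_start, run_len = [], None, 0
    for i in range(0, len(samples) - n + 1, n):
        peak, zcr = frame_features(samples[i:i + n])
        if peak >= amp_thresh and zcr >= zcr_thresh:
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            # A run of qualifying frames just ended; keep it only if it was short.
            if run_start is not None and run_len <= max_frames:
                onsets.append(run_start / rate)
            run_start, run_len = None, 0
    if run_start is not None and run_len <= max_frames:
        onsets.append(run_start / rate)
    return onsets


def matches_count_pattern(onsets, count=3, max_gap=0.6):
    """True if exactly `count` pulses occur, each within max_gap seconds of the previous."""
    if len(onsets) != count:
        return False
    return all(b - a <= max_gap for a, b in zip(onsets, onsets[1:]))
```

Under these assumptions, a sustained loud tone is rejected because its run of qualifying frames exceeds the short-pulse limit, while three brief bursts spaced a few hundred milliseconds apart satisfy the count pattern.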

Regarding Claims 2 and 12, Salvador teaches: The electronic device of claim 1, wherein the instructions are further configured to cause the electronic device to: upon obtaining a speech audio signal based on the speech audio processing, perform speech recognition on the speech audio signal, and perform at least one action corresponding to a speech recognition-based second command (See rejection of claim 1, Col 2, lines 15-58, As such, as described below, the user may first issue a non-speech command in order to instruct the device to alter the output of the audio in order to increase the efficacy of speech recognition performed on subsequent voice commands issued by the user. Col 3, lines 9-17, Thereafter, the user may speak a predefined utterance (e.g., "wake up") that is recognized by the device. The user may thereafter issue additional voice commands (e.g., "please play the next song"), which may be recognized by the remote computing resources. The remote computing resources may then cause performance of the action, such as instructing the voice-controlled device to play a subsequent song, as requested by the user.).

Regarding Claims 3 and 13, Salvador teaches: The electronic device of claim 1, wherein the non-speech audio signal comprises a signal obtained by using the plurality of microphones based on a physical input to at least part of the plurality of microphones or at least part of an area where the plurality of microphones are arranged (See rejection of claim 1 and also see Col 2, lines 44-51; and claims 17-18: While the above example describes a user clapping, it is to be appreciated that the device may be configured to alter the audio in response to any other additional or alternative non-speech commands. For instance, the device may alter the audio in response to the user whistling, striking an object in the environment (e.g., tapping on a wall or table), stomping his or her feet, snapping his or her fingers, and/or some combination thereof. In addition, the device may be configured to alter the audio in response to identifying a predefined number of non-speech commands and/or a predefined pattern. For instance, the device may be configured to alter the audio in response to a user clapping three times in a row, issuing a tapping sound and then subsequently clapping, whistling with an increased or decreased frequency over time, or the like.).


Regarding Claims 5 and 15, Salvador teaches: The electronic device of claim 1, wherein the second pre-processing comprises at least one of noise removal processing or echo removal processing on the received audio signals (See rejection of claim 1 and also see Fig.2 and Col 1, lines 59-67, While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify a predefined non-speech command issued by a user within the environment. In response to identifying the predefined non-speech command, the device may somehow alter the output of the audio for the purpose of reducing the amount of noise within subsequently captured sound. Col. 6, line 51 to Col 7, line 13, In some instances, the device 106 utilizes acoustic echo cancelation (AEC) techniques to filter out the audio both output by a speaker of the device and captured by the microphone of the device.).

Regarding Claims 6 and 16, Salvador teaches: The electronic device of claim 1, wherein the memory is further configured to store a plurality of non-speech audio signal pattern models, and wherein the instructions are further configured to cause the electronic device to identify a non-speech audio signal pattern corresponding to the non-speech audio signal based on the stored plurality of non-speech audio signal pattern models (See rejection of claim 1 and Col 2, lines 44-51, For instance, the device may alter the audio in response to the user whistling, striking an object in the environment (e.g., tapping on a wall or table), stomping his or her feet, snapping his or her fingers, and/or some combination thereof. In addition, the device may be configured to alter the audio in response to identifying a predefined number of non-speech commands and/or a predefined pattern. Col 5, lines 50-63, The audio-recognition engine 116 then analyzes this audio signal to determine whether the audio signal includes a predefined non-speech command. In the example of a clapping sound, the engine 116 may determine whether the audio signal includes a relatively short pulse having a large amplitude and high frequency. In some instances, the engine 116 utilizes a trained classifier that classifies a received audio signal as either including the predefined non-speech command or not. Alternatively, the engine 116 may utilize a Hidden Markov Model (HMM) having multiple, trained states to identify the predefined non-speech command. Other techniques, such as statistical models, a matched filter, a neural network classifier, or a support vector machine, may be used as well.).

Regarding Claims 7 and 17, Salvador teaches: The electronic device of claim 6, wherein the memory is further configured to store a plurality of commands individually corresponding to the stored plurality of non-speech audio signal pattern models, and wherein the instructions are further configured to cause the electronic device to obtain the non-speech audio signal-based first command corresponding to the identified non-speech audio signal pattern based on the plurality of commands individually corresponding to the stored plurality of non-speech audio signal pattern models (See rejection of claim 6).
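For illustration only, the arrangement recited in claims 6-7 and 16-17 (stored pattern models, each with an individually corresponding stored command) can be sketched as a pair of lookup tables. This is a hedged sketch rather than the method of either the application or Salvador: the pattern names, the event labels, and the particular pairing of patterns with commands are hypothetical, and each "model" is reduced to a simple predicate where Salvador mentions trained classifiers or HMMs.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical pattern models: each takes a sequence of detected event labels
# and reports whether the sequence matches that pattern.
PatternModel = Callable[[List[str]], bool]

PATTERN_MODELS: Dict[str, PatternModel] = {
    "triple_clap": lambda events: events == ["clap", "clap", "clap"],
    "tap_then_clap": lambda events: events == ["tap", "clap"],
}

# Hypothetical commands, one stored per pattern model.
COMMANDS: Dict[str, str] = {
    "triple_clap": "attenuate_audio",
    "tap_then_clap": "pause_audio",
}


def identify_pattern(events: List[str]) -> Optional[str]:
    """Return the name of the first stored pattern model that matches, if any."""
    for name, model in PATTERN_MODELS.items():
        if model(events):
            return name
    return None


def command_for(events: List[str]) -> Optional[str]:
    """Map detected events to the command stored for the identified pattern."""
    pattern = identify_pattern(events)
    return COMMANDS.get(pattern) if pattern is not None else None
```

Keeping the models and the commands in separate tables keyed by the same pattern name mirrors the claim wording, in which the commands individually correspond to the stored models.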

Regarding Claims 8 and 18, Salvador teaches: The electronic device of claim 1, wherein the instructions are further configured to cause the electronic device to: associate a command (alter the audio or predefined number of non-speech commands or certain action) input by a user with the non-speech audio signal pattern (in response to the user whistling, striking an object in the environment (e.g., tapping on a wall or table), stomping his or her feet, snapping his or her fingers, and/or some combination thereof and/or a predefined pattern) obtained from the audio signals received from the plurality of microphones by a physical input to at least part of the plurality of microphones or at least part of an area where the plurality of microphones are arranged, and store (memory stores the command associated with non-speech audio) the command associated with the non-speech audio signal pattern (See rejection of claim 6 and Col 3, lines 60-63, The audio-recognition engine 116 performs audio recognition on signals generated by the microphone based on sound within the environment 102, such as utterances spoken by the user 104. For instance, the engine 116 may identify both speech (i.e., voice commands) of the user and non-speech commands (e.g., a user clapping, tapping a table, etc.).).

Regarding Claim 9, Salvador teaches: The electronic device of claim 8, further comprising: a transceiver, wherein the instructions are further configured to cause the electronic device to transmit the obtained non-speech audio signal pattern to an external electronic device (remote computing resource 118 or server) via the transceiver (See rejection of claim 8 and Col 4, lines 7-48, For instance, the audio-controlled device 106 may couple to the remote computing resources 118 over a network 120. Common expressions associated for these remote computing resources 118 include "on-demand computing", "software as a service (SaaS)", "platform computing", "network-accessible platform", "cloud services", "data centers", and so forth. The servers 122(1)-(P) include a processor 124 and memory 126, which may store or otherwise have access to some or all of the components described with reference to the memory 114 of the audio-controlled device 106. In some instances, the memory 126 has access to and utilizes another audio-recognition engine for receiving audio signals from the device 106, recognizing audio (e.g., speech) and, potentially, causing performance of an action in response. In some examples, the audio-controlled device 106 may upload audio data to the remote computing resources 118 for processing, given that the remote computing resources 118 may have a computational capacity that far exceeds the computational capacity of the audio-controlled device 106. Therefore, the audio-controlled device 106 may utilize an audio-recognition engine at the remote computing resources 118 for performing relatively complex analysis on audio captured from the environment 102.
In one example, the audio-recognition 116 performs relatively basic audio recognition, such as identifying non-speech commands for the purpose of altering audio output by the device and identifying a predefined voice command that, when recognized, causes the device 106 to provide the audio the remote computing resources 118. The remote computing resources 118 may then perform speech recognition on these received audio signals to identify voice commands from the user 104.).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Salvador in view of Meaney et al. (US 9484030 B1).
Regarding Claims 10 and 19, Salvador teaches: The electronic device of claim 9, wherein the instructions are further configured to cause the electronic device to: associate a command with the obtained non-speech audio signal pattern and store the command associated with the obtained non-speech audio signal pattern (See rejection of claim 8).

Salvador, however, does not specifically teach: associate a command with the obtained non-speech audio signal pattern according to a request from the external electronic device via the transceiver (Claim 10) or storing the command associated with the non-speech audio signal pattern according to a request from the external electronic device (Claim 19).
Meaney et al. teach: associate a command with the obtained non-speech audio signal pattern according to a request from the external electronic device via the transceiver (claim 10) or storing the command associated with the non-speech audio signal pattern according to a request from the external electronic device(Claim 19) (Col 2, line 63, to Col 4, line 6,  The system 100 may, during a training phase, be trained (140) to detect a first sound and a second sound. This training may involve a user interacting with a companion device 102 (such as a smartphone, tablet, etc.) to allow the system 100 to recognize certain sounds that the user 10 desires the system 100 to learn. The system 100 may then create models or other data associated with each sound that may be stored, for example on device 110, server 120, etc. The model(s) may be referred to later during runtime to determine if the specific sounds are detected. The system 100 may detect audio during runtime using microphone(s) 104 of device 110 or other microphones, such as a microphone 104 that may be part of a microphone array 108, where the array is communicably connected (for example using network 199) with other components of system 100, for example server(s) 120, etc. If applicable, the audio data may then be sent to a device with a sound recognition module 280, for example sent from a microphone array 108 to a device 110 or server 120 if the array 108 lacks a sound recognition module 280.
The steps described in reference to FIG. 1A (and described below) may be performed by various components of the system 100, such as some combination of the server(s) 120, device(s) 110, companion device(s) 102, network 199, etc. For example, audio data may be sent from a device 110 to a server 120 for analysis by the server 120 to determine if the audio data corresponds to a first sound and/or second sound.
Col 14, line 59 to Col 17, line 61, Alternatively, an application may operate on a companion device 102 (where the companion device 102 is in communication with other components of the system 100, for example over network 199), where the application is operable to configure the system 100 to perform the operations of FIGS. 1A and 1B.
Alternatively, the system may combine the use of pre-stored audio models with specific user training, where the system starts with a model for a specific sound (e.g., opening a door) and uses the information received during the listen/train mode to alter the pre-stored model or adjust the system's treatment of that model to more specifically recognize the first sound selected by the user. The updated audio information may then be stored in the appropriate sound profile(s). The system may request a command be selected by a user, or a command may be selected by the system, such as in the example of pre-configured first-second sound pairs. The system may present a list of potential commands to a user, or may offer a user the ability to indicate a different command (for example using a spoken selection, search query, etc.). An example of a user interface to select the command is shown in FIG. 4F. As shown in FIG. 4F, a number of potential commands may be output (for example on a display, as part of an audio output, etc.). Sample commands may include sending an email, sending a text message, playing a sound, speaking a message, launching an application on a device, executing an action on another device (e.g., blinking a light), etc. In the example of FIG. 4F, the user selects “Speak Message.” The system may store an indication of the command and other associated information in a manner associated with the first-second sound pair, user or home profile/ID, etc.  Finally, following configuration of the audio-triggered command configuration, as illustrated in FIG. 4H, the system may list for the user the first-second sound pairing, the time threshold, and the command to be executed. The system may present the user with options to accept, edit, or cancel the command. 
Upon acceptance, the system may store data linking the first sound, second sound, time threshold, command information and user/home profile/ID, or other information that may be used to initiate the audio triggered command. Following acceptance of the conditions of the audio triggered command, the system may present the user with the option to label the audio triggered command, as illustrated in FIG. 4I. The system may then store the label along with data corresponding to the different sounds, threshold(s), etc. The system may make the command available to the user (for example by linking the command to a user profile), so the user may review/edit/delete programmed commands at a later time.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Salvador to include the teaching of Meaney et al. above so that the system can learn and use user-specific sound data to initiate commands for system triggering.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Ke et al. (CN 101297355 A), teaches: (Abstract) The invention claims systems and methods for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656