DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on June 30, 2022 has been entered.
Claims 1, 11, and 15 have been amended.  Claims 8-9 have been cancelled.  Claims 1-7 and 10-15 remain pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on May 16, 2022 is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over Lockhart (US Patent No. 10,186,265) in view of Iwase (US Patent Application Publication No. 2020/0051546) and further in view of Hardek (US Patent Application Publication No. 2018/0033454).
Lockhart teaches a system and method for multi-layer keyword detection to avoid detection of keywords in output audio.  Regarding claims 1, 11 and 15, Lockhart teaches a speech recognition device, computer medium and method for operating the speech recognition device comprising: a microphone (103); and a processor (Fig. 7) configured to receive a voice signal through the microphone (col. 6, lines 31-39 -- local device 110 may be configured to receive and respond to wakewords and execute audible commands in conjunction with server 120. The local device 110 may include a first detector (primary wakeword module) 220a to detect a wakeword in audio data detected by the microphone 103. The local device 110 may also include a second detector (secondary wakeword module) 220b to detect a wakeword in output audio data to be output from a speaker 101 of the local device 110; col. 7, line 65 to col. 8, line 12; col. 22, lines 1-2 – listen for and capture audio; col. 22, lines 15-22), generate voice characteristic data by analyzing the voice signal by using a data recognition model based on a neural network (col. 8, lines 55-60; col. 9, lines 17-51 – compares audio data to stored models…system built on neural networks; col. 21, lines 49-67; col. 23, lines 44-54).  Lockhart fails to specifically teach a single embodiment that implements the neural network data recognition model.  However, Lockhart teaches that the system is capable of providing neural network data recognition in an additional embodiment (col. 9, lines 48-51), and one having ordinary skill in the art at the time of the invention would have been able to implement the additional embodiment neural network features provided by Lockhart, and the results would have been predictable and resulted in an improved human-computer interaction by reducing execution of inadvertent keywords/commands received by the system, as taught by Lockhart.
Lockhart teaches detecting voice uttered from a user and voice output from an external device (col. 6, lines 31-39 -- local device 110 may be configured to receive and respond to wakewords and execute audible commands in conjunction with server 120. The local device 110 may include a first detector (primary wakeword module) 220a to detect a wakeword in audio data detected by the microphone 103. The local device 110 may also include a second detector (secondary wakeword module) 220b to detect a wakeword in output audio data to be output from a speaker 101 of the local device 110; col. 7, line 65 to col. 8, line 12; col. 9, lines 17-21 -- Once speech is detected in the audio received by the device 110 (or separately from speech detection), the device 110 may use the wakeword detection module 220 to perform wakeword detection to determine when a user intends to speak a command to the device 110; col. 22, lines 15-22).  Lockhart fails to specifically teach a camera, where the system is configured to obtain an image of a periphery of the speech recognition device by using the camera, and identify whether a user obtained in the periphery image corresponds to pre-set face recognition data if a registered user or an external device is included in the obtained peripheral image.  Iwase teaches an information processing and voice output device that processes data obtained from an image and audio/noise [para 0012]; a camera to obtain the image data [para 0050 -- sensor device 53 includes a stereo camera and a depth sensor. An image captured by the sensor device 53 and depth information (distance information) of a subject in the image are supplied to the image recognition engine 54]; obtaining audio from external devices [para 0062 – noise sources; para 0067 – television, audio unit]; an image recognition engine that detects the face of the user and the noise source [para 0063-0068 -- the image recognition engine 54 detects the face of the user 10 from the image from the sensor device 53 on the basis of the face of the user 10 registered in advance…. the image recognition engine 54 recognizes the above-described object in the direction indicated by the information from the noise detection processing part 52 in the image from the sensor device 53]; where the image recognition unit learns in advance an object (person, television, audio unit) as a source and recognizes the learned object [para 0067]; and can provide voice parameters of audio [para 0091].  One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing a camera and the camera processing techniques suggested by Iwase, in the system of Lockhart, for the purpose of assisting in locating the audio or noise when the user emits an utterance, so as to ensure the audio being recognized is provided by the correct user and not provided by other persons in the area or audio output by other devices.
Lockhart fails to specifically teach identifying, based on the voice characteristic data, whether the voice signal is a user-spoken voice or a voice output from an external device.  In a similar field of endeavor, Hardek teaches that a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems, and teaches that user speech or other utterances by the user are likely to have different acoustic signatures than background speech from other persons at some distance from the user, and also different acoustic signatures from sounds from a PA system [para 0045-0048].  Hardek teaches the system is advantageous in preventing extraneous sounds such as background speech or PA system speech from interfering with the desired operation of the voice driven systems [para 0012].  Therefore, one having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the distinguishing of speech obtained by users and background speech/device output audio, as suggested by Hardek, in the system of Lockhart, for the purpose of preventing extraneous sounds such as background speech or PA system speech from interfering with the desired operation of the voice driven systems, as suggested by Hardek, to thereby improve system performance and accuracy in responding to the user.
Lockhart teaches, when the voice signal is determined as the voice uttered from the user, determining the voice signal as a voice command of the user and performing an operation corresponding to the voice command (col. 9, lines 17-21 -- Once speech is detected in the audio received by the device 110 (or separately from speech detection), the device 110 may use the wakeword detection module 220 to perform wakeword detection to determine when a user intends to speak a command to the device 110; col. 10, lines 3-18 -- ASR module 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111. The text data may then be used by other components for various purposes, such as executing system commands).
Lockhart teaches voice recognition data [col. 15, lines 18-19 – voice profiles] of a registered user authorized to issue a voice command, and when it is determined that the voice signal corresponds to the voice recognition data set in advance, determine the voice signal as a voice command of the user and perform an operation corresponding to the voice command (col. 9, lines 17-21 -- Once speech is detected in the audio received by the device 110 (or separately from speech detection), the device 110 may use the wakeword detection module 220 to perform wakeword detection to determine when a user intends to speak a command to the device 110; col. 10, lines 3-18 -- ASR module 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111. The text data may then be used by other components for various purposes, such as executing system commands – where providing for users with a stored voice profile to utter voice commands that will be executed by the device is an obvious step requiring only routine skill in the art).
Regarding claims 2 and 12, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to filter the voice signal when the voice signal is determined as the voice output from the external device [Lockhart at col. 15, lines 48-54 – subtracting audio sent to speakers from audio captured by the microphone – where subtracting the audio provides a form of filtering the audio from the captured audio].   
Regarding claims 3 and 13, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to generate the voice characteristic data by analyzing whether the voice signal is a reconstructed signal from a compressed voice signal [Lockhart at col. 19, lines 24-51 – system detects formats of the audio data to determine different formats, can convert the formats, and provides audio processing to include compression, where converting received compressed data to obtain the reconstructed signal is an obvious step requiring only routine skill in the art].
Regarding claims 4 and 14, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to generate the voice characteristic data by analyzing whether the voice signal includes electrical noise [Lockhart at col. 9, lines 7-13 – includes models corresponding to noise; col. 10, line 47].   
Regarding claim 5, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to generate the voice characteristic data by analyzing an output location of the voice signal [Lockhart at col. 15, lines 21-25 – processing and tracking of individual device location; col. 15, lines 48-51 – positioning relationships of output speakers and microphones; col. 25, lines 25-31 – acoustic localization].
Regarding claim 6, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to set in advance at least one output location of the voice signal output from the external device, and generate the voice characteristic data by determining whether an output location of the voice signal corresponds to the output location set in advance [Lockhart at col. 15, lines 21-25 – processing and tracking of individual device location; col. 15, lines 48-51 – positioning relationships of output speakers and microphones; col. 25, lines 25-31 – acoustic localization – where the system provides for processing of a device in a specific location and the specific location (user bedroom) provides for a location set in advance].
Regarding claim 7, the combination of Lockhart, Iwase and Hardek teaches the processor is further configured to generate the voice characteristic data by determining a number of output locations from which the voice signal is output [Lockhart at col. 15, lines 21-25 – processing and tracking of individual device location; col. 15, lines 48-51 – positioning relationships of output speakers and microphones; col. 25, lines 25-31 – acoustic localization – where acoustic localization processing provides for determining multiple output locations].   
Regarding claim 10, Lockhart teaches the processor is further configured to set in advance voice recognition data [col. 15, lines 18-19 – voice profiles] of a registered user authorized to issue a voice command, and when it is determined that the voice signal corresponds to the voice recognition data set in advance, determine the voice signal as a voice command of the user and perform an operation corresponding to the voice command (col. 9, lines 17-21 -- Once speech is detected in the audio received by the device 110 (or separately from speech detection), the device 110 may use the wakeword detection module 220 to perform wakeword detection to determine when a user intends to speak a command to the device 110; col. 10, lines 3-18 -- ASR module 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111. The text data may then be used by other components for various purposes, such as executing system commands – where providing for users with a stored voice profile to utter voice commands that will be executed by the device is an obvious step requiring only routine skill in the art).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659