DETAILED ACTION
1.	This communication is in response to the Amendments filed on 10/27/2021. Claims 1-20 are pending and have been examined. 
Response to Amendments and Arguments
2.	Applicant's arguments with respect to claim rejections under 35 U.S.C. 103 have been fully considered, but they are not persuasive. In particular, the applicant argues that the references do not teach “receive an input for activating an intelligent agent service from a user while at least one application is executed ..” In response, the examiner respectfully disagrees.
Note that KIM teaches: [Abstract] “a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction only by voice and lip movements.” Here, “navigation” clearly reads on an “application” (such as GPS), and “issue a navigation operation instruction” clearly reads on “receive an input for activating an intelligent agent service from a user,” where “activating” can be broadly interpreted (for example, as performing a particular navigation action based on the received command).
Claim Rejections - 35 USC § 103
3.	Claims 1-7, 10-17, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, et al. (US 20110071830; hereinafter KIM) in view of Takayanagi, et al. (US 20170309275; hereinafter TAKAYANAGI).
As per claim 1, KIM (Title: Combined lip reading and voice recognition multimodal interface system) discloses “An electronic device comprising: a camera; a microphone; a display; a memory; and a processor (KIM, [0009], camera; [0049], microphone; [0010], application service screen; [0037], memory; [0036], processor) configured to:    
 receive an input for activating an intelligent agent service from a user while at least one application is executed (KIM, [Abstract], a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction <read on activating an intelligent agent service/application> only by voice and lip movements), 
[ identify context information ] of the electronic device (KIM, [0009], receive an instruction in an environment where a voice recognizer does not work due to noise <read on context>), 
control to acquire image information of the user through the camera [ based on the identified context information ] including information on the executed at least one application, detect movement of a user's lips included in the acquired image information to recognize a speech of the user (KIM, [0009], a lip reading system that effectively detects lips from a face image through a camera, suitably tracks lip movements, and suitably recognizes a voice instruction based on feature values of the lips, and then suitably combines the lip reading system with an audio-based voice recognition system such that lip reading using a camera image can suitably receive an instruction in an environment where a voice recognizer does not work due to noise <read on ‘information on the executed at least one application’>), and 
perform a function of the executed at least one application corresponding to the recognized speech (KIM, [Abstract], issue a navigation operation instruction <read on performing a function> only by voice and lip movements).”
KIM does not explicitly disclose “identify context information .. based on the identified context information ..” However, this feature is taught by TAKAYANAGI (Title: Method and apparatus for recognizing speech by lip reading). 
In the same field of endeavor, TAKAYANAGI teaches: [0119] “The signal to noise data and signal to brightness data can be obtained by the audio input device 314 and the video input device 316 together with the controller 306” and [0126] “one rule associated with the variable text conversion value can be a signal to noise ratio between the audio signal and a background noise is below a predetermined threshold. In this case, the controller can be configured to disable the audio input device.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of TAKAYANAGI in the system taught by KIM to determine the signal-to-noise ratio to be used as context information for speech recognition mode selection.
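For illustration only, the rule quoted from TAKAYANAGI [0126] (disable the audio input device when the signal-to-noise ratio falls below a predetermined threshold) can be sketched as follows. This is a minimal sketch and not code from either reference: the function names, the decibel formulation of the ratio, and the 10 dB default threshold are all assumptions made for the example.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels (assumed formulation, not from the references)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def select_input_devices(snr: float, snr_threshold_db: float = 10.0) -> dict:
    """Decide which input devices stay enabled for speech recognition.

    Hypothetical helper: when the SNR is below the threshold, the audio
    input is disabled and recognition relies on the camera (lip reading).
    The threshold value is an arbitrary placeholder.
    """
    audio_enabled = snr >= snr_threshold_db
    return {"audio": audio_enabled, "video": True}


# Noisy environment: the noise power is close to the signal power.
noisy = select_input_devices(snr_db(signal_power=2.0, noise_power=1.0))
# Quiet environment: the signal power is well above the noise power.
quiet = select_input_devices(snr_db(signal_power=100.0, noise_power=1.0))
```

In this sketch the measured ratio plays the role of the "context information" on which the choice of recognition mode is based.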
As per Claim 2 (dependent on claim 1), KIM in view of TAKAYANAGI further discloses “acquire noise around the electronic device through the microphone; and store information on the acquired noise around the electronic device as the context information (TAKAYANAGI, [0119], The signal to noise data and signal to brightness data can be obtained by the audio input device 314 and the video input device 316 together with the controller 306; KIM, [0049], microphone; [0037], memory <read on a ready mechanism to store any information>).” 
As per Claim 3 (dependent on claim 2), KIM in view of TAKAYANAGI further discloses “activate the camera based on the noise around the electronic device being higher than or equal to a preset value (TAKAYANAGI, [0126], one rule associated with .”  
As per Claim 4 (dependent on claim 1), KIM in view of TAKAYANAGI further discloses “store at least one of a type or an execution state of the at least one application being executed as the context information; and activate the camera based on the at least one application being executed reproducing music or a video (KIM, [Abstract], issue a navigation operation instruction .. allowing a driver to look ahead during a navigation operation <read on the execution state of the navigation application as context>; [0037], memory <read on information storage>; [0010], an application service screen of a navigation system as an interactive system based on a suitable scenario; [0074], a locating screen, a routing screen, an actual road guide screen, etc. <read on reproducing video>; [0009], a lip reading system that effectively detects lips from a face image through a camera <read on the associated activation under any condition per system design choice>).”    
As per Claim 5 (dependent on claim 1), KIM in view of TAKAYANAGI further discloses “control to receive a voice signal through the microphone; recognize a combination speech corresponding to a combination of the voice signal and the movement of the user’s lips; and perform a function corresponding to the recognized combination speech (KIM, [Abstract], a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction <read on performing a function> only by voice and lip movements).” 
As per Claim 6 (dependent on claim 5), KIM in view of TAKAYANAGI further discloses “based on brightness of the acquired image information being equal to or lower than a preset value: recognize a voice recognition-based speech corresponding to the voice signal except for the user’s movement of the lips; and perform a function corresponding to the recognized voice recognition-based speech (TAKAYANAGI, [0127], one rule associated with the variable text conversion value can be a signal to brightness ratio is below a predetermined threshold. In this case, the controller can be configured to disable the video input device; KIM, [Abstract], a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction <read on performing a function>).”  
As per Claim 7 (dependent on claim 5), KIM in view of TAKAYANAGI further discloses “based on brightness of the acquired image information being lower than a preset value, display a user interface indicating failure of recognition of the movement of the user’s lips to the user through the display (TAKAYANAGI, [0127], one rule associated with the variable text conversion value can be a signal to brightness ratio is below a predetermined threshold. In this case, the controller can be configured to disable .”
As per Claim 10 (dependent on claim 1), KIM in view of TAKAYANAGI further discloses “based on the intelligent agent service being activated, provide at least one piece of information on the at least one application (KIM, [Abstract], issue a navigation operation instruction only by voice and lip movements, thus allowing a driver to look ahead during a navigation operation <read on an intelligent agent service being activated> and reducing vehicle accidents related to navigation operations <read on the associated application information such as route guidance> during driving).”
Claims 11-17, 20 (similar in scope to claims 1-7, 10) are rejected under the same rationale as applied above for claims 1-7, 10. 
4.	Claims 8-9, 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over KIM in view of TAKAYANAGI, and further in view of Cutler, et al. (US 20040267521; hereinafter CUTLER).
As per Claim 8 (dependent on claim 5), KIM in view of TAKAYANAGI further discloses “based on a plurality of lips being detected based on the acquired image information: [identify the voice signal ] and movements of the plurality of lips; and [ display a user interface for distinguishing lips corresponding to the user from other lips through the display ] (KIM, [0009], a lip reading system that effectively detects lips from a face image through a camera, .”
KIM in view of TAKAYANAGI does not explicitly disclose “identify the voice signal and movements of the plurality of lips .. display a user interface for distinguishing lips corresponding to the user from other lips through the display ..” However, this feature is taught by CUTLER (Title: System and method for audio/video speaker detection). 
In the same field of endeavor, CUTLER teaches: [Abstract] “The audio and video are inputted into a time-delay neural network that processes the data to determine which target is speaking. The neural network processing is based upon a correlation to detected mouth movement from the video data and audio sounds detected by the microphone,” [0015] “simultaneous speakers and background noise can be handled by first using a microphone array to beam form on each face detected and then evaluating the TDNN using the beam formed audio” and [0045] “A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of CUTLER in the system taught by KIM and TAKAYANAGI to correlate received voice to one particular speaker from a plurality of users.
As per Claim 9 (dependent on claim 8), KIM in view of TAKAYANAGI and CUTLER further discloses “configure a weighted value of a first speech recognized based on movement of the lips corresponding to the user to be higher than a weighted value of a second speech recognized based on the voice signal acquired through the microphone; and recognize the combination speech (KIM, [Abstract], a combined lip reading and voice recognition multimodal interface system. Examiner’s Note: The specification does not describe HOW the voice recognition result and the lip reading result are actually COMBINED, and as such it is unclear how the WEIGHT is applied to such combination speech. The applicant must clarify this for the examiner to examine the claim).”
Claims 18-19 (similar in scope to claims 8-9) are rejected under the same rationale as applied above for claims 8-9. 
Conclusion 
5.	THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).   
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 		
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on 571-272-7799. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/		1/10/2022
Primary Examiner, Art Unit 2659