DETAILED ACTION
Introduction
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statements (IDSs) submitted on 06/18/2020 and 03/26/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments/Amendments
3.	With respect to the claim rejections under 35 U.S.C. § 103, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
	With respect to the IDSs, the dates of the IDSs have been corrected in the Information Disclosure Statement section above.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over 
Cho et al. (US 2017/0068512 A1) in view of Lord et al. (US 2015/0336578 A1).

 	With respect to Claim 1, Cho et al. disclose 
	A voice processing apparatus, comprising: 
 	a processor to execute a program (Cho et al. [0130] For example, the communicator 160 performs a function of transmitting a user captured image, voice data, conversation feature information generated by the processor 120, etc. to an external device or receiving a user captured image, voice data, etc. from the external device); and 
 	a memory to store the program which, when executed by the processor (Cho et al. [0142] The non-transitory computer readable medium may, for example, store data semi-permanently and is readable by devices. For example, the aforementioned applications or programs may be stored in the non-transitory computer readable media such as compact disks (CDs), digital video disks (DVDs), hard disks, Blu-ray disks, universal serial buses (USBs), memory cards, and read-only memory (ROM)), performs processes of, 
detecting an opening state of a mouth of a user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0089] the processor 120 may be configured to recognize a fifth user and a sixth user from a captured image. If it is determined that face angles and face poses of the fifth and sixth users are face angles and face poses at which faces of the fifth and sixth users face each other, and mouth shapes respectively extracted from faces of the fifth and sixth users are changed, the processor 120 may be configured to determine that the fifth and sixth users make conversations); and 
acquiring voice information, wherein voice identification information for identifying voice of a specific user is previously registered (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members), the voice processing apparatus further comprises: 
recognizing only voice emitted in a state where the user who is registered opens the mouth as a speaker voice based on the opening state which is detected, the voice information which is acquired, and the voice identification information (Cho et al. [0012] The processor may be configured to acquire respective images and voices of the family members to generate family member information, to register the family member information in the storage unit, and, in response to at least one of the captured image and the user voice being input, to compare the at least one of the captured image and the user voice with the family member information to determine whether the conversations are made between the family members, [0016] In response to a name of at least one of family members taking part in conversation, being detected form the user voice, the processor may be configured to determine that the conversations are made between the family members, [0060] the processor 120 may be configured to extract voice feature information (or voice information) from a voice of the user and to store the voice feature information in the storage unit 110, [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0074] if at least one of a captured image and a user voice is input, the processor 120 may be configured to compare the at least one of the captured image and the user voice with family member information to determine whether conversations are make between family members. Cho et al. determine that the user is making conversation based on the opening of the user's mouth and then match the face information of the user with the voice information of the user. 
This implies that the system recognizes only the voice of the user whose mouth is open.); and 
Cho et al. fail to explicitly teach 
	transmitting speaker voice information which is information of the speaker voice which is recognized to an external server, and 
 	the speaker voice information transmitted to the external server corresponds to only the voice emitted in the state where the registered user opens the mouth.  
	However, Lord et al. teach
 	transmitting speaker voice information which is information of the speaker voice which is recognized to an external server (Lord et al. [0072] the AAFS is configured to receive data that represents an utterance of a speaker and that is obtained at or about a hearing device associated with a user. The AAFS may then identify the speaker based at least in part on the received data, such as by performing speaker recognition and/or speech recognition with the received data. The AAFS may then determine speaker-related information associated with the identified speaker, such as an identifier (e.g., name or title) of the speaker, [0200] the process may, in substantially real time, provide the user with indication of a current speaker, [1420] At block 23.7902, the process performs transmitting the text along with an audio representation of the utterance and an identifier of the speaker to a remote conferencing device and/or a conference call system. After having performed the speech recognition, the mobile device may transmit the obtained text along with an identifier of the speaker and the audio representation of the speaker’s utterance to a remote system or device, [0515] remote devices (e.g., “cloud-based” processing and/or storage)), and 
 	the speaker voice information transmitted to the external server corresponds to only the voice emitted in the state where the registered user opens the mouth (Lord et al. [0072] the AAFS is configured to receive data that represents an utterance of a speaker and that is obtained at or about a hearing device associated with a user. The AAFS may then identify the speaker based at least in part on the received data, such as by performing speaker recognition and/or speech recognition with the received data. The AAFS may then determine speaker-related information associated with the identified speaker, such as an identifier (e.g., name or title) of the speaker, [1420] At block 23.7902, the process performs transmitting the text along with an audio representation of the utterance and an identifier of the speaker to a remote conferencing device and/or a conference call system. After having performed the speech recognition, the mobile device may transmit the obtained text along with an identifier of the speaker and the audio representation of the speaker’s utterance to a remote system or device, [0515] remote devices (e.g., “cloud-based” processing and/or storage). The Examiner notes that in Lord et al. the voice is captured live, in real time, while the user is speaking (i.e., with his or her mouth open), and the speaker voice information from that real-time voice is transmitted.)
Cho et al. and Lord et al. are analogous art because they are from a similar field of endeavor, namely signal processing and applications thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker by matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., to transmit the obtained text, along with the identifier of the speaker and the audio representation of the speaker’s utterance, to the remote system, as taught by Lord et al., for the benefit of distributing the speech recognition load among multiple distributed communication devices used by the speakers in the conference (Lord et al. [1420] At block 23.7902, the process performs transmitting the text along with an audio representation of the utterance and an identifier of the speaker to a remote conferencing device and/or a conference call system. After having performed the speech recognition, the mobile device may transmit the obtained text along with an identifier of the speaker and the audio representation of the speaker's utterance to a remote system or device. In this manner, the speech recognition load may be distributed among multiple distributed communication devices used by the speakers in the voice conference.)

With respect to Claim 2, Cho et al. in view of Lord et al. teach
 	wherein face identification information for identifying a face of a specific user is previously registered, and when a user identified using the face identification information is identical with a user identified using the voice identification information, the recognizing process comprises recognizing the speaker voice of the user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image.)

With respect to Claim 3, Cho et al. in view of Lord et al. teach
 	wherein the user includes a plurality of users (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members.)

With respect to Claim 6, Cho et al. disclose 
  	A voice processing method, comprising: 
detecting an opening state of a user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0089] the processor 120 may be configured to recognize a fifth user and a sixth user from a captured image. If it is determined that face angles and face poses of the fifth and sixth users are face angles and face poses at which faces of the fifth and sixth users face each other, and mouth shapes respectively extracted from faces of the fifth and sixth users are changed, the processor 120 may be configured to determine that the fifth and sixth users make conversations); and 
 	acquiring voice information (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members); 
 	identification information previously registered to identify voice of a specific user (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members);
recognizing only voice emitted in a state where the user who is registered opens a mouth as a speaker voice based on the opening state which is detected, the voice information which is acquired, and the identification information (Cho et al. [0012] The processor may be configured to acquire respective images and voices of the family members to generate family member information, to register the family member information in the storage unit, and, in response to at least one of the captured image and the user voice being input, to compare the at least one of the captured image and the user voice with the family member information to determine whether the conversations are made between the family members, [0016] In response to a name of at least one of family members taking part in conversation, being detected form the user voice, the processor may be configured to determine that the conversations are made between the family members, [0060] the processor 120 may be configured to extract voice feature information (or voice information) from a voice of the user and to store the voice feature information in the storage unit 110, [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0074] if at least one of a captured image and a user voice is input, the processor 120 may be configured to compare the at least one of the captured image and the user voice with family member information to determine whether conversations are make between family members); and 
	Cho et al. fail to explicitly teach 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server, wherein the speaker voice information transmitted to the external server corresponds to only the voice emitted in the state where the registered user opens the mouth.
However, Lord et al. teach
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server, wherein the speaker voice information transmitted to the external server corresponds to only the voice emitted in the state where the registered user opens the mouth (Lord et al. [0072] the AAFS is configured to receive data that represents an utterance of a speaker and that is obtained at or about a hearing device associated with a user. The AAFS may then identify the speaker based at least in part on the received data, such as by performing speaker recognition and/or speech recognition with the received data. The AAFS may then determine speaker-related information associated with the identified speaker, such as an identifier (e.g., name or title) of the speaker, [1420] At block 23.7902, the process performs transmitting the text along with an audio representation of the utterance and an identifier of the speaker to a remote conferencing device and/or a conference call system. After having performed the speech recognition, the mobile device may transmit the obtained text along with an identifier of the speaker and the audio representation of the speaker’s utterance to a remote system or device, [0515] remote devices (e.g., “cloud-based” processing and/or storage). The Examiner notes that in Lord et al. the voice is captured live, in real time, while the user is speaking (i.e., with his or her mouth open), and the speaker voice information from that real-time voice is transmitted.)
Cho et al. and Lord et al. are analogous art because they are from a similar field of endeavor, namely signal processing and applications thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker by matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., to transmit the obtained text, along with the identifier of the speaker and the audio representation of the speaker’s utterance, to the remote system, as taught by Lord et al., for the benefit of distributing the speech recognition load among multiple distributed communication devices used by the speakers in the conference (Lord et al. [1420] At block 23.7902, the process performs transmitting the text along with an audio representation of the utterance and an identifier of the speaker to a remote conferencing device and/or a conference call system. After having performed the speech recognition, the mobile device may transmit the obtained text along with an identifier of the speaker and the audio representation of the speaker's utterance to a remote system or device. In this manner, the speech recognition load may be distributed among multiple distributed communication devices used by the speakers in the voice conference.)
 
6.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over 
Cho et al. (US 2017/0068512 A1) in view of Lord et al. (US 2015/0336578 A1) and Wulf (US 2016/0214618 A1).

With respect to Claim 4, Cho et al. in view of Lord et al. teach all the limitations of Claim 1 upon which Claim 4 depends. Cho et al. in view of Lord et al. fail to explicitly teach 
 	wherein the user is a driver.  
	However, Wulf teaches 
wherein the user is a driver (Wulf [0024] Moreover, an analysis of the camera image provided by the camera device makes it possible to monitor the mouth region of the driver, especially for lip movements. If movements of the mouth region, especially the lips, of the driver are detected, it is inferred in particular that the driver is speaking or is conversing with a further passenger of motor vehicle 1, for instance.)
Cho et al., Lord et al., and Wulf are analogous art because they are from a similar field of endeavor, namely signal processing and applications thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker by matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., to transmit the obtained text, along with the identifier of the speaker and the audio representation of the speaker’s utterance, to the remote system, as taught by Lord et al., for the benefit of distributing the speech recognition load among multiple distributed communication devices used by the speakers in the conference, and further to monitor the mouth region of the driver, as taught by Wulf, for the benefit of inferring that the driver is speaking (Wulf [0024] Moreover, an analysis of the camera image provided by the camera device makes it possible to monitor the mouth region of the driver, especially for lip movements. If movements of the mouth region, especially the lips, of the driver are detected, it is inferred in particular that the driver is speaking or is conversing with a further passenger of motor vehicle 1, for instance.)

7.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over 
Cho et al. (US 2017/0068512 A1) in view of Lord et al. (US 2015/0336578 A1) and 
Krishnan et al. (US 2014/0006025 A1).

With respect to Claim 5, Cho et al. in view of Lord et al. teach all the limitations of Claim 1 upon which Claim 5 depends. Cho et al. in view of Lord et al. fail to explicitly teach 
	wherein the program, when executed by the processor, further performs a process of receiving response information which is information transmitted from the external server in response to the speaker voice information. 
	However, Krishnan et al. teach
 	wherein the program, when executed by the processor, further performs a process of receiving response information which is information transmitted from the external server in response to the speaker voice information (Krishnan et al. Claim 1: capturing audio at a user device; transmitting an identifier of a speaker of the captured audio identified by a speaker recognition system to a server system to determine a corresponding speaker entry; and receiving a resource corresponding to the speaker entry, Fig. 2 elements 240 and 250.)
 	Cho et al., Lord et al., and Krishnan et al. are analogous art because they are from a similar field of endeavor, namely signal processing and applications thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker by matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., to transmit the obtained text, along with the identifier of the speaker and the audio representation of the speaker’s utterance, to the remote system, as taught by Lord et al., for the benefit of distributing the speech recognition load among multiple distributed communication devices used by the speakers in the conference, and further to use a resource provider server, as taught by Krishnan et al., for the benefit of transmitting the identifier of the speaker to the server to determine the corresponding speaker entry and receiving the resource corresponding to the speaker entry (Krishnan et al. Claim 1: capturing audio at a user device; transmitting an identifier of a speaker of the captured audio identified by a speaker recognition system to a server system to determine a corresponding speaker entry; and receiving a resource corresponding to the speaker entry, Fig. 2 elements 240 and 250.)

Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Burke et al. (US 2015/0302870 A1). In this reference, Burke et al. disclose a method for multisensory speech detection. 
b. 	Owen et al. (US 2021/0233652 A1). In this reference, Owen et al. disclose a method for identifying a speaker via a voiceprint and/or a faceprint. 
c. 	Horton et al. (US 2020/0411013 A1). In this reference, Horton et al. disclose a method for identifying a speaker by comparing a voiceprint of the speaker with a plurality of registered voiceprints. 

9.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

10. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655