DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statements (IDS) submitted on 06/18/2020 and 03/16/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1-3, 5, and 6 are rejected under 35 U.S.C. 103 as being unpatentable over 
Cho et al. (US 2017/0068512 A1) in view of Lee et al. (US 2015/0339098 A1).

	With respect to Claim 1, Cho et al. disclose 
 	A voice processing apparatus, comprising: 
 	a processor to execute a program (Cho et al. [0130] For example, the communicator 160 performs a function of transmitting a user captured image, voice data, conversation feature information generated by the processor 120, etc. to an external device or receiving a user captured image, voice data, etc. from the external device); and 
 	a memory to store the program which, when executed by the processor (Cho et al. [0142] The non-transitory computer readable medium may, for example, store data semi-permanently and is readable by devices. For example, the aforementioned applications or programs may be stored in the non-transitory computer readable media such as compact disks (CDs), digital video disks (DVDs), hard disks, Blu-ray disks, universal serial buses (USBs), memory cards, and read-only memory (ROM)), performs processes of, 
detecting an opening state of a mouth of a user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0089] the processor 120 may be configured to recognize a fifth user and a sixth user from a captured image. If it is determined that face angles and face poses of the fifth and sixth users are face angles and face poses at which faces of the fifth and sixth users face each other, and mouth shapes respectively extracted from faces of the fifth and sixth users are changed, the processor 120 may be configured to determine that the fifth and sixth users make conversations); and 
acquiring voice information, wherein voice identification information for identifying voice of a specific user is previously registered (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members), the voice processing apparatus further comprises: 
recognizing only voice emitted in a state where the user who is registered opens the mouth as a speaker voice based on the opening state which is detected, the voice information which is acquired, and the voice identification information (Cho et al. [0012] The processor may be configured to acquire respective images and voices of the family members to generate family member information, to register the family member information in the storage unit, and, in response to at least one of the captured image and the user voice being input, to compare the at least one of the captured image and the user voice with the family member information to determine whether the conversations are made between the family members, [0016] In response to a name of at least one of family members taking part in conversation, being detected form the user voice, the processor may be configured to determine that the conversations are made between the family members, [0060] the processor 120 may be configured to extract voice feature information (or voice information) from a voice of the user and to store the voice feature information in the storage unit 110, [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0074] if at least one of a captured image and a user voice is input, the processor 120 may be configured to compare the at least one of the captured image and the user voice with family member information to determine whether conversations are make between family members. The Examiner notes that Cho et al. teach extracting conversation feature information from the user captured image and the user voice. Cho et al. determine that the user is making conversation based on the opening of the user's mouth and then match the face information of the user with the voice information of the user. This implies that the system recognizes only the voice of the user who opens the mouth.); and 
	Cho et al. fail to explicitly teach 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server. 
However, Lee et al. teach 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server (Lee et al. [0076] the processor 130 may directly perform the operation to analyze the user voice and to convert to text, [0097] the processor 130 may transmit the received texts to the server apparatus, receive search results corresponding to the texts from the server apparatus.)
Cho et al. and Lee et al. are analogous art because they are from a similar field of endeavor, namely signal processing and its applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker based on matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., using the teaching of sending the text resulting from voice recognition of the user voice to the server, as taught by Lee et al., for the benefit of receiving search results corresponding to the text from the server (Lee et al. [0076] the processor 130 may directly perform the operation to analyze the user voice and to convert to text, [0097] the processor 130 may transmit the received texts to the server apparatus, receive search results corresponding to the texts from the server apparatus.)

With respect to Claim 2, Cho et al. in view of Lee et al. teach
 	wherein face identification information for identifying a face of a specific user is previously registered, and when a user identified using the face identification information is identical with a user identified using the voice identification information, the recognizing process comprises recognizing the speaker voice of the user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image.)

With respect to Claim 3, Cho et al. in view of Lee et al. teach
 	wherein the user includes a plurality of users (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members)

With respect to Claim 5, Cho et al. in view of Lee et al. teach
 	wherein the program, when executed by the processor, further performs a process of receiving response information which is information transmitted from the external server in response to the speaker voice information (Lee et al. [0076] the processor 130 may directly perform the operation to analyze the user voice and to convert to text, [0097] the processor 130 may transmit the received texts to the server apparatus, receive search results corresponding to the texts from the server apparatus.)

With respect to Claim 6, Cho et al. disclose 
  	A voice processing method, comprising: 
detecting an opening state of a user (Cho et al. [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0089] the processor 120 may be configured to recognize a fifth user and a sixth user from a captured image. If it is determined that face angles and face poses of the fifth and sixth users are face angles and face poses at which faces of the fifth and sixth users face each other, and mouth shapes respectively extracted from faces of the fifth and sixth users are changed, the processor 120 may be configured to determine that the fifth and sixth users make conversations); and 
 	acquiring voice information (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members); 
 	identification information previously registered to identify voice of a specific user (Cho et al. Fig. 3A elements 310-315, [0076] For example, if a plurality of users utter voices, the processor 120 may be configured to extract voice information from each of the voices respectively uttered by the plurality of users. The processor 120 may be configured to compare the information of each of the plurality of users with pre-stored information to determine whether the plurality of users are respectively family members);
recognizing only voice emitted in a state where the user who is registered opens a mouth as a speaker voice based on the opening state which is detected, the voice information which is acquired, and the identification information (Cho et al. [0012] The processor may be configured to acquire respective images and voices of the family members to generate family member information, to register the family member information in the storage unit, and, in response to at least one of the captured image and the user voice being input, to compare the at least one of the captured image and the user voice with the family member information to determine whether the conversations are made between the family members, [0016] In response to a name of at least one of family members taking part in conversation, being detected form the user voice, the processor may be configured to determine that the conversations are made between the family members, [0060] the processor 120 may be configured to extract voice feature information (or voice information) from a voice of the user and to store the voice feature information in the storage unit 110, [0072] if the processor 120 extracts a mouth of a face of the user to recognize changes in a shape of the mouth, the processor 120 may be configured to determine that the user is making conversations, For example, the processor 120 may be configured to match the face information of the user with the voice information extracted from the voice of the user input along with an image, [0074] if at least one of a captured image and a user voice is input, the processor 120 may be configured to compare the at least one of the captured image and the user voice with family member information to determine whether conversations are make between family members); and 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server.
 	Cho et al. fail to explicitly teach 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server. 
However, Lee et al. teach 
transmitting speaker voice information which is information of the speaker voice which is recognized to an external server (Lee et al. [0076] the processor 130 may directly perform the operation to analyze the user voice and to convert to text, [0097] the processor 130 may transmit the received texts to the server apparatus, receive search results corresponding to the texts from the server apparatus.)
Cho et al. and Lee et al. are analogous art because they are from a similar field of endeavor, namely signal processing and its applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker based on matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., using the teaching of sending the text resulting from voice recognition of the user voice to the server, as taught by Lee et al., for the benefit of receiving search results corresponding to the text from the server (Lee et al. [0076] the processor 130 may directly perform the operation to analyze the user voice and to convert to text, [0097] the processor 130 may transmit the received texts to the server apparatus, receive search results corresponding to the texts from the server apparatus.)

5.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over 
Cho et al. (US 2017/0068512 A1) in view of Lee et al. (US 2015/0339098 A1) and Wulf (US 2016/0214618 A1).

With respect to Claim 4, Cho et al. in view of Lee et al. teach all the limitations of Claim 1 upon which Claim 4 depends. Cho et al. in view of Lee et al. fail to explicitly teach 
 	wherein the user is a driver.  
	However, Wulf teaches 
 wherein the user is a driver (Wulf [0024] Moreover, an analysis of the camera image provided by the camera device makes it possible to monitor the mouth region of the driver, especially for lip movements. If movements of the mouth region, especially the lips, of the driver are detected, it is inferred in particular that the driver is speaking or is conversing with a further passenger of motor vehicle 1, for instance.)
Cho et al., Lee et al., and Wulf are analogous art because they are from a similar field of endeavor, namely signal processing and its applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of identifying the speaker based on matching the face information of the user making conversation with the voice information extracted from the voice, as taught by Cho et al., using the teaching of sending the text resulting from voice recognition of the user voice to the server, as taught by Lee et al., for the benefit of receiving search results corresponding to the text from the server, and further using the teaching of monitoring the mouth region of the driver, as taught by Wulf, for the benefit of inferring that the driver is speaking (Wulf [0024] Moreover, an analysis of the camera image provided by the camera device makes it possible to monitor the mouth region of the driver, especially for lip movements. If movements of the mouth region, especially the lips, of the driver are detected, it is inferred in particular that the driver is speaking or is conversing with a further passenger of motor vehicle 1, for instance.)

Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892. 
a.	Zurek et al. (US 2014/0350924 A1). In this reference, Zurek et al. disclose a method and system for using image data to aid voice recognition. 
b. 	Vasilieff et al. (US 2013/0021459 A1). In this reference, Vasilieff et al. disclose a method and system for recognizing speech based on mouth movement detection. 
c. 	VanBlon et al. (US 2015/0154983 A1). In this reference, VanBlon et al. disclose a method and system for detecting a pause in audible input based on the shape of the user's mouth. 

7. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655