DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction
This Office action is in response to the communications filed on 03/29/2021 and the preliminary amendment filed on 04/29/2021. Claims 1-14 are pending and have been examined.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 07/30/2021, 12/03/2021 and 12/16/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claim 5 is objected to because of the following informalities:
On line 4, “fame” should be “frame”.

Claim 8 is objected to because of the following informalities:
On line 6, “the terminal” lacks proper antecedent basis. “Terminal” is used in Claim 1 and its dependents, whereas Claim 6 uses “apparatus”.

Claim 9 is objected to because of the following informalities:
On line 6, “fame” should be “frame”. On lines 3, 4, 5, 7, and 8, “the terminal” lacks proper antecedent basis. “Terminal” is used in Claim 1 and its dependents, whereas Claim 6 uses “apparatus”.

Claim 14 is objected to because of the following informalities:
On line 5, “fame” should be “frame”.

Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-6, 8-11 and 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jeong (US 20140002574 A1), and further in view of Wang et al. (WO 2017080399 A1).

Regarding Claim 1: 
Jeong teaches a voice processing method, comprising: determining, by a terminal, that the terminal is making a video call or recording a video(Para [0010], Ln 5-6, providing a video communication service in an electronic device);  
determining, by the terminal, that a current video frame contains a face(Para [0045], Ln 4-6, user's face area obtained via the camera system 150 with the user's face area obtained from the reference image),  
and that a voice exists in a surrounding environment of the terminal(Para [0049], Ln 13-14, In the case where a voice for the video communication service is detected);
obtaining, by the terminal, a target distance between the target face and the terminal(Para [0016], Ln 4-5, determining a distance from the electronic device to a user's face);  
determining, by the terminal, a target gain based on the target distance, wherein as the target distance increases, the target gain increases(Para [0052], Ln 12-18, user's face is greater than a reference distance...raises the microphone sensitivity. ...distance from the electronic device to the user's face is less than the reference distance...lowers the microphone sensitivity);  
and performing, by the terminal, an enhancement processing operation on the voice based on the target gain to obtain a target voice signal (Para [0052], Ln 8-18, detects a voice for the video communication service via a microphone. ...the video communication control program 114 raises the microphone sensitivity. ...the video communication control program 114 lowers the microphone sensitivity. This operation is based on the target gain, as the resulting voice signal has a higher or lower volume depending on how the sensitivity was changed).
Jeong does not teach determining, by the terminal, that a target face in the surrounding environment corresponds to the face in the current video frame. 
In the same field of Facial Recognition, Wang teaches determining, by the terminal, that a target face in the surrounding environment corresponds to the face in the current video frame(Pg 5, Para 3, Ln 1-3, When it is detected that at least two groups of faces exist in the current frame picture, the area corresponding to the face with the largest picture area in the current frame is selected as the face area of the current frame picture).
It would have been obvious to one skilled in the art, at the effective time of filing, to modify Jeong with the facial recognition of Wang, as it improves performance and user experience (Pg 3, last paragraph before “DRAWINGS”, starting with “The face location tracking…”, Ln 13-15, improved face detection and tracking, and user experience).
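For illustration only, the claimed relationship between target distance and target gain (the gain increases as the distance increases) can be sketched as follows. This is a minimal sketch and is not taken from Jeong, Wang, or the application: the function names, the 0.5 m reference distance, the inverse-distance (20·log10) compensation curve, and the clamp limits are all assumptions chosen for the example.

```python
import math


def target_gain_db(distance_m, reference_m=0.5, min_db=-12.0, max_db=24.0):
    """Return a gain in dB that grows monotonically with distance.

    Assumed model: compensate the inverse-distance drop in level relative
    to a hypothetical reference distance, then clamp to a safe range.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    gain = 20.0 * math.log10(distance_m / reference_m)
    return max(min_db, min(max_db, gain))


def apply_gain(samples, gain_db):
    """Scale a list of audio samples by a gain expressed in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]
```

With these assumed defaults, a speaker farther away than the reference distance receives a positive gain and a closer speaker a negative gain, which mirrors the raise/lower behavior of the microphone sensitivity cited from Jeong, Para [0052].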

Regarding Claim 3:
The combination of Jeong and Wang teaches the method according to claim 1, but Jeong does not teach wherein the determining a target face in the current video picture comprises: in response to determining that a plurality of faces exists in the current video frame, determining a face in the surrounding environment corresponding to a face with a largest area among the plurality of faces as the target face, or a face in the surrounding environment closest to the terminal among the plurality of faces as the target face; or in response to determining that only one face exists in the current video frame, determining the face as the target face(optional limitation).
In the same field of Facial Recognition, Wang teaches wherein the determining a target face in the current video picture comprises: in response to determining that a plurality of faces exists in the current video frame, determining a face in the surrounding environment corresponding to a face with a largest area among the plurality of faces as the target face(Pg 5, Para 3, Ln 1-3, When it is detected that at least two groups of faces exist in the current frame picture, the area corresponding to the face with the largest picture area in the current frame is selected as the face area of the current frame picture), 
or a face in the surrounding environment closest to the terminal among the plurality of faces as the target face(optional limitation);
or in response to determining that only one face exists in the current video frame, determining the face as the target face(optional limitation).
It would have been obvious to one skilled in the art, at the effective time of filing, to modify the combination of Jeong and Wang with the facial recognition of Wang, as it improves performance and user experience (Pg 3, last paragraph before “DRAWINGS”, starting with “The face location tracking…”, Ln 13-15, improved face detection and tracking, and user experience).
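For illustration only, the largest-area selection relied upon above (when several faces are detected, the face with the largest picture area is chosen) can be sketched as follows. This is a minimal sketch under stated assumptions: the (x, y, width, height) box format and the function name are hypothetical and are not taken from Wang or the application.

```python
def select_target_face(face_boxes):
    """Pick the face whose bounding box covers the largest area.

    face_boxes is an assumed list of (x, y, width, height) tuples produced
    by some face detector; returns None when no face was detected.
    """
    if not face_boxes:
        return None
    # A single detected face is returned as-is; otherwise the largest wins.
    return max(face_boxes, key=lambda box: box[2] * box[3])
```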

Regarding Claim 4: 
The combination of Jeong and Wang teaches the method according to claim 1, and Jeong teaches wherein the obtaining the target distance between the target face and the terminal comprises: measuring a distance between the target face and the terminal by using a depth component in the terminal(Para [0047], Ln 2-6, determine the distance from the electronic device to the user's face with…. using an ultrasonic sensor). 
 
Regarding Claim 5: 
The combination of Jeong and Wang teaches the method according to claim 1, and Jeong teaches wherein the obtaining the target distance between the target face and the terminal comprises:  obtaining the target distance between the target face and the terminal based on a region area of a face in the current video frame corresponding to the target face and a preset correspondence between a region area of the face and a distance between the face and the terminal (Para [0045],Ln 4-10, compares the user's face area obtained via the camera system 150 with the user's face area obtained from the reference image at a point of providing a video communication service to determine the distance from the electronic device to the user's face. Here, the reference image denotes an image of the user positioned at a reference distance);  
or obtaining the target distance between the target face and the terminal based on a face-to-screen ratio of a face in the current video frame(Optional limitation).
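For illustration only, a distance estimate derived from the face region area and a reference image captured at a known reference distance can be sketched as follows. This is a minimal sketch and not the formula of Jeong, whose cited passage does not state one: it assumes a pinhole-camera model in which the apparent face area scales with the inverse square of the distance, and all names and units are hypothetical.

```python
import math


def estimate_distance(face_area_px, reference_area_px, reference_distance_m):
    """Estimate the face-to-device distance from the apparent face area.

    Assumed model: area is proportional to 1 / distance**2, so
    distance = reference_distance * sqrt(reference_area / face_area).
    """
    if face_area_px <= 0 or reference_area_px <= 0:
        raise ValueError("areas must be positive")
    return reference_distance_m * math.sqrt(reference_area_px / face_area_px)
```

Under this assumption, a face that occupies one quarter of the reference area is estimated to be at twice the reference distance.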

Regarding Claim 6:
Jeong teaches a voice processing apparatus, comprising: a processor; a memory coupled to the processor and storing instructions, which, when executed, cause the processor to perform operations comprising(Para [0018], Ln 3-6, The device includes at least one processor, a memory, and at least one program stored in the memory and configured for being executable by the at least one processor): 
determining that the apparatus is making a video call or recording a video(Para [0010], Ln 5-6, providing a video communication service in an electronic device), 
determining that a current video frame contains a face(Para [0045], Ln 4-6, user's face area obtained via the camera system 150 with the user's face area obtained from the reference image), 
and that a voice exists in a surrounding environment of the apparatus(Para [0049], Ln 13-14, In the case where a voice for the video communication service is detected), 
obtaining a target distance between the target face and the apparatus(Para [0016], Ln 4-5, determining a distance from the electronic device to a user's face), 
determining a target gain based on the target distance, wherein as the target distance increases, the target gain increases(Para [0052], Ln 12-18, user's face is greater than a reference distance...raises the microphone sensitivity. ...distance from the electronic device to the user's face is less than the reference distance...lowers the microphone sensitivity), 
and performing an enhancement processing operation on the voice based on the target gain to obtain a target voice signal (Para [0052], Ln 8-18, detects a voice for the video communication service via a microphone. ...the video communication control program 114 raises the microphone sensitivity. ...the video communication control program 114 lowers the microphone sensitivity. This operation is based on the target gain, as the resulting voice signal has a higher or lower volume depending on how the sensitivity was changed).
Jeong does not teach determining that a target face in the surrounding environment corresponds to the face in the current video frame.
In the same field of Facial Recognition, Wang teaches determining that a target face in the surrounding environment corresponds to the face in the current video frame(Pg 5, Para 3, Ln 1-3, When it is detected that at least two groups of faces exist in the current frame picture, the area corresponding to the face with the largest picture area in the current frame is selected as the face area of the current frame picture).
It would have been obvious to one skilled in the art, at the effective time of filing, to modify Jeong with the facial recognition of Wang, as it improves performance and user experience (Pg 3, last paragraph before “DRAWINGS”, starting with “The face location tracking…”, Ln 13-15, improved face detection and tracking, and user experience).

Regarding Claim 8:
The combination of Jeong and Wang teaches the apparatus according to claim 6, but Jeong does not teach wherein the operations further comprising: in response to determining that a plurality of faces exists in the current video frame, determining a face in the surrounding environment corresponding to a face with a largest area among the plurality of faces as the target face, or a face in the surrounding environment closest to the terminal among the plurality of faces as the target face; or in response to determining that only one face exists in the current video frame, determining the face as the target face(optional limitation).
In the same field of Facial Recognition, Wang teaches wherein the operations further comprising: in response to determining that a plurality of faces exists in the current video frame, determining a face in the surrounding environment corresponding to a face with a largest area among the plurality of faces as the target face(Pg 5, Para 3, Ln 1-3, When it is detected that at least two groups of faces exist in the current frame picture, the area corresponding to the face with the largest picture area in the current frame is selected as the face area of the current frame picture), 
or a face in the surrounding environment closest to the terminal among the plurality of faces as the target face(optional limitation);
or in response to determining that only one face exists in the current video frame, determining the face as the target face(optional limitation).
It would have been obvious to one skilled in the art, at the effective time of filing, to modify the combination of Jeong and Wang with the facial recognition of Wang, as it improves performance and user experience (Pg 3, last paragraph before “DRAWINGS”, starting with “The face location tracking…”, Ln 13-15, improved face detection and tracking, and user experience).

Regarding Claim 9:
The combination of Jeong and Wang teaches the apparatus according to claim 6, wherein the operations further comprising: measuring a distance between the target face and the terminal by using a depth component in the terminal(Para [0047], Ln 2-6, determine the distance from the electronic device to the user's face with…. using an ultrasonic sensor);
obtaining the target distance between the target face and the terminal based on a region area of a face in the current video fame corresponding to the target face and a preset correspondence between a region area of a face and a distance between the face and the terminal(Para [0045],Ln 4-10, compares the user's face area obtained via the camera system 150 with the user's face area obtained from the reference image at a point of providing a video communication service to determine the distance from the electronic device to the user's face. Here, the reference image denotes an image of the user positioned at a reference distance);
or obtaining the target distance between the target face and the terminal based on face-to-screen ratio of a face in the current video frame(optional limitation).

Regarding Claim 10:
Jeong teaches a terminal device, wherein the terminal device comprises a memory, a processor, a bus, a camera, and a microphone(Para [0018], Ln 3-6, The device includes at least one processor, a memory. Para [0041], Ln 7, camera systems. Para [0062], Ln 3, microphone. Para [0059], Ln 1-3, The peripheral interface), 
wherein the memory, the camera, the microphone, and the processor are connected through the bus(Para [0059], Ln 1-3, The peripheral interface 123 controls connections of an I/O peripheral device and the processor 122 of the electronic device 100 and the memory interface 121); 
wherein the camera is configured to capture an image signal(Para [0054], Ln 5-6, obtains an image of the user via the camera system); 
wherein the microphone is configured to collect a voice signal(Para [0052], Ln 8-9, detects a voice for the video communication service via a microphone); 
wherein the memory is configured to store instructions(Para [0018], Ln 3-6, program stored in the memory); 
and wherein the processor is configured to execute the instructions stored in the memory to control the camera and the microphone, cause the terminal device to perform operations comprising(Para [0018], Ln 3-6, The device includes at least one processor, a memory, and at least one program stored in the memory and configured for being executable by the at least one processor. Para [0059], Ln 1-3, The peripheral interface 123 controls connections of an I/O peripheral device and the processor 122 of the electronic device 100 and the memory interface 121): 
determining that the terminal is making a video call or recording a video(Para [0010], Ln 5-6, providing a video communication service in an electronic device), 
determining that a current video frame contains a face(Para [0045], Ln 4-6, user's face area obtained via the camera system 150 with the user's face area obtained from the reference image), 
and that a voice exists in a surrounding environment of the terminal(Para [0049], Ln 13-14, In the case where a voice for the video communication service is detected), 
obtaining a target distance between the target face and the terminal(Para [0016], Ln 4-5, determining a distance from the electronic device to a user's face), 
determining a target gain based on the target distance, wherein as the target distance increases, the target gain increases(Para [0052], Ln 12-18, user's face is greater than a reference distance...raises the microphone sensitivity. ...distance from the electronic device to the user's face is less than the reference distance...lowers the microphone sensitivity), 
and performing an enhancement processing operation on the voice based on the target gain to obtain a target voice signal (Para [0052], Ln 8-18, detects a voice for the video communication service via a microphone. ...the video communication control program 114 raises the microphone sensitivity. ...the video communication control program 114 lowers the microphone sensitivity. This operation is based on the target gain, as the resulting voice signal has a higher or lower volume depending on how the sensitivity was changed).
Jeong does not teach determining that a target face in the surrounding environment corresponds to the face in the current video frame.
In the same field of Facial Recognition, Wang teaches determining that a target face in the surrounding environment corresponds to the face in the current video frame(Pg 5, Para 3, Ln 1-3, When it is detected that at least two groups of faces exist in the current frame picture, the area corresponding to the face with the largest picture area in the current frame is selected as the face area of the current frame picture).
It would have been obvious to one skilled in the art, at the effective time of filing, to modify Jeong with the facial recognition of Wang, as it improves performance and user experience (Pg 3, last paragraph before “DRAWINGS”, starting with “The face location tracking…”, Ln 13-15, improved face detection and tracking, and user experience).

Regarding Claim 11:
The combination of Jeong and Wang teaches the terminal device according to claim 10, and Jeong teaches wherein the terminal device further comprises an antenna system(Para [0039], Ln 1-3, the electronic device includes......a smart phone. An antenna is inherent to the smart phone, to communicate wirelessly, an antenna is required, wireless communication networks are used as shown in following citations to Para [0061] and [0019]),  
and the antenna system receives and sends, under control of the processor(Para [0019], Ln 1-10, video communication service is provided. ...executable by the at least one processor... receiving video communication data from a communication counterpart. Sending to the counterpart user is also done, Para [0006], Ln 10-12, Thus, a transmission volume and an image transferred to the counterpart user. Antenna is inherent for same reasons stated above),  
a wireless communication signal to implement wireless communication with a mobile communications network(Para [0061], Ln 2-6, performing a communication function.....supporting different communication networks),  
wherein the mobile communications network comprises one or more of the following: a GSM network, a CDMA network, a 3G network, a 4G network, a 5G network, an FDMA network, a TDMA network, a PDC network, a TACS network, an AMPS network, a WCDMA network, a TDSCDMA network, a Wi-Fi network, and an LTE network(Para [0061], Ln 6-14, communication network may include.. (GSM)....(CDMA)…..(LTE) ….(OFDMA) network).

Regarding Claim 13:
	Claim 13 contains similar limitations as Claim 8, and is therefore rejected for the same reasons.

Regarding Claim 14:
	Claim 14 contains similar limitations as Claim 9, and is therefore rejected for the same reasons.

Claim(s) 2, 7 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Jeong and Wang as applied to claim 1 above, and further in view of Avendano et al. (US 9210503 B2).

Regarding Claim 2:
The combination of Jeong and Wang teaches the method according to claim 1, but does not teach wherein the method further comprises: weakening a non-voice signal in the surrounding environment based on a preset noise reduction gain to obtain a target noise signal; and synthesizing the target voice signal and the target noise signal to obtain a target voice signal.
In the same field of Audio Zoom, Avendano teaches wherein the method further comprises: weakening a non-voice signal in the surrounding environment based on a preset noise reduction gain to obtain a target noise signal(Col 3, Ln 47-50 acoustic signals from an acoustic source via the primary microphone.... Processing may include performing noise reduction within an acoustic signal. Col 3, Ln 14-15, discriminate speech and noise. Col 2 Ln 63 - Col 3 Ln 1, While the microphones 106 and 108 receive sound (i.e. acoustic signals..... the noise 122 may include any sounds from one or more locations that differ from the location of audio source. Col 4, Ln 49-54, The gain factor may indicate whether multiplicative gain expansion module 320 should perform additional gain or reduction to the signals received from beam former module….gain factor is generated as an energy ratio based on the received microphone signals and components);  
and synthesizing the target voice signal and the target noise signal to obtain a target voice signal(Col 5, Ln 32-34, Multiplicative gain expansion module 320 outputs the modified signal and provides signal to reverb module. Col 6, Ln 7-9, The mixing module 340 receives the reverberation adjusted signal and mixes the signal with the signal from the primary microphone). 
It would have been obvious to one skilled in the art, at the effective time of filing, to modify the combination of Jeong and Wang with the audio zoom of Avendano, as it removes audio from unwanted sources, improving quality.
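For illustration only, the two steps recited in Claim 2 (weakening the non-voice signal by a preset noise reduction gain, then synthesizing it with the enhanced voice signal) can be sketched as follows. This is a minimal sketch and not the processing chain of Avendano: it assumes the voice and non-voice components have already been separated into equal-length sample lists, and the function name and the -12 dB preset are hypothetical.

```python
def synthesize(voice, noise, voice_gain_db, noise_gain_db=-12.0):
    """Mix an enhanced voice component with an attenuated noise component.

    voice and noise are assumed, already-separated sample lists; the
    separation step itself is outside the scope of this sketch.
    """
    if len(voice) != len(noise):
        raise ValueError("voice and noise must have the same length")
    voice_scale = 10.0 ** (voice_gain_db / 20.0)   # distance-based target gain
    noise_scale = 10.0 ** (noise_gain_db / 20.0)   # preset noise reduction gain
    return [v * voice_scale + n * noise_scale for v, n in zip(voice, noise)]
```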

Regarding Claim 7:
Claim 7 contains similar limitations as Claim 2, and is therefore rejected for the same reasons.

Regarding Claim 12:
Claim 12 contains similar limitations as Claim 2, and is therefore rejected for the same reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Shintani et al. (US 9984690 B1)
Adjusting gain based on distance from user measured using laser.
Endo (US 20160064012 A1)
Adjusting gain based on distance to user.
Etter (US 20140112483 A1)
Adjusting gain based on distance to user face measured with camera.
Kim (US 20140105429 A1)
Adjusting gain based on distance to user face measured with camera.
Zurek et al. (US 20130202132 A1)
Adjusting gain based on distance to user face measured with camera.
Kim (US 20110085061 A1)
Adjusting gain based on distance to user face measured with camera.
Laaksonen et al. (US 10397699 B2)
Audio zoom, lessens/mutes noise and makes speaker clearer.
Park et al. “A Beamforming-Based Video-Zoom Driven Audio-Zoom Algorithm for Portable Digital Imaging Devices”
Audio zoom, based on video zoom.
Duong et al. “AUDIO ZOOM FOR SMARTPHONES BASED ON MULTIPLE ADAPTIVE BEAMFORMERS”
Audio zoom, based on beamforming.
Thiergart et al. “AN ACOUSTICAL ZOOM BASED ON INFORMED SPATIAL FILTERING”
Audio zoom paired with video zoom.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 10:00 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER G MARLOW/Assistant Examiner, Art Unit 2658                                                                                                                                                                                                        
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658