DETAILED ACTION
1.	This communication is in response to the Application filed on 3/24/2021. Claims 8-15 are pending and have been examined. Claims 1-7 are cancelled.
Claim Rejections - 35 USC § 103
2.	Claims 8, 12, 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Sen (US 20140006017; hereinafter SEN) in view of Konchitsky (US 9196249; hereinafter KONCHITSKY), Birchfield, et al. (US 20020097885; hereinafter BIRCHFIELD) and Steinberg, et al. (US 20120287288; hereinafter STEINBERG).
As per claim 8, SEN (Title: Systems, methods, apparatus, and computer-readable media for generating obfuscated speech signal) discloses “An audio processing device, comprising: a communicator that acquires audio that is picked up by a microphone in a sound pick up area via a network; a processor that processes the audio that is picked up from the microphone; and a speaker that outputs the audio processed by the processor (SEN, [0109], a privacy zone .. receive the speech signal via one or more microphones; [0225], for use in networks; [Abstract], loudspeaker; [0208], processor), wherein the processor is configured to
[ determine whether or not the audio is a speech uttered by a person by analyzing a frequency of the audio ];
[ detect a position of a sound source of the speech ]; determine whether or not the position of the sound source of the speech is within a specific region (SEN, [0109], it may be desirable to confine the intelligible content of a person's voice to a particular space, such as the cubicle, office, or conference room in which the person is speaking, and to prevent persons outside that space (e.g., in an adjoining room or cubicle) from understanding that speech); and 
[ convert the speech into a substitute sound ] and output the substitute sound to the speaker when the position of the sound source of the speech is within the specific region, and output the speech to the speaker without converting the speech into the substitute sound when the position of the sound source of the speech is outside the specific region (SEN, [0109], the resulting obfuscated speech signal may be used to drive a transducer (e.g., a loudspeaker) to create a masking sound field directed away from the privacy zone <read on a ready mechanism to output any sound based on the position decision which is a system design choice>).”
SEN does not explicitly disclose “determine whether or not the audio is a speech uttered by a person by analyzing a frequency of the audio ..” However, this feature is taught by KONCHITSKY (Title: Method for identifying speech and music components of an analyzed audio signal). 
In the same field of endeavor, KONCHITSKY teaches: [Abstract] “classifying one or more components of an audio signal received from a communication device .. a Goertzel calculation is used to identify different frequency components of a selected sample. The identified frequency components of the selected sample may be analyzed based on one or more predefined factors. The predefined factors pertain to the typical music and speech component of a typical audio signal. The analysis of the frequency component of the selected sample gives resulting values that is compared to a bottom threshold for determining whether the identified component is a music component or a speech component.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of KONCHITSKY in the system taught by SEN to determine whether the received audio signal is speech.
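For illustration only, the kind of frequency analysis quoted from KONCHITSKY's Abstract can be sketched as follows. The sketch is not taken from the reference: the Goertzel recurrence is the standard one, but the function names, the probe frequencies, the 300-3400 Hz band and the 0.6 threshold are all assumptions chosen for the example, and the band-energy rule is a simplified stand-in for the "predefined factors" that KONCHITSKY describes.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def looks_like_speech(samples, sample_rate,
                      band_hz=(300, 3400),          # assumed voice band
                      probe_hz=(200, 500, 1000, 2000, 3000, 5000, 7000),
                      threshold=0.6):               # assumed decision threshold
    """Hypothetical rule: speech if most probed energy lies inside band_hz."""
    powers = {f: goertzel_power(samples, sample_rate, f) for f in probe_hz}
    total = sum(powers.values())
    if total == 0:
        return False                                # silence is not speech
    in_band = sum(p for f, p in powers.items() if band_hz[0] <= f <= band_hz[1])
    return in_band / total >= threshold
```

Under these assumptions, a frame whose probed energy is concentrated at 1 kHz is classified as speech, while a frame dominated by 7 kHz energy is not.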
SEN in view of KONCHITSKY does not explicitly disclose “detect a position of a sound source of the speech ..” However, this feature is taught by BIRCHFIELD (Title: Acoustic source localization system and method).
In the same field of endeavor, BIRCHFIELD teaches: [0020] “An acoustic source location technique compares the time response of acoustic signals reaching the two microphones of each of two or more pairs of spaced-apart microphones.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of BIRCHFIELD in the system taught by SEN and KONCHITSKY to provide sound source location information for subsequent system decisions.
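For illustration only, comparing the time response of the signals reaching the two microphones of one pair, as quoted from BIRCHFIELD [0020], can be sketched as a time-delay estimate followed by a far-field bearing calculation. This is a generic cross-correlation sketch, not BIRCHFIELD's disclosed procedure; the function names, the 343 m/s speed of sound and the far-field assumption are introduced here for the example. A single pair yields only a bearing, so a position estimate would require combining two or more pairs, as the reference contemplates.

```python
import math

def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Lag (in samples) by which sig_b trails sig_a, found by maximising
    the cross-correlation over lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(sig_a)):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def bearing_degrees(delay_samples, sample_rate, mic_spacing_m,
                    speed_of_sound=343.0):          # assumed, m/s in air
    """Far-field bearing, measured from the broadside of the microphone pair."""
    ratio = speed_of_sound * (delay_samples / sample_rate) / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))              # clamp against noise
    return math.degrees(math.asin(ratio))
```

With microphones 0.2 m apart sampled at 16 kHz, a two-sample delay corresponds to a bearing of roughly 12 degrees off broadside under these assumptions.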
SEN in view of KONCHITSKY and BIRCHFIELD does not explicitly disclose “convert the speech into a substitute sound ..” However, this feature is taught by STEINBERG (Title: Systems and methods for metering audio and video delays).  
In the same field of endeavor, STEINBERG teaches: [0049] “the speaking person image, transmitted by the camera, is replaced by the White Flash on Diplay-1 screen and the sound produced by this person is replaced by the Audio Beep” and [0066] “timeline positions of said Audio Beep Pulses (e.g. bursts of 1 kHz frequency) match the timeline positions of the White Flashes.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of STEINBERG in the system taught by SEN, KONCHITSKY and BIRCHFIELD, to provide a beep sound with a desired or designated frequency as the substitute sound.
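For illustration only, the position-dependent output recited in claim 8 can be sketched as follows, using a 1 kHz tone burst of the kind mentioned in STEINBERG [0066] as the substitute sound. The rectangular region, the function names, the 0.5 amplitude and the pass-through of non-speech audio are assumptions made for the example and are not drawn from the claim or from any cited reference.

```python
import math

def inside_region(position, region):
    """True if (x, y) lies within the rectangle (x_min, y_min, x_max, y_max)."""
    x, y = position
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max

def beep(num_samples, sample_rate, freq_hz=1000.0, amplitude=0.5):
    """Sine burst used as the substitute sound (1 kHz by default)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]

def route_audio(frame, is_speech, source_position, specific_region, sample_rate):
    """Substitute speech that originates inside the region; otherwise pass through."""
    if is_speech and inside_region(source_position, specific_region):
        return beep(len(frame), sample_rate)
    return list(frame)                              # assumed: unchanged otherwise
```

A frame of speech located at (1, 1) inside the region (0, 0, 2, 2) is replaced by the tone, while the same frame located at (3, 1) is output unchanged.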
As per claim 12 (dependent on claim 8), SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG further discloses “wherein the communicator acquires a video image captured by a camera in an imaging area, and the processor is configured to convert the video image corresponding to the position of the sound source of the speech into a substitute image and output the substitute image to a display (Examiner’s Note: it appears that the step of acquiring a video image is unnecessary to generate a substitute image. STEINBERG, [0049], the speaking person image, transmitted by the camera, is replaced by the White Flash on Diplay-1 screen <read on a ready mechanism to use any substitute image>).”
Claim 14 (similar in scope to claim 8) is rejected under the same rationale as applied above for claim 8. 
Claim 15 (similar in scope to claim 8) is rejected under the same rationale as applied above for claim 8. In addition, the limitation “a microphone that is installed on one wall surface indoors” is a system design choice.

3.	Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG, and further in view of Park (US 20100211397; hereinafter PARK).
As per claim 9 (dependent on claim 8), SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG further discloses “wherein the processor is configured to 
[ convert the audio into an emotion value identifying an emotion ] of a person who utters the audio according to a result of analyzing the audio to acquire the emotion value; and estimate an emotion of the person who utters the audio by [ comparing the emotion value with a table ] in which the emotion value is associated with the emotion.” 
SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG does not explicitly disclose “convert the audio into an emotion value identifying an emotion .. comparing the emotion value with a table ..” However, this feature is taught by PARK (Title: Facial expression representation apparatus).
In the same field of endeavor, PARK teaches: [0009] “The parameter related to a change in emotion may include an intensity of a voice signal, a pitch of a voice sound, and voice quality information” which teaches emotion detection/estimation using voice parameters such as the pitch or intensity. PARK also teaches: [0047] “Changes in emotion may be represented by a mixture of six basic emotions, for example, happiness, sadness, anger, fear, disgust, surprise, and the like, which are defined by MPEG4. Probability distributions of parameters related to the basic emotions may be modeled using a Gaussian mixture, and an emotional state may be estimated by calculating which model is the parameter closest to the related input emotion.” Also see AGARWAL below. 
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of PARK in the system taught by SEN, KONCHITSKY, BIRCHFIELD and STEINBERG to provide detected emotion for audio playback control.
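For illustration only, estimating an emotion from voice parameters and a table, in the manner discussed for claim 9, can be sketched as follows. PARK [0047] describes Gaussian mixture models; the sketch substitutes a much simpler nearest-prototype rule. Every numeric value, the choice of three emotions, the scaling constants and all names are hypothetical and appear in neither PARK nor the claims.

```python
# Hypothetical prototypes: emotion value -> (mean pitch in Hz, mean intensity in dB).
PROTOTYPES = {
    1: (220.0, 70.0),
    2: (150.0, 55.0),
    3: (260.0, 78.0),
}

# Hypothetical table associating each emotion value with an emotion.
EMOTION_TABLE = {1: "happiness", 2: "sadness", 3: "anger"}

def emotion_value(pitch_hz, intensity_db):
    """Convert voice parameters into the emotion value of the closest prototype."""
    def distance(value):
        proto_pitch, proto_intensity = PROTOTYPES[value]
        # Scale each axis so pitch and intensity contribute comparably (assumed scales).
        return (((pitch_hz - proto_pitch) / 50.0) ** 2
                + ((intensity_db - proto_intensity) / 10.0) ** 2)
    return min(PROTOTYPES, key=distance)

def estimate_emotion(pitch_hz, intensity_db):
    """Estimate the emotion by comparing the emotion value with the table."""
    return EMOTION_TABLE[emotion_value(pitch_hz, intensity_db)]
```

Under these hypothetical prototypes, a voice measured at 155 Hz and 56 dB maps to emotion value 2, which the table associates with sadness.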
As per claim 10 (dependent on claim 8), SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG further discloses “wherein the processor is configured to analyze at least [ one of a change in pitch, a speech speed, a sound volume, and a pronunciation of the audio ].”
SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG does not explicitly disclose “one of a change in pitch, a speech speed, a sound volume, and a pronunciation of the audio ..” However, this feature is taught by PARK (Title: Facial expression representation apparatus). 
In the same field of endeavor, PARK teaches: [0009] “The parameter related to a change in emotion may include an intensity of a voice signal <read on volume>, a pitch of a voice sound, and voice quality information ..”
Therefore, it would have been obvious to one of ordinary skill in the art at the time before the effective filing date of the claimed invention to incorporate the teachings of PARK in the system taught by SEN, KONCHITSKY, BIRCHFIELD and STEINBERG for emotion change detection based on some speech features.
As per claim 11 (dependent on claim 9), SEN in view of KONCHITSKY, BIRCHFIELD, STEINBERG and PARK further discloses “wherein the processor is configured to change a frequency of the substitute sound according to the emotion of the person (PARK, [0009], The parameter related to a change in emotion may include .. a pitch of a voice sound <where pitch reads on fundamental frequency>; STEINBERG, [0049], the sound produced by this person is replaced by the Audio Beep <read on a ready mechanism to use beep of any frequency such as to correspond to emotion change>. Also see AGARWAL below).” 

4.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG, and further in view of Agarwal (US 20090138262; hereinafter AGARWAL).
As per claim 13 (dependent on claim 12), SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG further discloses “wherein the processor is configured to convert the video image corresponding to the position of the sound source of the speech into a different substitute image [ according to the emotion value by comparing the emotion value with a table ] associated with the substitute image and output the converted substitute image to the display (STEINBERG, [0049], the speaking person image, transmitted by the camera, is replaced by the White Flash on Diplay-1 screen).”
SEN in view of KONCHITSKY, BIRCHFIELD and STEINBERG does not explicitly disclose “according to the emotion value by comparing the emotion value with a table ..” However, this feature is taught by AGARWAL (Title: Systems and methods to index and search voice sites). 
In the same field of endeavor, AGARWAL teaches: [0054-0055] “Meta information that will be indexed may include, for example, back end services, databases, or other voice sites to which this voice site links … Indexing <read on selection from database or table> may also be performed based on emotion. Different audio prompts can convey different human emotions” which teaches a ready mechanism that also applies directly to images.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of AGARWAL in the system taught by SEN, KONCHITSKY, BIRCHFIELD and STEINBERG, for a ready mechanism to provide a different image for emotion-based playback.

Double Patenting

5.	Independent claims 8, 14, 15 are rejected on the ground of non-statutory double patenting as being unpatentable over corresponding independent claim 9 of U.S. patent 10943596 (original 
The present application (Application No. 17/168,450, Attorney Docket No. P62459), Claim 8:
8. An audio processing device, comprising: a communicator that acquires audio that is picked up by a microphone in a sound pick up area via a network; a processor that processes the audio that is picked up from the microphone; and a speaker that outputs the audio processed by the processor, wherein the processor is configured to determine whether or not the audio is a speech uttered by a person by analyzing a frequency of the audio; detect a position of a sound source of the speech; determine whether or not the position of the sound source of the speech is within a specific region; and convert the speech into a substitute sound and output the substitute sound to the speaker when the position of the sound source of the speech is within the specific region, and output the speech to the speaker without converting the speech into the substitute sound when the position of the sound source of the speech is outside the specific region.
Patent #10943596 (Application No. 16/074,311, Attorney Docket No. P55030), Claim 9:
9. An audio privacy processing device, comprising: a microphone array device that acquires audio from a person in a designated audio pick-up area; a signal processor that receives the acquired audio over a network, and determines when an audio position of the person is within a privacy protection area in the designated audio pick-up area; an audio analyzer that analyzes speech audio of the person in the privacy protection area and determines an emotion of the person based on the analyzed speech audio by accessing a privacy protection sound database that includes emotion value tables, and that converts the determined emotion of the person into a designated substitute sound having a designated frequency from a plurality of predetermined substitute sounds and predetermined designated frequencies; and an output controller that
Conclusion
6.	 Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:00-5:30). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on (571)272-7799.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/	2/22/2022  

Primary Examiner, Art Unit 2659