DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
2.	Applicant’s arguments with respect to claim(s) 1-11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
3.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claim(s) 1-5 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Visser et al. Pub. No. US 2013/0272548 A1 (Hereinafter “Visser”) in view of Kim et al. Pub. No. US 2014/0314391 A1 (Hereinafter “Kim”), further in view of Wang et al. Pub. No. US 2016/0249134 A1 (Hereinafter “Wang”).
Regarding Claim 1, Visser discloses a signal processing device (see fig.1 and scene analysis system 12) comprising: 
an input interface that receives signals of a video and an audio acquired concurrently in a space where subjects exist (see paragraph [0089]: microphone array 18 records audio signals at the scene. one or more video cameras 510 records video signals at the scene. Emphases added: “a space where subjects exist” is interpreted as object location for each object recognized in the scene, as discussed in paragraph [0100]); 
an image processor that recognizes subject images included in the video (see paragraph [0089]: The image processor 504 receives and records one or more images of the scene from one or more cameras 508 taking pictures of the scene, or the video processor 506 receives and records video signals from one or more video cameras 510 recording the scene), to determine a first type of area where each subject exists (see paragraph [0203]: a module or device may be configured to produce the estimated source directions based on image information from a camera (e.g., by performing face and/or motion detection) and/or ranging information from ultrasound reflections. Such a module or device may also be configured to estimate the number of sources and/or to track one or more sources in motion.); 
an audio processor that recognizes sound sources included in the audio, to determine a second type of area where each sound source exists in the space (see paragraph [0090]: The audio processor 502 includes a direction of arrival (DOA) detector 512, a region of interest (ROI) selector 514, a sound classifier 516, and an acoustic feature extractor 518. From the microphone signals received from the microphone array 18, the DOA detector 512 determines the direction of arrival of sound emanating from a sound source located within the scene. Exemplary components and functions of the DOA detector 512 are described herein in connection with FIGS. 18-25. From the DOA and position of the array, an estimate of the location of the sound source in the scene can be determined, and in addition see paragraph [0073]: In step 204, one or more of the sound producing objects in the scene 10 is identified and located by the object recognition subsystem 20.); and 
Visser fails to explicitly disclose:
wherein the image processor determines the first type of area where each subject exists independently of the second type of area where each sound source exists determined by the audio processor, and the audio processor determines the second type of area where each sound source exists independently of the first type of area where each subject exists determined by the image processor;
a controller that uses the first type of area and the second type of area to judge coincidence or non-coincidence between a position of the each subject and a position of the each sound source, to determine a combination of a subject and a sound source whose positions coincide with each other, the controller selectively determining the subject image to be output that corresponds to the combination.  
In analogous art, Kim teaches:
a controller that uses the first type of area and the second type of area to judge coincidence or non-coincidence between a position of the each subject and a position of the each sound source, to determine a combination of a subject and a sound source whose positions coincide with each other, the controller selectively determining the subject image to be output that corresponds to the combination (see paragraphs [0165]-[0166]:…  the electronic device maps an image object and an audio object whose characteristics correspond to each other. In another example, the electronic device can determine the corresponding relationship on the basis of a movement of a target. The electronic device recognizes the movement of the target within the image by using a preview screen of image-taking or a video track of a video, and maps an audio object generated when the movement exists to an image object corresponding to the target. The electronic device can also determine the corresponding relationship on the basis of the directivity of the audio. The electronic device maps an image object and an audio object whose directions are consistent with each other…. The electronic device proceeds to step 1709 and encodes a combination data set, which includes image data, audio data, and mapping data.).  

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the signal processing device of Visser with the teaching of Kim in order to combine image data, such as a still image and a video, with audio data, such as a voice, a background sound, and music, and to play a combination of the image data and the audio data, so that the electronic device can simultaneously or sequentially output a plurality of audio objects mapped with the plurality of image objects.
Visser in view of Kim fails to disclose:
wherein the image processor determines the first type of area where each subject exists independently of the second type of area where each sound source exists determined by the audio processor, and the audio processor determines the second type of area where each sound source exists independently of the first type of area where each subject exists determined by the image processor;
In analogous art, Wang teaches:
wherein the image processor determines the first type of area where each subject exists independently of the second type of area where each sound source exists determined by the audio processor, and the audio processor determines the second type of area where each sound source exists independently of the first type of area where each subject exists determined by the image processor (see paragraphs [0020]-[0021]: Sources of sound to which a user may want to aim the beam of the microphone array are referred to herein as beam forming targets. For example, a person speaking may be selected as beam forming target, and the beam of the microphone array 114 may be pointed at the person speaking...).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the signal processing device of Visser in view of Kim with the teaching of Wang in order to provide techniques for processing an audio signal to reduce background noise through beam forming.
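
By way of illustration only, the coincidence judgment recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the representation of each area as a horizontal angular interval, the overlap test, and all names (Area, coincides, match_combinations, and the sample labels) are hypothetical and are not taken from Visser, Kim, or Wang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical area: a horizontal angular interval in degrees."""
    label: str
    start: float
    end: float


def coincides(a: Area, b: Area) -> bool:
    """Assumed test: two areas coincide when their intervals overlap."""
    return a.start <= b.end and b.start <= a.end


def match_combinations(subject_areas, source_areas):
    """Return (subject, sound source) label pairs whose areas coincide.

    The two lists are assumed to be produced independently, one by an
    image processor and one by an audio processor.
    """
    return [
        (subject.label, source.label)
        for subject in subject_areas
        for source in source_areas
        if coincides(subject, source)
    ]


# First type of area (from video) and second type of area (from audio).
subjects = [Area("person_A", -30.0, -10.0), Area("person_B", 5.0, 25.0)]
sources = [Area("voice_1", 8.0, 20.0)]

# Only person_B overlaps voice_1, so only that subject image is selected.
combos = match_combinations(subjects, sources)
```

In this sketch the subject image to be output would be the one whose label appears in a returned pair; a subject with no coinciding sound source is not selected.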

Regarding Claim 2, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses wherein when combination of the subject image and the sound source is one combination, the controller selectively determines the subject image to be output that corresponds to the one combination (see paragraph [0166]).  
Regarding Claim 3, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses wherein when combination of the subject image and the sound source includes a plurality of combinations and the plurality of combinations are not combinations of all subjects and all sound sources, the controller outputs the subject image corresponding to each of the plurality of combinations in a predetermined order of priority (see, including but not limited to, paragraphs [0075] and [0084]).

Regarding Claim 4, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses wherein the image processor and the audio processor follow changes in the video and the audio, respectively, to update the first type of area where the each subject exists and the second type of area where the each sound source exists, and wherein the controller uses the updated first and second types of areas to update the combination of the subject image and the sound source in which the subject and the sound source coincide in position, and selectively outputs the subject image that corresponds to the updated combination (see paragraph [0166]). 
 
Regarding Claim 5, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses wherein when combination of the subject image and the sound source includes a plurality of combinations and the plurality of combinations are combinations of all subjects and all sound sources, the controller outputs all subject images corresponding to the all subjects (see paragraph [0125]).  

Regarding Claim 9, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses an output interface that outputs a signal of the subject image contained in the combination selectively determined  (see paragraph [0166]).

Regarding Claim 10, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Kim further discloses wherein the controller selectively determines the subject image to be output depending on the number of the combinations (see paragraph [0166]).

6.	Claim(s) 6-8 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Visser et al. Pub. No. US 2013/0272548 A1 (Hereinafter “Visser”) in view of Kim et al. Pub. No. US 2014/0314391 A1 (Hereinafter “Kim”) in view of Wang et al. Pub. No. US 2016/0249134 A1 (Hereinafter “Wang”), further in view of Nicol et al. Pub. No. US 2019/0215540 A1 (Hereinafter “Nicol”).
Regarding Claim 6, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. 
Visser in view of Kim and Wang fail to disclose:
wherein the controller issues an instruction to cut out the subject image contained in the combination from the video.  
In analogous art, Nicol discloses:
wherein the controller issues an instruction to cut out the subject image contained in the combination from the video (see paragraphs [0261] and [0309]).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the signal processing device of Visser in view of Kim and Wang with the teaching of Nicol in order to allow the end user to control various aspects of the audio mixing and video editing, thereby enhancing the end user experience.

Regarding Claim 7, Visser in view of Kim, Wang and Nicol disclose the signal processing device as discussed in the rejection of claim 6. Nicol further discloses wherein when the subject image contained in the combination is an image of a person and the sound source contained therein is the person, the controller cuts out the image of the person from the video, and wherein when the subject image contained in the combination is a composite image of a person and a musical instrument and when the sound source is the musical instrument, the controller cuts out the composite image from the video (see paragraphs [0261] and [0309]).  

Regarding Claim 8, Visser in view of Kim, Wang and Nicol disclose the signal processing device as discussed in the rejection of claim 6. Nicol further discloses wherein when combination of the subject image and the sound source includes two combinations and a distance between areas including two subject images contained in the two combinations is within a predefined range, the controller cuts out the two subject images (see fig. 19B, paragraphs [0253] and [0261]).

Regarding Claim 11, Visser in view of Kim and Wang disclose the signal processing device as discussed in the rejection of claim 1. Visser further discloses a signal processing system comprising: a plurality of imaging devices, each of which images the space to generate a signal of the video (see paragraph [0067]: one or more cameras 16. The cameras 16 may include one or more still image cameras and/or one or more video cameras arranged at various locations and angles relative to the scene 10.);
Visser in view of Kim and Wang fail to disclose:
a switcher that selects a video including the selectively determined subject image, from among the videos output from the signal processing device and the plurality of imaging devices, wherein the signal processing device comprises an imager that images a plurality of subjects lying in the space to create a video including a plurality of subject images, each of the plurality of imaging devices images an individual subject to create a video including an individual subject image, and the controller of the signal processing device controls the switcher to selectively output the subject image contained in the combination.
In analogous art, Nicol discloses:
a switcher that selects a video including the selectively determined subject image, from among the videos output from the signal processing device and the plurality of imaging devices, wherein the signal processing device comprises an imager that images a plurality of subjects lying in the space to create a video including a plurality of subject images, each of the plurality of imaging devices images an individual subject to create a video including an individual subject image, and the controller of the signal processing device controls the switcher to selectively output the subject image contained in the combination (see paragraph [0011]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the signal processing system of Visser in view of Kim and Wang with the teaching of Nicol in order to allow the end user to control various aspects of the audio mixing and video editing, thereby enhancing the end user experience.

Conclusion
7.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alazar Tilahun, whose telephone number is (571) 270-5712. The examiner can normally be reached Monday-Friday, from 9:00 AM to 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jefferey Harold, can be reached at (571) 272-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALAZAR TILAHUN/
Primary Examiner
Art Unit 2424