DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
1. This Office action is in response to communications filed 3/8/2022. Claims 1, 9, 13, and 18-19 are amended. Claims 2-4 and 17 are previously presented. Claims 5-8, 10-12, and 14-16 are canceled. Claims 1-4, 9, 13, and 17-19 are pending.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-4, 9, 13, 17-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/1/2022 was filed after the filing date of the amended claims on 3/8/2022. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. 	Claims 1-4, 9, 13, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application 2016/0098622, Ramachandrula, in view of U.S. Patent Application 2009/0154896, Matono, and further in view of U.S. Patent Application 2015/0003648, Kang.

2. 	Regarding Claim 1, Ramachandrula discloses A data processing apparatus (Fig. 1; [0010], System 100 includes computing device 102 and camera 104; Fig. 2: computer system 202; [0013]) comprising:
 	a processor (Fig. 2: processor 204; [0015]),
 	wherein the processor performs processing of ([0016], processing logic that interprets and executes instructions):
 	acquiring image data at a time of image capturing (Fig. 5; Fig. 3: capture audiovisual of a user 302; [0024]), the image data including images of a plurality of subjects ([0020], different lip vector quantization (VQ) indices of a plurality of users);
 	acquiring plural pieces of audio data respectively emitted by the plurality of subjects at the time of image capturing (Figs. 4 and 5; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. Examiner notes that Applicant’s definition of “a situation” includes noise (i.e., speaking));
 	associating the acquired plural pieces of audio data with the respective positions of the plurality of subjects identified as the audio sources (Figs. 1, 4, 5; [0011], [0030], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104 (for instance, while facing camera 104) for some duration. Examiner notes that the camera can be used to capture the lip movements associated with the user speaking into the system 100, and equates the lip movements, which are captured when the user speaks into the camera/microphone, to the position of the subject identified as the audio source. Examiner further notes that the code words and lip vector images are associated with the user); and
 	the associated plural pieces of audio data, in synchronization with display of the image data (Figs. 1, 4, 5; [0011], An audio-video input may include user 106 speaking in the vicinity of camera 104 (for instance, while facing camera 104) for some duration. Examiner notes that the camera can be used to capture the lip movements associated with the user speaking into the system 100).
 	However, Ramachandrula does not explicitly disclose analyzing the image data so as to identify respective positions of the plurality of subjects, the plurality of subjects being audio sources in the image data;
 	displaying the image data on a display section, and outputting, from a plurality of audio output sections arranged so as to be separated from each other,
 	determining sound volumes of the plural pieces of audio data outputted from the plurality of audio output sections based on the identified positions of the plurality of subjects in the images of the image data; and
 	mixing the plural pieces of audio data at the determined sound volumes, and outputting from the plurality of audio output sections.
 	Matono teaches analyzing the image data so as to identify respective positions of the plurality of subjects, the plurality of subjects being audio sources in the image data (Fig. 3; [0016], extracting a sound of the specific subject from an audio signal and adjusting the extracted sound on the basis of the detected location. [0031], FIG. 3, the location of the speaker is output as "+P." As for a method for identifying the speaker location.);
 	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the computer system taught in Ramachandrula with speaker detection that represents the on-screen location at which a person is recorded, as taught in Matono, for the purpose of improving user authentication from a predetermined distance.
 	Ramachandrula in view of Matono does not explicitly disclose displaying the image data on a display section, and outputting, from a plurality of audio output sections arranged so as to be separated from each other, 
 	determining sound volumes of the plural pieces of audio data outputted from the plurality of audio output sections based on the identified positions of the plurality of subjects in the images of the image data; and
 	mixing the plural pieces of audio data at the determined sound volumes, and outputting from the plurality of audio output sections.
 	Kang teaches displaying the image data on a display section (Figs. 2, 3, 5; [0088], displayer 150 displays images 70 and 80 shown in Fig. 5), and outputting (Fig. 5; [0089], speakers 10 to 40), from a plurality of audio output sections (Fig. 5: speakers 10 to 40) arranged so as to be separated from each other (Figs. 5-9; [0025], the plurality of speakers may be disposed in each edge (i.e., separated from each other); [0090], speakers 10 to 40 dispersedly disposed in the edge areas of the display apparatus),
 	determining sound volumes of the plural pieces of audio data outputted from the plurality of audio output sections based on the identified positions of the plurality of subjects in the images of the image data (Figs. 5-9; [0088], when an automobile screen image 70 and a train screen image 80 are detected, the controller 120 obtains image information related to the predetected automobile 70 and predetected train screen image 80 of the image information prestored in the storage unit 150. Next, the controller 120 determines the vocalized position area of the automobile 70 and train screen image 80 based on the vocalized position information included in each of the image information obtained; [0121], a stereophonic sound service for a moved image in a display apparatus); and
 	mixing the plural pieces of audio data at the determined sound volumes, and outputting from the plurality of audio output sections (Fig. 5; [0078], the audio processor 140 may adjust and re-combine the plurality of sub audio signals based on the distances between the predetected vocalized location and the plurality of speakers 170 and create a plurality of audio output signals).
 	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the computer system of Ramachandrula, as modified by Matono, to incorporate a plurality of speakers separated from one another that output the sound of a sound source so that it is focused on an area, as taught in Kang, for the purpose of enhancing the user's listening experience.

3. 	Regarding Claim 2, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor analyzes the plural pieces of audio data so as to acquire audio features thereof ([0008], analyze human body characteristics; see Fig. 3: 306, evaluate speech vector quantization corresponding to lip vector sequence (e.g., frames) with code words (e.g., speech); Fig. 4: speech signals 404, 406 associated with the user’s lip images in Fig. 5; [0033]), and analyzes the image data based on the audio features so as to identify respective positions of the plurality of subjects, the plurality of subjects being included in the image data, being the audio source in the image data and having the audio features (Fig. 4; [0033], In Fig. 4, a speech signal 404 is extracted from an audiovisual 402 of an individual 400. Next, speech vectors 406 are extracted from speech signal 404. Later, vector quantization is used to compress speech feature vectors 406 by representing a vector with a codevector. Subsequently, an index 408 identifying the codevector is used to represent the original speech signal).

4. 	Regarding Claim 3, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor analyzes the plurality of subjects (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104). 

5. 	Regarding Claim 4, Ramachandrula discloses The data processing apparatus according to claim 3, wherein the processor identifies the one subject that is the audio source by analyzing motions of the plurality of subjects in the image data (Fig. 5; [0007], FIG. 5 illustrates lip shapes of different individuals; see Fig. 3: 306, evaluate speech vector quantization corresponding to lip vector sequence (e.g., frames) with code words (e.g., speech)).

6. 	Regarding Claim 9, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor analyzes the plural pieces of audio data so as to extract pieces of audio data for respective ones of the audio sources ([0033], A universal codebook for speech signals is created by extracting, from a speech signal repository of a plurality of speakers, feature vectors on each sequential frame of speech), and
 	wherein the processor selects a specific piece of audio data corresponding to the position of the plurality of subjects identified as the audio source from the extracted pieces of audio data for the respective audio sources and controls such that a specific piece of audio data and the one subject are associated with each other ([0049], The individuals may be asked to read out some text, wherein the selected text may include all the phonemes in sufficient numbers so that all possible lip shapes which an individual can make could be covered).

7. 	Regarding Claim 13, Ramachandrula in view of Matono discloses The data processing apparatus according to claim 11, 
 	Ramachandrula discloses wherein the processor controls the output status of the specific piece of audio data outputted from the audio output sections (see Fig. 3: 306, code words (e.g., speech)) so as to follow a movement of the position of the one subject identified as the audio source (Examiner notes that the camera 104 shown in Fig. 1 can be switched to different locations, or the users can move to different positions, to capture their lip movements and voice input).

8. 	Regarding Claim 17, Ramachandrula discloses The data processing apparatus according to claim 3, wherein the processor controls such that the position of the one subject identified as the audio source and the specific piece of audio data corresponding to the one subject are associated with each other (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104), and creates a file for managing image data including the one subject and the specific piece of audio data corresponding to the subject (Fig. 2: 206 memory; see [0018]).

9. 	Claim 18 is a method claim corresponding to apparatus claim 1 and is rejected on the same grounds as claim 1.

10. 	Claim 19 is a computer-readable medium (CRM) claim corresponding to apparatus claim 1 and is rejected on the same grounds as claim 1.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMER KHALID, whose telephone number is (571)270-5997. The examiner can normally be reached Monday-Friday, 9am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Miller, can be reached at (571) 272-7353. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OMER KHALID/Examiner, Art Unit 2422
/JOHN W MILLER/Supervisory Patent Examiner, Art Unit 2422