DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 6/14/2019 and 12/15/2020 were filed on and after the mailing date of the claims on 6/14/2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 4 and 5 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend. Claim 5 is dependent on claim 3, which is dependent on claim 1. Claim 5 recites the same limitation recited in dependent claim 4, which is dependent on claim 1. Therefore, the claim limitations do not further limit the subject matter claimed. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 16 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The claimed subject matter “wide-range sound collection” is not adequately described in the specification. The specification recites “a microphone which is provided on the wide-angle lens side and collects surrounding sound in synchronization with image capturing.” It is unclear if applicant is regarding a 360-degree sound

Claim Objections
Claims 14 and 15 are objected to because of the following informalities: Claims 14 and 15 recite “other audio data”; it is unclear whether “other audio data” is different from “audio data” and “specific audio data”. Appropriate correction is required.

Claim 16 is objected to because of the following informalities: Claim 16 recites “wherein the image data is image data acquired by wide-angle image capturing”, which should be changed to “wherein the image data is image data acquired by wide-angle image capturing device.” Appropriate correction is required.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.



Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


1. 	Claim(s) 1-10, 17, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by U.S. Patent Application 2016/0098622, Ramachandrula et al. (hereinafter Ramachandrula).

2. 	Regarding Claim 1, Ramachandrula discloses A data processing apparatus (Fig. 1; [0010], System 100 includes computing device 102 and camera 104; fig. 2: computer system 202; [0013]) comprising:
 	a processor (Fig. 2: processor 204; [0015]),
 	wherein the processor performs processing of ([0016], processing logic that interprets and executes instructions):
 	acquiring image data (Fig. 5; Fig. 3: capture audiovisual of a user 302; [0024]);
 	acquiring audio data (Fig. 4; Fig. 3: capture audiovisual of a user 302; [0024]);
 	analyzing the image data ([0008], analyze human body characteristics, see Fig. 3: 306 evaluate speech vector quantization corresponding to lip vector sequence (e.g. frames) with code words (e.g. speech)) so as to identify a subject (e.g. user’s mouth when speaking) that is an audio source in the image data (Fig. 1; [0033], an index 408 identifying the codevector is used to represent the original speech signal (e.g. of the user));
 	extracting specific audio data (see Fig. 3: 306, code words (e.g. speech)) corresponding to the subject identified as the audio source from the acquired audio data (Fig. 3: 306 authentication weighted by probabilities of speech and lip vector indices of the user; [0033], a speech signal 404 is extracted from an audiovisual 402 of an individual 400. Next, speech feature vectors 406 are extracted from speech signal 404); and
 	controlling such that the specific audio data and the subject are associated with each other (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104 (for instance, while facing camera 104) for some duration).

3. 	Regarding Claim 2, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor analyzes the audio data so as to acquire audio features ([0008], analyze human body characteristics, see Fig. 3: 306 evaluate speech vector quantization corresponding to lip vector sequence (e.g. frames) with code words (e.g. speech); Fig. 4: speech signals 404, 406 associated with the user’s lip images in Fig. 5, [0033]), and analyzes the image data based on the audio features so as to identify the subject that is the audio source in the image data and having the audio features (Fig. 4; [0033], FIG. 4, a speech signal 404 is extracted from an audiovisual 402 of an individual 400. Next, speech feature vectors 406 are extracted from speech signal 404. Later, vector quantization is used to compress speech feature vectors 406 by representing a vector with a codevector. Subsequently, an index 408 identifying the codevector is used to represent the original speech signal).

4. 	Regarding Claim 3, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor identifies the subject that is the audio source by analyzing subjects in the image data (Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user), and analyzes the audio data based on appearance features of the subject identified as the audio source so as to extract the specific audio data corresponding to the subject having the appearance features from the audio data and control such that the specific audio data and the subject are associated with each other (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104).

5. 	Regarding Claim 4, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor identifies the subject that is the audio source by analyzing motions of subjects in the image data (Fig. 5, [0007], FIG. 5 illustrates lip shapes of different individuals, see Fig. 3: 306 evaluate speech vector quantization corresponding to lip vector sequence (e.g. frames) with code words (e.g. speech)).

6. 	Regarding Claim 5, Ramachandrula discloses The data processing apparatus according to claim 3, wherein the processor identifies the subject that is the audio source by analyzing motions of the subjects in the image data (Fig. 5, [0007], FIG. 5 illustrates lip shapes of different individuals, see Fig. 3: 306 evaluate speech vector quantization corresponding to lip vector sequence (e.g. frames, examiner equates the movement of the individual’s lips in each frame as motion) with code words (e.g. speech)).

7. 	Regarding Claim 6, Ramachandrula discloses The data processing apparatus according to claim 1, further comprising:
 	a display section (Fig. 1: 102; [0011], Computing device 102 may be a desktop computer, notebook computer, tablet computer, mobile phone, personal digital assistant (PDA), smart phone, server computer, and the like. Fig. 2: display device 210 and Fig. 5: lip images),
 	 wherein the processor displays (Fig. 3: 304 process the audio visual), on the display section (Fig. 2: 220), image data (Fig. 5: lip images) including the subject identified as the audio source (sound from individuals), and controlling such that the extracted specific audio data and the displayed subject are associated with each other (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user; [0033], a speech signal 404 is extracted from an audiovisual 402 of an individual 400. Next, speech feature vectors 406 are extracted from speech signal 404).

8. 	Regarding Claim 7, Ramachandrula discloses The data processing apparatus according to claim 6, wherein the processor clips an area including the subject identified as the audio source from the acquired image data ([0037], FIG. 5 illustrates lip shapes of three different individuals. Each row in FIG. 5 corresponds to different frames of a common video recorded for each individual. In other words, these frames represent lip shapes of individuals when they are mouthing or speaking same words.), and
 	wherein the processor (i) displays (Fig. 3: 304 process the audio visual), on the display section (Fig. 2: display device), the clipped image acquired by the area being clipped ([0037], FIG. 5 illustrates lip shapes of three different individuals, according to an example. Each row in FIG. 5 corresponds to different frames of a common video recorded for each individual. In other words, these frames represent lip shapes of individuals when they are mouthing or speaking same words.), 
([0037], Lip shapes may be extracted and their features measured by using Active shape models (ASM) of individuals when they are mouthing or speaking same words), and 
 	(iii) controls such that the specific audio data and the subject are associated with each other ([0037], Lip shapes may be extracted and their features measured by using Active shape models (ASM) of individuals when they are mouthing or speaking same words. [0040], A video database of a plurality of individuals (speaking while facing the camera) is created to train the universal codebook for speech signals and universal codebook for lip shapes).

9. 	Regarding Claim 8, Ramachandrula discloses The data processing apparatus according to claim 6, wherein the processor clips an area including the subject arbitrarily specified as the audio source from the acquired image data (Examiner notes that the camera 104 shown in fig. 1 can be switched to different users to capture their lip movements and voice input), and wherein the processor (i) displays (Fig. 3: 304 process the audio visual), on the display section (Fig. 2: 220; fig. 5), the clipped image acquired by the area being clipped ([0037], FIG. 5 illustrates lip shapes of three different individuals, according to an example. Each row in FIG. 5 corresponds to different frames of a common video recorded for each individual. In other words, these frames represent lip shapes of individuals when they are mouthing or speaking same words.),	
 	(ii) extracts, from the audio data, the specific audio data corresponding to the subject included in the clipped image as the audio source (Figs. 4 and 5; [0033], A universal codebook for speech signals is created by extracting, from a speech signal repository of a plurality of speakers, feature vectors on each sequential frame of speech), and 
 	(iii) controls such that the specific audio data and the subject are associated with each other ([0037], Lip shapes may be extracted and their features measured by using Active shape models (ASM) of individuals when they are mouthing or speaking same words. [0040], A video database of a plurality of individuals (speaking while facing the camera) is created to train the universal codebook for speech signals and universal codebook for lip shapes).

10. 	Regarding Claim 9, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor analyzes the audio data so as to extract audio data of each audio source ([0033], A universal codebook for speech signals is created by extracting, from a speech signal repository of a plurality of speakers, feature vectors on each sequential frame of speech), and
 	wherein the processor selects the specific audio data corresponding to the subject identified as the audio source from the extracted audio data of each audio source and controls such that the specific audio data and the subject are associated with each other ([0049], The individuals may be asked to read out some text, wherein the selected text may include all the phonemes in sufficient numbers so that all possible lip shapes which an individual can make could be covered).

11. 	Regarding Claim 10, Ramachandrula discloses The data processing apparatus according to claim 6, wherein the processor identifies the subject that is the audio source (speaker’s voice) in the displayed image data (fig. 5) by analyzing the image data displayed on the display section (Figs. 1-5).

12. 	Regarding Claim 17, Ramachandrula discloses The data processing apparatus according to claim 1, wherein the processor controls such that the subject identified as the audio source and the specific audio data corresponding to the subject are associated with each other (Examiner notes that the code words and lip vector images are associated with the user; Fig. 1; [0011], Computing device 102 includes microphone 108 for capturing an audio input of a user. In an implementation, user 106 provides an audio-video (audiovisual) input (for example, for the purpose of authentication) to computing device 102. An audio-video input may include user 106 speaking in the vicinity of camera 104), and creates a file for managing image data including the subject and the specific audio data corresponding to the subject (Fig. 2: 206 memory; see [0018]).

13. 	Claim 18 is a method claim and is rejected for the same reasons set forth above for the corresponding limitations of apparatus claim 1.

14. 	Claim 19 is a computer-readable medium (CRM) claim and is rejected for the same reasons set forth above for the corresponding limitations of apparatus claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

15. 	Claims 11, 12, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application 2016/0098622, Ramachandrula in view of U.S. Patent Application 2009/0154896, Matono.

16. 	Regarding Claim 11, Ramachandrula discloses The data processing apparatus according to claim 1 (Fig. 1; [0010], System 100 includes computing device 102 and camera 104; fig. 2: computer system 202; [0013]), further comprising:
 	an audio output section (Fig. 1: user speaking into microphone 108, examiner equates the output to the user’s voice) which outputs the specific audio data extracted from the audio data ([0033], A universal codebook for speech signals is created by extracting, from a speech signal repository of a plurality of speakers, feature vectors on each sequential frame of speech).
However, Ramachandrula does not explicitly disclose wherein the processor controls an output status of the specific audio data outputted from the audio output section in accordance with a position of the subject identified as the audio source in image of the image data.
Matono discloses wherein the processor controls an output status of the specific audio data outputted from the audio output section in accordance with a position of the subject identified as the audio source in image of the image data (Fig. 7: microphone 706, sound recognition processor and object detector 703; [0055]-[0056], Fig. 3: location of speaker see [0031]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the computer system taught in Ramachandrula with speaker detection that indicates the on-screen location of a recorded person, as taught in Matono, for the purpose of improving user authentication from a predetermined distance.

17. 	Regarding Claim 12, Ramachandrula in view of Matono discloses The data processing apparatus according to claim 11, 
 	Ramachandrula discloses wherein the audio output section includes a plurality of loudspeakers arranged at different positions ([0040], A video database of a plurality of individuals (speaking while facing the camera) is created to train the universal codebook for speech signals and universal codebook for lip shapes), and
However, Ramachandrula does not explicitly disclose wherein the processor controls, for each loud speaker, a sound volume of the specific audio data outputted from the audio output section in accordance with the position of the subject identified as the audio source.
Matono discloses wherein the processor controls, for each loud speaker, a sound volume of the specific audio data outputted from the audio output section in accordance with the position of the subject identified as the audio source (Fig. 7: microphone 706, sound recognition processor and object detector 703; [0055]-[0056], Fig. 3: location of speaker see [0031]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the computer system taught in Ramachandrula with speaker detection that indicates the on-screen location of a recorded person, as taught in Matono, for the purpose of improving user authentication from a predetermined distance.

18. 	Regarding Claim 13, Ramachandrula in view of Matono discloses The data processing apparatus according to claim 11, 
 	Ramachandrula discloses wherein the processor controls the output status of the specific audio data outputted from the audio output section (see Fig. 3: 306, code words (e.g. speech)) based on a movement of the position of the subject identified as the audio source (Examiner notes that the camera 104 shown in fig. 1 can be switched to different locations or the users can move to different positions to capture their lip movements and voice input).

19. 	Regarding Claim 15, Ramachandrula in view of Matono discloses The data processing apparatus according to claim 11, 
Ramachandrula discloses wherein the processor combines the specific audio data with other audio data and controls the audio output section to output resultant data (Examiner notes that when user is speaking into the mic background noise will also be output into the system).

20. 	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Ramachandrula in view of Matono as applied to claim 11 above, and further in view of U.S. Patent Application 2016/0313973, Yajima.

21. 	Regarding Claim 14, Ramachandrula in view of Matono discloses The data processing apparatus according to claim 11, 
 	Ramachandrula discloses wherein the processor controls the audio output section (e.g. user speaking) to output only the specific audio data corresponding to the subject identified as the audio source (see Fig. 3: 306, code words (e.g. speech)), and prevents output of other audio data.
 	Ramachandrula in view of Matono does not explicitly disclose prevents output of other audio data.
 	Further, Yajima teaches prevents output of other audio data (Fig. 5: s37, Filtering out background noise from voice, [0148]).
 	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the microphone as taught in Ramachandrula in view of Matono with a control section filtering out background noise as taught in Yajima for the purposes of improving the audibility of the target sound (Yajima, [0148]).
 	


22. 	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application 2016/0098622, Ramachandrula in view of U.S. Patent Application 2016/0313973, Yajima.

23. 	Regarding Claim 16, Ramachandrula discloses The data processing apparatus (Fig. 1; [0010], System 100 includes computing device 102 and camera 104; fig. 2: computer system 202; [0013]) according to claim 1, wherein the image data is image data acquired by wide-angle image capturing, and
 	wherein the audio data is audio data acquired by wide-range sound collection (microphone 108) covering the wide angle being performed in synchronization with the wide-angle image capturing.
 	However, Ramachandrula does not explicitly disclose a wide-angle image capturing.
 	Yajima discloses a wide-angle image capturing ([0083], wide-angle lenses as image pickup lenses and may be capable of picking up images in a wide angle of view).
 	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the camera taught in Ramachandrula with a wide-angle camera as taught in Yajima for the purposes of improving the field of view of the desired capturing region.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMER KHALID whose telephone number is (571)270-5997. The examiner can normally be reached on Monday-Friday 9am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OMER KHALID/Examiner, Art Unit 2422
/JOHN W MILLER/Supervisory Patent Examiner, Art Unit 2422