Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/16/2020 is being considered by the examiner.
Drawings
The drawing submitted on 08/17/2020 is being considered by the examiner.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with R. Brian Johnson on 3/1/2021 to correct the dependency of claims 12-13 and 16-17 on canceled claims 10 and 15. Accordingly, claims 12-13 were amended to depend on claim 9, and claims 16-17 were canceled, based on the original claims presented on 12/17/2020.
The application has been amended as follows: 
12. (Currently Amended) A system according to Claim 9, wherein said virtual positions of the mouth validate that the mouth is moving during said audio data stream of audio samples and/or not moving during an interim period.  
13. (Currently Amended) A system according to  Claim 9, wherein said command processing software identifies a plurality of frames of pixel data representing 
16. (Canceled)  
17. (Canceled)  
Allowable Subject Matter
Claims 1, 7-9, 11-13, 19, 22, 24, and 33-40 are allowed.
The following is an examiner’s statement of reasons for allowance: The prior art of record Coon et al. (US 2009/055180 A1) teach: [0042] The speaker identification routine is employed to determine what individual is in what location in the vehicle. If a visual occupant detection system is employed in the vehicle, then user locations may be identified via face recognition software.
The prior art of record Wang et al. (US 2014/0214424 A1) teach: [0056] For example, with a robust face detection module 304, IVI system 100 could automatically tell the identity of who is speaking then perform personalized settings IVI system 100. In some examples, when a face is detected and recognized, control system 308 might be adapted to adjust control settings based at least in part on the identity of the recognized occupant. Additionally or alternatively, when a face is detected and recognized, control system 308 might adapt any response to a command to adjust the response based at least in part on the identity of the recognized occupant. Additionally, a determination of who is speaking of operation 322 may be communicated to control system 308. In such an example, when a face is detected and recognized and determination is made that this individual is speaking, control system 308 might be adapted to adjust control settings and/or adjust a response to occupant commands based at least in part on the identity of the recognized occupant.
The prior art of record Lopes et al. (US 2010/0194863 A1) teach: [0058] As mentioned above, system 100 may generate or access one or more formulas and/or look-up tables for determining a 3D physical location of a participant 130 based on the location (e.g., pixel) of the participant 130 within an image 200 captured by camera 101. For example, system 100 may generate an algorithm or look-up table to convert each pixel of background image 202 to a 3D physical location, which algorithm or look-up table may be used to determine the 3D physical position of a participant contour 201 (e.g., based on the location of ground contact point 260 within image 200). Alternatively, system 100 may generate an algorithm or look-up table to convert a distance (in pixels) from an edge of image 202 to 3D physical locations, which algorithm or look-up table may be used to determine the 3D physical position of a participant contour 201 (e.g., by measuring the distance in pixels between ground contact point 260 of participant contour 201 and an edge of image 200). Alternatively, system 100 may generate an algorithm or look-up table to convert a distance (in pixels) from a reference marker (e.g., marker 140 or 141) to 3D physical locations, which algorithm or look-up table may be used to determine the 3D physical position of a participant contour 201 (e.g., by measuring the distance in pixels between ground contact point 260 of participant contour 201 and the reference marker).
The prior art of record Kim et al. (US 20180268812 A1) teach: (Abstract) A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
The prior art of record Martin et al. (US 2013/0169801 A1) teach: [0003] An example method for processing visual communications at a video camera according to the disclosure includes analyzing video content captured by the video camera to identify a visual command presented to the camera by a user in the field of view of the camera, executing the visual command identified from the video content captured by the video camera, wherein executing the visual command includes setting one or more operating parameters of the video camera, determining a command completion indication corresponding to whether the command was successfully executed, and providing a command completion indication, corresponding to the command completion indication, to the user.
The prior art of record Karamon et al. (US 5055939) teach: Claim 7 (part), In a system providing an audience with viewable images depicting motion and wherein a sound signal track is associated with a source of said viewable images for providing a first sound signal synchronized in real time with said viewable images for reproducing a first audio message including a first spoken language synchronized in real time with lip motions of persons depicted talking by said images, a method for providing at least one other audio message synchronized in real time with said viewable images and including a second spoken language being a translation of said first spoken language, with articulation of said second spoken language appearing generally to correspond with lip motions of persons depicted talking by said images.
The prior art of record Okuno (US 2010/0265164 A1) teaches: Claim 1. An image processing apparatus for compositing an image of a physical space and an image of a virtual object, comprising: a unit which acquires a position of a sound source on the physical space and a position of the virtual object; and a change unit which changes a sound based on the sound source in accordance with the position of the sound source and the position of the virtual object.
The prior art of record Rao et al. (US 2019/0333508 A1) teach:  (Abstract) A voice recognition system is provided with a user interface to display content, a camera to provide a first signal indicative an image of a user viewing the content, and a microphone to provide a second signal indicative of a voice command that corresponds to a requested action. The voice recognition system is further provided with a controller that is programmed to receive the first and second signals, filter the voice command based on the image, and perform the requested action based on the filtered voice command. [0027] In another embodiment, the voice recognition system 10 further improves the accuracy of the translation of the voice request by combining the voice command with facial recognition and lip-reading to narrow down the search field. The voice recognition system 10 uses facial recognition to detect face and lip motions and correlates the motion to predetermined facial motion corresponding to the phonics of the speech. 
The prior art of record Maali et al. (US 6567775 B1) teach: Col 2, lines 35-45, The audio-based speaker identification system identifies one or more potential speakers for a given segment using an enrolled speaker database. The video-based speaker identification system identifies one or more potential speakers for a given segment using a face detector/recognizer and an enrolled face database. An audio-video decision fusion process evaluates the individuals identified by the audio-based and video-based speaker identification systems and determines the speaker of an utterance in accordance with the present invention. Col 3, line 61 to Col 4, line 8, The audio-video speaker identification system 100 includes a speech recognition system, a speaker segmentation system, an audio-based speaker identification system and a video-based speaker identification system. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary. A segment is a continuous portion of the audio source associated with a given speaker. The audio-based speaker identification system thereafter uses an enrolled speaker database to assign a speaker to each identified segment. The video-based speaker identification system thereafter uses a face detector/recognizer and an enrolled face database to independently assign a speaker to each identified segment.
The prior art of record Tammi et al. (US 2015/0139426 A1) teach: Claim 60 (part) determine a direction of at least one audio source based on at least two audio signals; determine a visual image for the at least one audio source so as to display the at least one audio source on the display; receive an input from the display to select the visual image to control the at least one audio source; output at least one audio signal associated with the at least one audio source; and process the at least one audio signal dependent on the received input.
The prior art of record alone or in combination failed to teach, for Claim 1, “command processing software configured to (i) track said audio samples in a time domain; (ii) track said frames of pixel data in said time domain and utilize said processor and said computer memory to group, in said time domain, said frames of pixel data with correspondingly timed audio samples; wherein the processor is configured to identify a source of the audio data stream from the frames of pixel data and the audio samples and to identify at least one command issued by the source; and a database of authorization data for matching said at least one command to the authorization data to determine whether the source of the audio data stream has permission to issue the command.”; for claim 33, “wherein said imaging software isolates, from said frames of digital pixel data, a subset of pixels representing a source of the speech input; and wherein said processor generates a voice token profile for the respective sets of digital audio samples based on said subset of pixels representing the source of speech input; wherein said processor stores in said database each respective speech profile, filters the database for identified speech profiles associated with individual users, and stores the identified speech profiles as respective codecs for respective individuals.”; for claim 37, “wherein said imaging software isolates, from said frames of digital pixel data, a subset of pixels representing a physical source of the speech input; and wherein said command processing software correlates, on a time basis, each audio sample to respective subsets of pixels representing the physical source in respective groups of sequential frames of image data; wherein said imaging software is configured to track multiple positions of said physical source of speech input by deriving respective positions of said physical source from the respective subsets of pixels; wherein said command processing software stores in said memory a voice token profile comprising resulting correlations between said subsets of pixels and said audio samples, and wherein said imaging software accesses said voice token profile for setting image acquisition parameters in the camera.”
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878.  The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MOHAMMAD K ISLAM/ Primary Examiner, Art Unit 2656