Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to application 16/789,953, which was filed 02/13/20, and is a continuation of application 15/964,661, now US Patent 10,580,410. Claims 1-20 are pending in the application and have been considered.


Claim Objections
In claim 8, lines 6, 7, 8, 9, and 13, should “obtain”, “obtain”, “compare”, “select”, and “obtain” be “obtaining”, “obtaining”, “comparing”, “selecting”, and “obtaining”, respectively?
In claim 9, line 1, should “present” be “presenting”?  
In claim 12, line 1, should “establish” be “establishing”?  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
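A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.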


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zurek et al. (2014/0350924) in view of Beck et al. (7,920,158).

Consider claim 1, Zurek discloses a method comprising: obtaining audio of a communication session (microphones receive an audio signal, [0021], for example, during a meeting, [0064]); obtaining an image of a user of the first device (determining if the user is an authorized user using a captured image, [0047]); comparing the image to a particular image (facial recognition by comparing captured facial images against reference images, [0047]); selecting a first speech recognition system from among two or more speech recognition systems based on the comparison of the image to the particular image, each of the two or more speech recognition systems including a different speech engine trained to automatically recognize speech in audio (determining that a particular voice-recognition engine should be used for performing voice recognition, based on for example, the gender of a speaker identified from image data the device captures, [0048], the identified speaker might have created a speech recognition database during a previous training session, [0058]); and obtaining a transcription of the audio, the transcription of the audio generated using the selected first speech recognition system (creating an annotated dictation record, [0064], Fig. 8). 
Zurek does not specifically mention audio of a communication session between a first device and a second device.
Beck discloses audio of a communication session between a first device and a second device (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63).


Consider claim 7, the combination above of Zurek with Beck discloses at least one non-transitory computer-readable media (Zurek, memory 212, [0022]) configured to store one or more instructions that when executed by at least one processor (Zurek, processor 214, [0074]) cause or direct a system to perform the method of claim 1 (see claim 1 above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek to result in at least one non-transitory computer-readable media configured to store one or more instructions that when executed by at least one processor cause or direct a system to perform the method of claim 1, for reasons similar to those for claim 1.
Consider claim 8, Zurek discloses a system comprising: at least one processor (processor 214, [0074]); and at least one non-transitory computer-readable media communicatively coupled to the at least one processor and configured to store one or more instructions that when executed by the at least one processor cause or direct the system to perform operations (memory 212, [0022]) comprising: obtain audio of a communication session (microphones receive an audio signal, [0021], for example, during a meeting, [0064]); obtain an image of a user of the device (determining if the user is an authorized user using a captured image, [0047]); compare the image to a particular image (facial recognition by comparing captured facial images against reference images, [0047]); select a first speech recognition system from among two or more speech recognition systems based on the comparison of 
Zurek does not specifically mention audio of a communication session between the system and a device.
Beck discloses audio of a communication session between the system and a device (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including audio of a communication session between the system and a device for reasons similar to those for claim 1.

Consider claim 15, Zurek discloses a method comprising: obtaining audio (microphones receive an audio signal, [0021]), the audio generated by a second device based on speech of a second user (audio input module 204 includes elements needed to receive acoustic signals that include speech, represented by the voice of multiple individuals, [0028]); obtaining an image of the second user (determining if the user is an authorized user using a captured image, [0047]); comparing the image to a particular image (facial recognition by comparing captured facial images against reference images, [0047]); selecting a first speech recognition system from among two or more speech recognition systems based on the comparison of the image to the particular image, each of the two or more speech recognition systems including a different speech engine trained to automatically recognize speech in 
Zurek does not specifically mention audio to be directed to a first device associated with a first user.
Beck discloses audio to be directed to a first device associated with a first user (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including audio to be directed to a first device associated with a first user for reasons similar to those for claim 1.

Consider claim 2, Zurek discloses presenting the transcription of the audio in real-time during the communication session (a marketing brochure is generated in real time using an annotated dictation of a meeting, [0064]-[0066], Fig 8).

Consider claim 3, Zurek does not specifically mention the presentation of the transcription is performed by the second device.
Beck discloses the presentation of the transcription is performed by the second device (the transcription text is displayed at each endpoint on the user interface in the form of a moving “ticker” of text, Col 11 lines 35-40, Fig 3, element 66).


Consider claim 4, Zurek discloses establishing the communication session, wherein the image is obtained after establishing the communication session (cameras continue taking images and adapting processing by the voice recognition module, based, for example, on who is gazing at the device, [0020]). 
Zurek does not specifically mention a communication session between the first device and the second device.
Beck discloses a communication session between the first device and the second device (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including a communication session between the first device and the second device for reasons similar to those for claim 1.

Consider claim 5, Zurek discloses the first speech recognition system is selected from among the two or more speech recognition systems based on the image not matching the particular image and the first speech recognition system automatically recognizes speech and is not specifically trained for the user (determining that a particular voice-recognition engine should be used for performing voice recognition, based on for example, the gender of a speaker identified from image data the device captures, [0048]). 



Consider claim 9, Zurek discloses the operations further comprise presenting the transcription of the audio in real-time during the communication session (a marketing brochure is generated in real time using an annotated dictation of a meeting, [0064]-[0066], Fig 8).

Consider claim 10, Zurek does not, but Beck discloses a transcription of the audio is presented by the system (the transcription text is displayed at each endpoint on the user interface in the form of a moving “ticker” of text, Col 11 lines 35-40, Fig 3, element 66). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek such that a transcription of the audio is presented by the system for reasons similar to those for claim 1.

Consider claim 11, Zurek does not, but Beck discloses the image of the user of the device is directed to the system for presentation of the image by the system (displaying images, Col 11 lines 35-40, Fig 3, element 66). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek such that the image of the user of the device is directed to the system for presentation of the image by the system for reasons similar to those for claim 1.


Zurek does not specifically mention a communication session between the system and the device.
Beck discloses a communication session between the system and the device (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including a communication session between the system and the device for reasons similar to those for claim 1.

Consider claim 13, Zurek discloses the first speech recognition system is selected from among the two or more speech recognition systems based on the image not matching the particular image and the first speech recognition system automatically recognizes speech and is not specifically trained for the user (determining that a particular voice-recognition engine should be used for performing voice recognition, based on for example, the gender of a speaker identified from image data the device captures, [0048]). 

Consider claim 14, Zurek discloses the first speech recognition system is selected from among the two or more speech recognition systems based on the image matching the particular image and the first speech recognition system automatically recognizes speech and is trained for the user (the identified speaker might have created a speech recognition database during a previous training session, [0058]). 


Zurek does not specifically mention presentation to the first user of the first device. 
Beck discloses presentation to the first user of the first device (the transcription text is displayed at each endpoint on the user interface in the form of a moving “ticker” of text, Col 11 lines 35-40, Fig 3, element 66).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including presentation to the first user of the first device for reasons similar to those for claim 1.

Consider claim 17, Zurek discloses the first speech recognition system is selected from among the two or more speech recognition systems based on the image matching the particular image and the first speech recognition system automatically recognizes speech and is trained for the second user (the identified speaker might have created a speech recognition database during a previous training session, [0058]). 

Consider claim 18, Zurek does not, but Beck discloses the audio is directed to the first device during a communication session between the first device and the second device (transmitting and receiving video and audio data among participants in a video conference, Col 5 lines 50-63). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek such that the audio is directed to the first device during a communication session between the first device and the second device for reasons similar to those for claim 1.

Consider claim 19, Zurek discloses the transcription of the audio is generated for presentation (a marketing brochure is generated in real time using an annotated dictation of a meeting, [0064]-[0066], Fig 8).
Zurek does not specifically mention directing the transcription of the audio to the first device for presentation of the transcription of the audio by the first device. 
Beck discloses directing the transcription of the audio to the first device for presentation of the transcription of the audio by the first device (the transcription text is displayed at each endpoint on the user interface in the form of a moving “ticker” of text, Col 11 lines 35-40, Fig 3, element 66).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek by including directing the transcription of the audio to the first device for presentation of the transcription of the audio by the first device for reasons similar to those for claim 1.

Consider claim 20, Zurek discloses the transcription of the audio is generated for presentation (a marketing brochure is generated in real time using an annotated dictation of a meeting, [0064]-[0066], Fig 8).
Zurek does not specifically mention the image of the second user is directed to the first device for presentation of the image by the first device. 
Beck discloses the image of the second user is directed to the first device for presentation of the image by the first device (displaying images, Col 11 lines 35-40, Fig 3, element 66).
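It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zurek such that the image of the second user is directed to the first device for presentation of the image by the first device for reasons similar to those for claim 1.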


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20170060828 A1 Rainisto discloses gesture based annotations
US 20140365221 A1 Ben-Ezra discloses speech recognition with visual capturing and features (see Fig 5)
US 20140379346 A1 Aleksic discloses video analysis based language model adaptation
US 20110295603 A1 Meisel discloses speech recognition improvement through speaker categories detected using camera images
US 20160140964 A1 Connell, II discloses speech recognition system adaptation based on non-acoustic features
US 10847146 B2 Jiang discloses multiple voice recognition model switching
US 20090125307 A1 Wang discloses providing each user at multiple devices with speaker-dependent speech recognition engines
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Dan Washburn can be reached on 571/272-5551. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2657                                        04/20/21