DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/14/2022 has been entered.
This communication is in response to the Amendments and Arguments filed on 02/14/2022.
Claims 1-10 and 12-21 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Response to Arguments
Applicant's arguments filed 02/14/2022 have been fully considered but they are not persuasive. 
Applicant asserts on page 11 that the cited art does not disclose using information about a speaker’s location as a speaker moves about a surrounding location as part of generating a transcript. The Examiner respectfully disagrees with this assertion. As discussed in the previous Advisory Action, Kalinli-Akbacak teaches that 
Hence Applicant’s arguments are not persuasive.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 4, 7, 15, 18, and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy et al. (U.S. PG Pub No. 2018/0225271), hereinafter Bellamy, in view of McLaren et al. (U.S. PG Pub No. 2016/0283185), hereinafter McLaren, in view of Kalinli-Akbacak (U.S. PG Pub No. 2014/0112556), hereinafter Kalinli-Akbacak, .

Regarding claims 1 and 15, Bellamy teaches
(claim 1) A machine implemented automated transcription method comprising ([0011:1-9] a method for converting oral presentations to textual transcripts, i.e. automated transcription, and further annotating the transcript):
(claim 15) An apparatus comprising ([0050:1-2] a computer system, i.e. apparatus):
(claim 15) one or more computer processors ([0050:1-3] a computer system includes a processor); and
(claim 15) a computer-readable storage medium comprising instructions for controlling the one or more computer processors to operate a transcription enhancer having a speech transcription component, one or more recognizers or analyzers, and an information encoder to ([0024], [0052:3-7] a computer readable storage medium having computer readable program instructions, which cause the processor to carry out, i.e. controlling the one or more processors to, the processes to run modules including a text-to-speech module, physiological interpreter module, and a text annotator module, i.e. operate a transcription enhancer having a speech transcription component, one or more recognizers or analyzers, and an information encoder):

receiving, by a machine implemented transcription enhancer, a speaker's captured data including spoken utterances from the speaker captured by one or more audio capture devices, video data of the speaker captured by one or more video capture devices, and sensor data captured by one or more sensors, the sensor data being associated with a surrounding location where the spoken utterances were made ([0011:8-9], [0017-8], [0025:1-3], [0027:1-4], [0029] a text-to-speech module receives orally presented material, i.e. spoken utterances, of a presenter, i.e. speaker, through a microphone, i.e. one or more audio capture devices, video equipment can be used to capture physiological data of the presenter, i.e. video data of the speaker captured by one or more video capture devices, and further physiological data within a location, such as a conference room, can be captured by sensors including temperature, humidity, and odor detectors, i.e. sensor data captured by one or more sensors, the sensor data being associated with a surrounding location where the spoken utterances were made, where the textual transcript and physiological interpretations of the data are further received by the text annotator module, i.e. receiving, by a machine implemented transcription enhancer); and
generating, by the machine implemented transcription enhancer, an enhanced transcription of the spoken utterances contained in the captured data ([0025:1-3], [0029:1-5] a text-to-speech module converts received speech, i.e. spoken utterances contained in the captured data, into a textual transcript that is further annotated with physiological interpretations, i.e. generating an enhanced transcription, by a text annotator module, i.e. machine implemented transcription enhancer), including:
performing, with a speech transcription component of the machine implemented transcription enhancer, speech recognition on the spoken utterances contained in the captured data to produce a text-only transcription of the spoken utterances contained in the captured data ([0025], [0037:4-11] presenter speech is digitized using a microphone, i.e. spoken utterances contained in the captured data, and speech recognition is used, i.e. performing speech recognition, to convert the speech, i.e. on the spoken utterances contained in the captured data, to a textual transcript, i.e. produce a text-only transcription, where the process may be performed by a text-to-speech module, i.e. speech transcription component of the machine implemented transcription enhancer);
performing, with one or more recognizers or analyzers of the machine implemented transcription enhancer, recognition or analysis of the spoken utterances, the video data and the sensor data of the captured data to generate context information for the spoken utterances, including speaker and surrounding information associated with all the spoken utterances, and speaker state information respectively associated with various ones of the spoken utterances... ([0019], [0025-7] a physiological interpreter module, i.e. one or more recognizers or analyzers of the machine implemented transcriptions enhancer, interprets the physiological data associated with a corresponding segment of an oral presentation, i.e. recognition or analysis of the spoken utterances, the video data and the sensor data of the captured data, where physiological data can include temperature, heart or respiration rate, or neural oscillations during the presentation, i.e. speaker and surrounding information associated with all the spoken utterances, as one or more ;-2-Application No. 16/234,542 Attorney Docket No. 113622-254182 (AB3503US) 
detecting, with the information encoder, indications of state changes in the speaker's state based on the speaker state information respectively associated with various ones of the spoken utterances ([0026], [0027-8] a physiological interpreter module, i.e. information encoder, receives the physiological data corresponding to particular segments of the presentation, i.e. speaker state information respectively associated with various ones of the spoken utterances, and uses rules to interpret the physiological data profiles to identify a moment of heightened emotion, i.e. detecting…indications of state changes in the speaker’s state); and
 in response to detecting each state change in the speaker's state, 
adding, with the information encoder, a speaker state information block having the speaker state information corresponding to the particular state change to the text-only transcription to further contribute in providing the context to the text-only transcription ([0026], [0027:11-12], [0029] a text annotator module, i.e. information encoder, receives the timestamped physiological interpretations, i.e. speaker state information corresponding to the particular state change, and annotates the textual transcript with the interpretations by matching the time stamps of the transcript and the physiological interpretations, i.e. in response to detecting each state change…adding…a speaker state information block…to the text-only transcription).  
While Bellamy provides the identification of speaker physiology, Bellamy does not specifically teach the recognition of ambient noise or that speaker information includes the physical state of the speaker’s face or body, and thus does not teach
wherein the speaker and surrounding information includes a level and a type of ambient noise of the location captured by the sensor data...
McLaren, however, teaches wherein the speaker and surrounding information includes a level and a type of ambient noise of the location captured by the sensor data ([0017-9] audio events can be detected, i.e. speaker and surrounding information, including vehicle noises of interest, musical instruments, or non-speech audio, which are instances of background noise, i.e. type of ambient noise of the location, in the audio file, i.e. captured by the sensor data, and recognized acoustic features further include measurements of energy, i.e. level).
Bellamy and McLaren are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used for transcripts. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the identification of speaker physiology teachings of Bellamy with the detection of background noises of interest as taught by McLaren. The motivation to do so would have been to achieve a predictable result of enabling the recognition of different types of audio sounds within an audio file (McLaren [0019]).
While Bellamy in view of McLaren provides the recognition of speaker information through audio information, Bellamy in view of McLaren does not specifically teach the recognition of speaker information based on physical input, and thus does not teach
wherein the speaker and surrounding information includes ... information about a location of the speaker as the speaker moves about the surrounding location, and the speaker state information includes a physical state of the speaker's face or hand captured by the video data, the physical state comprising facial expressions or hand gestures reflective of the speaker's emotional state.
Kalinli-Akbacak, however, teaches wherein the speaker and surrounding information includes ... information about a location of the speaker as the speaker moves about the surrounding location, and the speaker state information includes a physical state of the speaker's face or hand captured by the video data, the physical state comprising facial expressions or hand gestures reflective of the speaker's emotional state (visual features, i.e. speaker state information, used to determine the emotional state of a user, i.e. reflective of the speaker’s emotional state, include body gestures derived from positions as the user walks or stands, where different points on the body may be tracked in a series of images, i.e. information about a location of the speaker as the speaker moves about the surrounding location, and also include facial expressions, i.e. physical state of the speaker's face…comprising facial expressions, and other body gestures, such as motions of the user’s hands, i.e. physical state of the speaker's hand…comprising hand gestures, where the visual features may be derived from signals obtained by one or more sensors (Tables I and II, [0020], [0022], [0030], [0032]), such as video images from a digital camera, i.e. captured by the video data ([0029])).
Bellamy, McLaren, and Kalinli-Akbacak are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used by computer systems. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing 
While Bellamy in view of McLaren and Kalinli-Akbacak provides the annotation of textual transcripts with physiological interpretations, and the addition of a speaker name into a transcript, Bellamy in view of McLaren and Kalinli-Akbacak does not specifically teach annotating with specific speaker or surrounding information, and thus does not teach
adding, with an information encoder of the machine implemented transcription enhancer, a speaker and surrounding information block having the speaker and surrounding information associated with all the spoken utterances, to the text-only transcription to contribute in providing a context to the text-only transcription.
Nassar, however, teaches adding, with an information encoder of the machine implemented transcription enhancer, a speaker and surrounding information block having the speaker and surrounding information associated with all the spoken utterances, to the text-only transcription to contribute in providing a context to the text-only transcription ([0025:7-10], [0040:15-23] the analysis engine, i.e. information encoder of the machine implemented transcription .
Bellamy, McLaren, Kalinli-Akbacak, and Nassar are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used by computer systems. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the annotation of a transcript with physiological interpretations teachings of Bellamy, as modified with McLaren and Kalinli-Akbacak, with the annotation of a transcript with specific speaker and participant information as taught by Nassar. The motivation to do so would have been to achieve a predictable result of enabling meeting insights to be available for later review or analysis by participants or other users who have an interest in the meeting (Nassar [0040:20-23]).

Regarding claims 4 and 18, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claims 1 and 15, and Bellamy further teaches
generating a data element representative of the speaker's state at the time the particular state change is detected, the data element having a speaker state data element type and a speaker state value ([0027] a physiological interpreter module interprets physiological data as one or more emotional responses, i.e. speaker’s state at the time the particular state change is detected, where the emotional ; and
wherein adding the speaker state information block comprises adding the data elements having the speaker state data element type and the speaker state value, representative of the speaker's state at the time the particular state change is detected ([0026], [0027], [0029] a text annotator module annotates the text transcript, i.e. adding the data elements, with the interpretations of the physiological interpreter module, including the emotional response and setting, gradation, or strength, i.e. speaker state data element type and the speaker state value, where the annotation utilizes matching the timestamps of the transcript and interpretations, i.e. representative of the speaker’s state at the time the particular state change is detected).

Regarding claim 7, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claim 6, and Bellamy further teaches
generating a summary from the enhanced transcriptions of spoken utterances of the plurality of speakers ([0011:6-13], [0015:3-7], [0045] automated summarization of the annotated transcript, i.e. generating a summary from the enhanced transcription, is performed using a transcript from the oral presentation of one or more presenters, i.e. spoken input of the plurality of speakers).

Regarding claim 21, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claim 15, and Bellamy further teaches
generate a summary from the enhanced transcription of spoken input of the plurality of speakers ([0011:6-13], [0015:3-7], [0045] automated summarization of the annotated transcript, i.e. generating a summary from the enhanced transcription, is performed using a transcript from the oral presentation of one or more presenters, i.e. spoken input of the plurality of speakers). 

Claim(s) 2, 3, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy, in view of McLaren, in view of Kalinli-Akbacak, in view of Nassar, and further in view of Black et al. (U.S. PG Pub No. 2006/0075228), hereinafter Black.

Regarding claims 2 and 16, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claims 1 and 15.
While Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar provides the creation of annotated transcriptions, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar does not specifically teach protection of information in the transcription, and thus does not teach
using an exposure policy associated with the speaker to limit the use of some of the context information.  
Black, however, teaches using an exposure policy associated with the speaker to limit the use of some of the context information ([0038:1-13], [0079-80] a process detects sensitive information using selection rules, i.e. using an exposure policy, entered by the user, i.e. associated with the speaker, and protects the 
Bellamy, McLaren, Kalinli-Akbacak, Nassar, and Black are analogous art because they are from a similar field of endeavor in processing information in textual documents. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the creation of annotated transcriptions teachings of Bellamy, as modified by McLaren, Kalinli-Akbacak, and Nassar, with the protection of information through encryption, redaction, or removal as taught by Black. The motivation to do so would have been to achieve a predictable result of protecting sensitive information from being viewed while maintaining the overall ability to use the document (Black [0040:9-14]).

Regarding claims 3 and 17, Bellamy in view of McLaren, Kalinli-Akbacak, Nassar, and Black teaches claims 2 and 16, and Black further teaches
wherein the exposure policy prohibits the use of some of the context information ([0038:1-13], [0079-80] a process detects sensitive information using selection rules, i.e. using an exposure policy, entered by the user, and protects the information through encryption, redaction, or removal to prevent it from being viewed, i.e. prohibit the use of some of the context information, where the information can be protected either as it is entered into the document or after the document is created).  
The motivation to combine is the same as previously presented.

Claim(s) 5, 6, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy, in view of McLaren, in view of Kalinli-Akbacak, in view of Nassar, and further in view of Thomson et al. (U.S. Patent No. 10573312), hereinafter Thomson.

Regarding claims 5 and 19, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claims 1 and 15. 
While Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar provides the addition of annotations with speaker information, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar does not specifically teach the addition of other embellishments, and thus does not teach
selecting one or more embellishments based on the added speaker and surrounding information block or the added speaker state information block(s); and
 rendering the one or more embellishments along with rendering the enhanced transcription.  
Thomson, however, teaches selecting one or more embellishments based on the added speaker and surrounding information block or the added speaker state information block(s) ((91:12-25, 41-60) one or a combination of additions to the text, such as emojis, emoticons, or text descriptions of emotions, i.e. one or more embellishments, are added to the transcription to adjust the transcription to convey an emotion or word emphasis associated by a detector with words in the transcription, i.e. selecting…based on the speaker state information); and
rendering the one or more embellishments along with rendering the enhanced transcription ((91:61-92:2) the device may present, i.e. rendering, the adjustments to the transcription, i.e. rendering the one or more embellishments along with rendering the enhanced transcription, to the user).
Bellamy, McLaren, Kalinli-Akbacak, Nassar, and Thomson are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used for transcripts. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the addition of annotations with speaker information teachings of Bellamy, as modified by McLaren, Kalinli-Akbacak, and Nassar, with the emojis, emoticons, or text descriptions of emotions as taught by Thomson. The motivation to do so would have been to achieve a predictable result of adjusting the transcript to attempt to convey the determined emotions and word emphasis (Thomson (91:12-25)).

Regarding claims 6 and 20, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar teaches claims 1 and 15, and Bellamy further teaches 
wherein the speaker is a first speaker of a plurality of speakers ([0011:6-11], [0015:3-7], [0026-7], [0041] an oral presentation of one or more presenters, i.e. a plurality of speakers, is timestamped along with the physiological data for the presenter, i.e. speaker is a first speaker), and the method further comprises:
repeating the receiving and generating for each of the other speakers ([0011:1-11], [0015:3-7], [0017-8], [0025:1-3], [0026-7], [0029] a text-to-speech module .  
Additionally, the adding of a speaker and surrounding information block as part of the generation step, and previously taught by Nassar, also teaches the repeated annotation for each of the other speakers ([0025:7-10], [0040:15-23]).
While Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar provides the recognition of speaker audio from multiple speakers, Bellamy in view of McLaren, Kalinli-Akbacak, and Nassar does not specifically teach the identification of each speaker in a group of speakers, and thus does not teach
identifying each of the plurality of speakers.
Thomson, however, teaches identifying each of the plurality of speakers ((49:56-50:1) speaker identification, i.e. identifying, may include voiceprints to distinguish between voices, i.e. each of the plurality of speakers).
.

Claim(s) 8 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy, in view of McLaren, and further in view of Kalinli-Akbacak.

Regarding claim 8, Bellamy teaches
A non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a computer device, cause the computer device to operate a transcription enhancer having a speech transcription component, one or more recognizers or analyzers, and an information encoder to ([0024], [0052:3-7] a computer readable storage medium having computer readable program instructions, i.e. stored thereon computer executable instructions, which cause the processor to carry out, i.e. executed by a computer device, cause the computer device to, the processes to run modules including :
receive a speaker's captured data comprising audio data of utterances of the speaker captured by one or more audio capture devices, video data of the speaker captured by one or more video capture devices, and sensor data of a surrounding location where the utterances were spoken, captured by one or more sensors ([0011:8-9], [0017-8], [0025:1-3], [0027:1-4], [0029] a text-to-speech module receives orally presented material, i.e. audio data of utterances, of a presenter, i.e. speaker, through a microphone, i.e. one or more audio capture devices, video equipment can be used to capture physiological data of the presenter, i.e. video data of the speaker captured by one or more video capture devices, and further physiological data within a location, such as a conference room, can be captured by sensors including temperature, humidity, and odor detectors, i.e. sensor data of a surrounding location where the utterances were spoken, captured by one or more sensors); and
generate an enhanced transcription of spoken utterances contained in the audio data, including to ([0025:1-3], [0029:1-5] a text-to-speech module converts received speech, i.e. spoken utterances contained in the captured data, into a textual transcript that is further annotated with physiological interpretations, i.e. generating an enhanced transcription):
perform, with the speech transcription component, speech recognition on the audio data to produce a text-only transcription of the spoken utterances contained in the audio data ([0025], [0037:4-11] presenter speech is digitized using a ;
perform, with the one or more recognizers or analyzers, recognition or analysis of the spoken utterances, the video data and the sensor data to generate context information for the spoken utterances, including speaker and surrounding information associated with all the spoken utterances,...and speaker state information respectively associated with various ones of the spoken utterances ([0019], [0025-7] a physiological interpreter module, i.e. one or more recognizers or analyzers of the machine implemented transcription enhancer, interprets the physiological data associated with a corresponding segment of an oral presentation, i.e. recognition or analysis of the spoken utterances, the video data and the sensor data of the captured data, where physiological data can include temperature, heart or respiration rate, or neural oscillations during the presentation, i.e. speaker and surrounding information associated with all the spoken utterances, as one or more emotional responses to a particular segment of a presentation, i.e. speaker state information respectively associated with various ones of the spoken utterances)...; and
detect, with the information encoder, indications of state changes in the speaker's state based on the speaker state information in the generated context information ([0026], [0027-8] a physiological interpreter module, i.e. information encoder, receives the physiological data corresponding to particular segments of the presentation, i.e. speaker state information respectively associated with various ones of , and in response to detection of a state change in the speaker's state to:
generate, with the information encoder, a data element representative of the speaker state information, the data element including a speaker state data element type and a speaker state value ([0027] a physiological interpreter module interprets physiological data as one or more emotional responses, i.e. speaker’s state at the time the particular state change is detected, where the emotional response, i.e. speaker state element type, can be given a binary flag value, a gradation of an emotional response, or a relative strength interpretation, i.e. speaker state value); and
add, with the information encoder, the data element having the speaker state data element type and the speaker state value, representative of the speaker's state at the time the particular state change is detected to contribute to provide a context to the text-only transcription ([0026], [0027], [0029] a text annotator module, i.e. information encoder, annotates the text transcript, i.e. adding the data elements, with the interpretations of the physiological interpreter module, including the emotional response and setting, gradation, or strength, i.e. speaker state data element type and the speaker state value, where the annotation utilizes matching the timestamps of the transcript and interpretations, i.e. representative of the speaker’s state at the time the particular state change is detected).

wherein the speaker and surrounding information includes a level and a type of ambient noise of the surrounding location captured by the sensor data...
McLaren, however, teaches wherein the speaker and surrounding information includes a level and a type of ambient noise of the location captured by the sensor data ([0017-9] audio events can be detected, i.e. speaker and surrounding information, including vehicle noises of interest, musical instruments, or non-speech audio, which are instances of background noise, i.e. type of ambient noise of the surrounding location, in the audio file, i.e. captured by the sensor data, and recognized acoustic features further include measurements of energy, i.e. level).
Bellamy and McLaren are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used for transcripts. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the identification of speaker physiology teachings of Bellamy with the detection of background noises of interest as taught by McLaren. The motivation to do so would have been to achieve a predictable result of enabling the recognition of different types of audio sounds within an audio file (McLaren [0019]).
While Bellamy in view of McLaren provides the recognition of speaker information through audio information, Bellamy in view of McLaren does not specifically 
information about a location of the speaker as the speaker moves about the surrounding location,.., and the speaker state information includes a physical state of the speaker's face or hand captured by the video data, the physical state comprising facial expressions or hand gestures reflective of the speaker's emotional state.
Kalinli-Akbacak, however, teaches information about a location of the speaker as the speaker moves about the surrounding location,.., and the speaker state information includes a physical state of the speaker's face or hand captured by the video data, the physical state comprising facial expressions or hand gestures reflective of the speaker's emotional state (visual features, i.e. speaker state information, used to determine the emotional state of a user, i.e. reflective of the speaker’s emotional state, include body gestures derived from positions as the user walks or stands, where different points on the body may be tracked in a series of images, i.e. information about a location of the speaker as the speaker moves about the surrounding location, and also include facial expressions, i.e. physical state of the speaker's face…comprising facial expressions, and other body gestures, such as motions of the user’s hands, i.e. physical state of the speaker's hand…comprising hand gestures, where the visual features may be derived from signals obtained by one or more sensors (Tables I and II, [0020], [0022], [0030], [0032]), such as video images from a digital camera, i.e. captured by the video data ([0029])).


Regarding claim 14, Bellamy in view of McLaren and Kalinli-Akbacak teaches claim 8, and Bellamy further teaches
generate a summary from the enhanced transcriptions of spoken utterances of the plurality of speakers ([0011:6-13], [0015:3-7], [0045] automated summarization of the annotated transcript, i.e. generate a summary from the enhanced transcription, is performed using a transcript from the oral presentation of one or more presenters, i.e. spoken utterances of the plurality of speakers).  

Claim(s) 9 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy, in view of McLaren, in view of Kalinli-Akbacak, and further in view of Black.

While Bellamy in view of McLaren and Kalinli-Akbacak provides the creation of annotated transcriptions, Bellamy in view of McLaren and Kalinli-Akbacak does not specifically teach protection of information in the transcription, and thus does not teach
use an exposure policy associated with the speaker to limit the use of some of the context information.  
Black, however, teaches use an exposure policy associated with the speaker to limit the use of some of the context information ([0038:1-13], [0079-80] a process detects sensitive information using selection rules, i.e. use an exposure policy, entered by the user, i.e. associated with the speaker, and protects the information through encryption, redaction, or removal to prevent it from being viewed, i.e. limit the use of some of the context information, where the information can be protected either as it is entered into the document or after the document is created).
Bellamy, McLaren, Kalinli-Akbacak, and Black are analogous art because they are from a similar field of endeavor in processing information in textual documents. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the creation of annotated transcriptions teachings of Bellamy, as modified by McLaren and Kalinli-Akbacak, with the protection of information through encryption, redaction, or removal as taught by Black. The motivation to do so would have been to achieve a predictable result of protecting sensitive information from being viewed while maintaining the overall ability to use the document (Black [0040:9-14]).

wherein the exposure policy prohibits the use of some of the context information ([0038:1-13], [0079-80] a process detects sensitive information using selection rules, i.e. using an exposure policy, entered by the user, and protects the information through encryption, redaction, or removal to prevent it from being viewed, i.e. prohibit the use of some of the context information, where the information can be protected either as it is entered into the document or after the document is created).
The motivation to combine is the same as previously presented.

Claim(s) 12 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bellamy, in view of McLaren, in view of Kalinli-Akbacak, and further in view of Thomson.

Regarding claim 12, Bellamy in view of McLaren and Kalinli-Akbacak teaches claim 8. 
While Bellamy in view of McLaren and Kalinli-Akbacak provides the addition of annotations with speaker information, Bellamy in view of McLaren and Kalinli-Akbacak does not specifically teach the addition of other embellishments, and thus does not teach
select one or more embellishments based on the data element having the speaker state data element type and the speaker state value being added into the enhanced transcription; and
render the one or more embellishments along with the render of the enhanced transcription.  
Thomson, however, teaches select one or more embellishments based on the data element having the speaker state data element type and the speaker state value being added into the enhanced transcription ((91:12-25, 41-60) one or a combination of additions to the text, such as emojis, emoticons, or text descriptions of emotions, i.e. one or more embellishments, are added to the transcription to adjust the transcription to convey an emotion or word emphasis associated by a detector with words in the transcription, i.e. selecting…based on the speaker state data element type and the speaker state value); and
render the one or more embellishments along with the render of the enhanced transcription ((91:61-92:2) the device may present, i.e. render, the adjustments to the transcription, i.e. render the one or more embellishments along with rendering the enhanced transcription, to the user).
Bellamy, McLaren, Kalinli-Akbacak, and Thomson are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used for transcripts. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the addition of annotations with speaker information teachings of Bellamy, as modified by McLaren and Kalinli-Akbacak, with the emojis, emoticons, or text descriptions of emotions as taught by Thomson. The motivation to do so would have been to achieve a predictable result of adjusting the 

Regarding claim 13, Bellamy in view of McLaren and Kalinli-Akbacak teaches claim 8, and Bellamy further teaches 
wherein the speaker is a first speaker of a plurality of speakers ([0011:6-11], [0015:3-7], [0026-7], [0041] an oral presentation of one or more presenters, i.e. a plurality of speakers, is timestamped along with the physiological data for the presenter, i.e. speaker is a first speaker):
repeat the receipt of captured data and generation of enhanced transcription for each of the other speakers ([0011:1-11], [0015:3-7], [0017-8], [0025:1-3], [0026-7], [0029] a text-to-speech module receives orally presented material, i.e. spoken utterances, of one or more presenters, i.e. other speakers, through a microphone, i.e. one or more audio capture devices, video equipment can be used to capture physiological data of the presenter, i.e. video data of the speaker captured by one or more audio capture devices, and further physiological data within a location, such as a conference room, can be captured by sensors including temperature, humidity, and odor detectors, i.e. sensor data captured by one or more sensors, the sensor data being associated with a location where the spoken utterances were made, where the textual transcript and physiological interpretations of the data are timestamped, i.e. repeating the receiving…for each of the other speakers, and are further received by the text annotator module for annotation by matching the timestamps for the received data, i.e. repeating…the generating for each of the other speakers).

identifying each of the plurality of speakers.
Thomson, however, teaches identifying each of the plurality of speakers ((49:56-50:1) speaker identification, i.e. identifying, may include voiceprints to distinguish between voices, i.e. each of the plurality of speakers).
Bellamy, McLaren, Kalinli-Akbacak, and Thomson are analogous art because they are from a similar field of endeavor in generating feature information related to audio input, including speaker and noise information that can be used for transcripts. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the recognition of speaker audio from multiple speakers teachings of Bellamy, as modified by McLaren and Kalinli-Akbacak, with the identification of speakers through voiceprints as taught by Thomson. The motivation to do so would have been to achieve a predictable result of enabling the use of speaker-dependent models in an ASR system (Thomson (49:3-16)).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659                                                                                                                                                                                                        
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659