DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on September 13, 2022 has been entered. Claims 1, 4-13, 16-17, and 20 remain pending. Applicant’s amendments have overcome the 35 U.S.C. 101 rejections previously set forth in the Non-Final Office Action mailed on June 13, 2022.

Response to Arguments
Applicant’s arguments filed on September 13, 2022 have been fully considered but they are not persuasive. Applicant’s arguments with respect to claims 1, 4-13, 16-17, and 20 have been considered but are moot because the amendments changed the scope of the claims and thereby necessitated the new grounds of rejection set forth below.
	Applicant asserts, on pages 12-13, that Bradley teaches that the system can push transcript changes to all terminals at the same time so that they can display the latest version of the transcripts. Applicant further submits that neither Bradley nor Steelberg discloses “…wherein the notifying is performed according to preferences of the user…”
	It is agreed that Bradley teaches that pushed transcript changes are displayed at the same time; however, any latency in processing the digital information may not be noticeable, and latency is not mentioned in the cited references. Latency is a delay caused by the time it takes to process a digital signal. Furthermore, it is agreed that neither Bradley nor Steelberg teaches “wherein the notifying is performed according to preferences of the user”; this limitation further narrows the scope of the claim with respect to notifying a user of the transcription via a device of the user. The factual inquiries for establishing obviousness under 35 U.S.C. 103, and the rationale for obviousness of the amended features, namely the notifying being performed according to preferences of the user and, prior to notifying one or more other users of the transcription, receiving confirmation of an accuracy of the transcription, are set forth below. Applicant’s arguments with respect to independent claims 1, 13, and 17 under 35 U.S.C. 102(a)(1) and 102(a)(2) have been fully considered and are moot in view of the new ground(s) of rejection made under AIA 35 U.S.C. 103 as being unpatentable over Bradley et al. (US Pub. No. 2022/0115020 A1), hereinafter Bradley, in view of Thomson et al. (US 2020/0243094 A1), hereinafter Thomson, and further in view of Jackson (US 2017/0287482 A1).
	
	Applicant argues, on pages 13-14, that the dependent claims previously rejected under 35 U.S.C. 103 are now allowable by virtue of the amendments to the independent claims. However, as noted above, the limitation “wherein the notifying is performed according to preferences of the user” further narrows the scope of the claim with respect to notifying a user of the transcription via a device of the user. The factual inquiries and the rationale for obviousness of the amended features, namely the notifying being performed according to preferences of the user and, prior to notifying one or more other users of the transcription, receiving confirmation of an accuracy of the transcription, are set forth below. As with independent claims 1, 13, and 17, Applicant’s arguments are moot in view of the new ground(s) of rejection made under AIA 35 U.S.C. 103 over Bradley in view of Thomson and further in view of Jackson.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 1, 4, 7, 9-11, 13, 16, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bradley et al. (US Pub. No. 2022/0115020 A1), hereinafter Bradley, in view of Thomson et al. (US 2020/0243094 A1), hereinafter Thomson, and further in view of Jackson (US 2017/0287482 A1).
Regarding claim 1, Bradley teaches a computer-implemented method for transcribing media (Para. 66, a system configured to implement a method for automatic conversation transcription, with non-exhaustive exemplary terminals and connected components; furthermore, para. 231 indicates that modifications and variations may be made to the disclosed embodiments while remaining within the scope of the embodiments of the invention as defined by the following claims), the method comprising:
collecting media (Para. 66, first terminal device 21 can capture and play audio and/or video, both of which can be transmitted to a conferencing server 25 in the network 24);
extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features (Paras. 78-79 and 95, the speech recognition system is configured to infer the semantic meaning of an audio segment based on various statistical acoustic and language models and grammars; the acoustic models comprise convolutional neural networks (CNN), recurrent neural networks (RNN) such as long short-term memory (LSTM) neural networks or gated recurrent units (GRU), and deep feed-forward neural networks, which infer probabilities of phonemes in the audio that are further used with statistical analysis by language models to create transcripts; furthermore, the system can perform speech feature extraction separately on each segment, and speech segments can be clustered based on the location of their speech feature vectors within a vector space; see also FIG. 4, which illustrates an exemplary voice activity detection process performed in the method);
transcribing the media based on the extracted one or more features and one or more models (Para. 70, the Automatic Speech Recognition (ASR) system can adopt a domain-specific language model 29 to generate the transcripts; furthermore, features extracted from the media are used within the ASR and NLU systems, see para. 78);
Although Bradley briefly discusses preferences of the user in using the system (see paras. 130-131), Bradley fails to explicitly disclose:
notifying a user of the transcription via a device of the user, wherein the notifying is performed according to preferences of the user; and
receiving confirmation of an accuracy of the transcription prior to notifying one or more other users of the transcription.
In a related field of endeavor (generating transcripts, see abstract), Thomson teaches that notifying the first user of the transcript via the first device 104 may include a latency or delay, which may be set automatically based on knowledge of the first user; furthermore, the latency/delay may be determined based on the level of hearing impairment of the user, and the level and type of hearing impairment may be based on a user profile or preference settings (see Thomson, para. 153).
Modifying Bradley to include the features disclosed by Thomson yields:
notifying a user of the transcription via a device of the user, wherein the notifying is performed according to preferences of the user (e.g. Bradley’s computerized method now also including the feature of notifying a user of the transcription via a device of the user, wherein the notifying is performed according to preferences of the user as taught by Thomson, see para. 153);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Thomson to the method of Bradley. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the two disclosures, e.g., both concern transcribing media. Further, doing so would have provided the users of Bradley with the added benefit of presenting the transcription (i.e., a form of notification) to a user who may have a disability such as hearing impairment, according to the preferences of the user, as taught by Thomson, see para. 153. Furthermore, Thomson describes at para. 93 that “the systems and methods described in some embodiments may be directed to reducing costs to generate transcriptions. Reduction of costs may make transcriptions available to more people. In some embodiments, the systems and methods described in this disclosure may reduce inaccuracy, time, and/or costs by incorporating a fully automatic speech recognition (ASR) system into a transcription system.”
Although Thomson briefly discusses that the quality of the transcription may include an accuracy percentage by the user (see para. 173), Bradley in view of Thomson fails to explicitly disclose:
receiving confirmation of an accuracy of the transcription prior to notifying one or more other users of the transcription.
In a related field of endeavor (e.g., transcription revision, see abstract), Jackson teaches that the transcript is presented to a reviewer. The reviewer is any person or system capable of reviewing text transcribed from audio to confirm the accuracy of the transcription. Furthermore, if errors were made in the audio-to-text conversion, the reviewer identifies and corrects the errors. The reviewer could be a human reviewer of a previously computer-generated speech-to-text transcript. Alternatively, a hardware and software system that contains the appropriate components to review a speech-to-text translation and confirm text accuracy is also a reviewer. A reviewer may also include human and non-human components. The final transcription, with all speakers identified according to the instructions of the reviewer, is then delivered to a client 503 as a finished product through electronic communication (see Jackson, paras. 35 and 37-38).
Modifying Bradley in view of Thomson to include the features disclosed by Jackson yields:
receiving confirmation of an accuracy of the transcription prior to notifying one or more other users of the transcription (e.g. Bradley’s computerized method in view of Thomson now also including the feature of receiving confirmation of an accuracy of the transcription prior to notifying one or more other users of the transcription as taught by Jackson, see paras. 35 and 37-38).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Jackson to the method of Bradley in view of Thomson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the three disclosures, e.g., transcribing media. Further, doing so would have provided the users of Bradley in view of Thomson with the added benefit of correctly distinguishing and identifying every speaker in the finished transcript with 100% accuracy, as taught by Jackson, see para. 29. Furthermore, a person of ordinary skill in the art would recognize that having a reviewer allows mistakes present in the initial transcript to be corrected, as described by Jackson, see para. 35.

Regarding claim 4, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches,
wherein the one or more models correlate the one or more features with an appropriate transcription style and appropriately transcribing the media (Para. 70, the system can enable customization of a language model for a specific company, a specific person, a specific service subscriber, or other selection from a group. A domain-specific language model can allow a system to recognize words that are known and frequently used by certain people even when they are unknown or uncommon among a broader range of speakers. A language model can include custom dictionaries of recognizable words and their pronunciations; i.e., the model correlates features of the speaker with an appropriate transcription style, such as speaker indicators in the transcription, thereby appropriately transcribing the media according to different speakers, as explained in para. 76).

Regarding claim 7, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches the method further comprising:
determining a transcription style based on the extracted one or more features and one or more models (Para. 97, when a video meeting is conducted, the video stream captured by the embedded cameras can comprise the speaker's head or body images; for example, the system can analyze the images and identify a speaker by his or her facial features that were previously registered or determined in the system, i.e., the extracted features are images from the video; furthermore, Para. 70, the system can enable customization of a language model for a specific company, a specific person, a specific service subscriber, or other selection from a group; a domain-specific language model can allow a system to recognize words that are known and frequently used by certain people even when they are unknown or uncommon among a broader range of speakers; and a language model can include custom dictionaries of recognizable words and their pronunciations, i.e., the model correlates features of the speaker with an appropriate transcription style, such as speaker indicators in the transcription, where determining the transcription style includes differentiating speakers within the media content), wherein the media is transcribed according to the determined transcription style (Para. 146, as shown in FIG. 8C, the system can match the text of the later speaker to align with the text of the earlier speaker according to their recorded timestamps, i.e., different speakers are determined through the models and extracted features, so that the transcription contains speech correlated to different speakers, as seen in FIG. 8C with Alice and Bob).

Regarding claim 9, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches wherein:
the user is notified of the transcription along with audio or video of the media (Para. 74, the system can continuously update the transcript according to the audio streams in real time; while two or more editors can jointly modify the stored transcripts, the system can, at the same time, push transcript changes to all terminals so that they can display the latest version of the transcripts, i.e., the transcript appears along with the audio stream; furthermore, para. 141 describes displaying video of the media while the transcription is viewed: while replaying the relevant audio segments, the system can also play the corresponding video data on the display 82, which would enable the editor to visually check and confirm the content of the transcript while listening to the audio); and
the transcription notification is synchronized with the audio or video of the media (Para. 153, FIG. 10 shows a GUI for playback of a meeting with multimedia in synchronization with a transcript, i.e., the transcription appears in sync with the audio or video of the media), wherein the synchronization is based on the media's content (Para. 81, the system can provide the transcript 35 for live viewing by meeting participants in real time, i.e., live viewing indicates that the transcript is synchronized based on the media’s content).

Regarding claim 10, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches,
wherein the transcription includes one or more timestamps (Para. 132, Transcripts can also contain metadata such as tags with timestamps of the beginning of speech segments and possibly the ending of speech segments).

Regarding claim 11, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches,
wherein the transcription is searchable by the user (Para. 127, A document editing application can read the transcript and continuously update the display as new text is combined. Furthermore, some editing applications can provide a search capability to search for text within the transcript).

Regarding claim 13, claim 13 is directed to a computer program product corresponding to method claim 1 and is rejected on the same grounds as method claim 1. In addition, Bradley teaches:
A computer program product for transcribing media, the computer program product (Para. 60, These can be implemented with computers that execute software instructions stored on non-transitory computer-readable media) comprising:
one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method (Para. 60, these can be implemented with computers that execute software instructions stored on non-transitory computer-readable media; furthermore, paras. 223-224 show examples of non-transitory computer-readable media, ranging from a rotating magnetic disk to flash random access memory, that store instructions executed by a computer to perform the steps).

Regarding claim 16, claim 16 is directed to a computer program product corresponding to method claim 4 and is rejected on the same grounds as method claim 4.

Regarding claim 17, claim 17 is directed to a computer system corresponding to method claim 1 and is rejected on the same grounds as method claim 1. In addition, Bradley teaches:
A computer system for transcribing media, the computer system (Para. 60, These can be implemented with computers that execute software instructions stored on non-transitory computer-readable media)  comprising:
one or more computer processors, one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method (Para. 60, these can be implemented with computers that execute software instructions stored on non-transitory computer-readable media; furthermore, paras. 223-224 show examples of non-transitory computer-readable media, ranging from a rotating magnetic disk to flash random access memory, that store instructions executed by a computer to perform the steps).

Regarding claim 20, claim 20 is directed to a computer system corresponding to method claim 4 and is rejected on the same grounds as method claim 4.

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Bradley in view of Thomson and Jackson, and further in view of Steelberg et al. (US Pub. No. 2020/0286485 A1), hereinafter Steelberg.
Regarding claim 5, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above). However, Bradley in view of Thomson and Jackson fails to explicitly disclose:
 receiving feedback indicative of whether the transcription was accurate; and 
adjusting the one or more models based on the received feedback.
In a related field of endeavor (e.g., transcription revision, see abstract), Steelberg teaches, at para. 32, that at a high level the transcription method and system with reinforcement learning has the capability to ingest feedback, in the form of a reward function, to generate a revised (improved) transcription based on the received reward function. The revised transcription is then analyzed, and a second reward function is generated as feedback to the transcription engine, which then uses the second reward function to generate yet another revised transcription. This process is repeated until the desired accuracy threshold for the transcription is reached. Furthermore, para. 47 indicates that one or more of steps 110 through 165 can be considered part of a “conductor” which is configured to: train transcription models; select a transcription engine based on a trained model to transcribe the input media file; identify one or more segments of the transcribed media file with a low confidence of accuracy; select a new transcription engine to transcribe the one or more segments with a low confidence of accuracy; develop a new micro training model (e.g., a reinforcement learning enabled transcription model) to transcribe one or more segments that cannot be transcribed to a desired level of accuracy by previously selected transcription engines (after several cycles); and transcribe the one or more segments using a new micro engine, which is based on the new micro training model. That is, feedback indicative of the accuracy of the transcription is received from the reward system, and the models are adjusted through a micro training model/engine depending on the received feedback.
Modifying Bradley in view of Thomson and Jackson to include the features disclosed by Steelberg yields:
receiving feedback indicative of whether the transcription was accurate (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature of receiving feedback indicative of whether the transcription was accurate as taught by Steelberg, see para. 32); and 
adjusting the one or more models based on the received feedback (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature of adjusting the one or more models based on the received feedback as taught by Steelberg, see para. 47).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Steelberg to the method of Bradley in view of Thomson and Jackson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the four disclosures, e.g., transcribing media. Further, doing so would have provided the users of Bradley in view of Thomson and Jackson with the added benefit of generating a revised (improved) transcription based on the received reward function, i.e., through the received feedback and model adjustment, as taught by Steelberg, see para. 32. Furthermore, Steelberg indicates that continued use of real data for repeated training may improve its models, see para. 46.

Regarding claim 6, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above). However, Bradley in view of Thomson and Jackson fails to explicitly disclose:
collecting training data;
extracting training features from the training data; and
training the one or more models based on the extracted training features;
In a related field of endeavor (e.g., transcript revision, see abstract), Steelberg teaches, at para. 46, that the content of the accumulator may be joined with training data sets at 160, which may then be used to further train one or more transcription models at 165. Furthermore, para. 50 teaches that each training data set may include data from one or more media files and their corresponding feature profiles and transcripts. Each training data set may be a segment of or an entire portion of a large media file. Additionally, each time a media file is ingested and transcribed, it can be added to the training data set; i.e., as a media file is ingested, its features are extracted and included as feature profiles in the training data. Moreover, para. 50 describes that training module 200 may train one or more transcription models to improve an engine or to optimize the selection of engines using one or more training data sets from training database 215, and that training module 200, shown with training modules 200-1 and 200-2, may train a transcription model using multiple, e.g., thousands or millions, of training data sets.
Modifying Bradley in view of Thomson and Jackson to include the features disclosed by Steelberg yields:
collecting training data (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature of collecting training data as taught by Steelberg, see para. 46);
extracting training features from the training data (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature of extracting training features from the training data as taught by Steelberg, see para. 50); and
training the one or more models based on the extracted training features (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature of training the one or more models based on the extracted training features as taught by Steelberg, see para. 50);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Steelberg to the method of Bradley in view of Thomson and Jackson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the four disclosures, e.g., transcribing media. Further, doing so would have provided the users of Bradley in view of Thomson and Jackson with the added benefit of generating a revised (improved) transcription based on the received reward function, i.e., through the received feedback and model adjustment, as taught by Steelberg, see para. 32. Steelberg also indicates that continued use of real data for repeated training may improve its models, see para. 46, and teaches training one or more transcription models to improve an engine or to optimize the selection of engines using one or more training data sets from training database 215, see para. 50.

Claims 8 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bradley in view of Thomson and Jackson, and further in view of Talieh et al. (US Pat. No. 11,315,569 B1), hereinafter Talieh.
Regarding claim 8, Bradley in view of Thomson and Jackson teaches the method of claim 7 (see claim 7 above); in addition, Bradley teaches,
wherein the transcription style is selected from a group comprising a transcription (Para. 175, a selection of the hippopotamuses sample image 112 in the grid of FIG. 11A; after receiving a selection of the sample image 112, the system can display the corresponding text transcript 114, i.e., a selection is made from various configurations comprising a transcription), outline (Para. 146, as shown in FIG. 8C, the system can match the text of the later speaker to align with the text of the earlier speaker according to their recorded timestamps, i.e., an outline of the speakers with their related transcription), presentation with notes (Para. 166, a lower threshold for a slide view of a presentation, i.e., as in FIG. 11A, in which a presentation with slides is provided with associated notes on the corresponding animals, which may correspond to lecture notes or to the transcription itself, an application recognized in para. 3), blog with comments (Para. 117, FIG. 7, the display 70 can show a speaker name at the beginning of the transcribed text of the speaker, such that the speaker name can be an indicator to mark a speaker; according to some embodiments, other indicators, such as avatars and user IDs, can also be adopted; furthermore, para. 124 indicates that transcriptions may contain comments of editors, i.e., the text of the various speakers is structured as a blog with comments included; see FIG. 7, specifically the text attributed to the speakers and comments such as “I’ll handle this” from Porter), and tutorial with examples (Para. 166, a lower threshold for a slide view of a presentation, i.e., as in FIG. 11A, in which a presentation with slides is provided with associated notes on the corresponding animals, which may correspond to lecture notes (a lecture may be a tutorial, as it teaches) or to the transcription itself, an application recognized in para. 3, where the examples may be images used to enrich the presentation).
However, Bradley in view of Thomson and Jackson fails to explicitly disclose a summary as one of the selectable transcription styles. In a related field of endeavor (e.g., generating a transcription of media), Talieh teaches a generative summary: in some embodiments, analytics subsystem 118 may provide a short, automatically generated abstract or summary of each transcript, condensing the gist of the information about a meeting. For example, the summary can be “This meeting was about discussing tasks to reach our next milestone on project A”, see col. 9, lines 45-50.
Modifying Bradley in view of Thomson and Jackson to include the features disclosed by Talieh yields:
wherein the transcription style is selected from a group comprising a transcription, outline, summary, presentation with notes, blog with comments, and tutorial with examples (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including summary within the transcription styles as taught by Talieh, see lines 45-50 on col. 9).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Talieh to the method of Bradley in view of Thomson and Jackson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the four disclosures, e.g., transcribing media. Further, doing so would have provided the users of Bradley in view of Thomson and Jackson with the added benefits described by Talieh: the analytics subsystem 118 may not display all information, and not everything will be explicitly shown to speakers; some of this knowledge could be used to make informed decisions for enhancing user experience and displaying transcripts in a better way which maximizes productivity. Some information can be highlighted (e.g., key phrases, named entities) to help users identify important parts of transcripts. In some embodiments, based on the action items and intents determined by analytics subsystem 118, collaboration subsystem 112 can create tasks or tickets in project management tools, saving time for users, see col. 9, lines 51-65.

Regarding claim 12, Bradley in view of Thomson and Jackson teaches the method of claim 1 (see claim 1 above); in addition, Bradley teaches,
wherein the one or more features include frequency (Para. 188, filter banks may be applied to determine values for one or more frequency domain features, such as Mel-Frequency Cepstral Coefficients i.e. features extracted from the audio), vocabulary (Para. 64, A networked server can perform higher accuracy ASR using larger models, large vocabulary, organization-specific vocabulary, custom phrase replacement, natural language grammar processing, or some combination of such features and techniques i.e. language model is used to correlate with vocabulary features extracted from media), and facial expressions (Para. 97, the system can analyze the user's facial movement, such as mouth movement i.e. facial movements are particular expressions).
However, Bradley in view of Thomson and Jackson fails to explicitly disclose:
wherein the one or more features include topics, importance, tones, moods, pointing, waving, eye direction, and eye movement.
In a related field of endeavor (e.g. generating a transcription of a media), Talieh teaches, feature extraction subsystem 116 processes an audio recording (e.g., first audio recording 204a or other audio recordings) or a transcript (e.g., first speaker-specific transcript 214a, meeting transcript 216, or other transcript) to extract multiple features associated with the meeting. A feature describes or indicates a characteristic of the meeting. The features can include vocabulary, semantic information of conversations, summarization of a call, voice signal associated features (e.g., a speech rate, a speech volume, a tone, and a timber), emotions of speakers (e.g., fear, anger, happiness, timidity, fatigue), personal attributes of speakers (e.g., an age, an accent, and a gender), non-aural features (e.g., visual features such as body language or facial expressions of the speaker i.e. including movements and directions of eye and body language with pointing or waving), or any other features. The features can also include subject matter related features such as a subject of the meeting, an industry or technology area related to the meeting, a product or service discussed during the meeting, or other features, see lines 14-31 on col. 8.
Modifying Bradley in view of Thomson and Jackson to include the features disclosed by Talieh discloses:
wherein the one or more features include topics, importance, tones, moods, pointing, waving, eye direction, and eye movement (e.g. Bradley’s computerized method in view of Thomson and Jackson now also including the feature wherein the one or more features include topics, importance, tones, moods, pointing, waving, eye direction, and eye movement as taught by Talieh, see lines 14-31 on col. 8).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Talieh to the method of Bradley in view of Thomson and Jackson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature of the four disclosures, for example transcribing from media. Further, doing so would have provided the users of Bradley in view of Thomson and Jackson with the added benefits recognized by Talieh: although analytics subsystem 118 may not display all information, and not everything will be explicitly shown to speakers, some of this knowledge could be used to make informed decisions for enhancing user experience and displaying transcripts in a better way which maximizes productivity. Some information can be highlighted (e.g., key phrases, named entities) to help users identify important parts of transcripts. In some embodiments, based on the action items and intents determined by analytics subsystem 118, collaboration subsystem 112 can create tasks or tickets in project management tools, saving time for users, see lines 51-65 on col. 9. Furthermore, analytics subsystem 118 processes the features to determine various analytics that can provide different types of information regarding the meeting or speakers, as recognized by Talieh, see lines 38-43 on col. 8, i.e., information may be gathered on the foregoing features to determine information regarding the meeting or speakers.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Danilo et al. (US 2021/0050000 A1) teaches, a system for generating a personality assessment that uses multimodal behavioral signal processing technology and machine learning prediction technology. This system takes a video as input, processes it through artificial intelligence software built for extracting hundreds of behavioral features, and consequently generates an accurate and reliable personality assessment with its machine-learning predictive software. The personality assessment is based on the five-factor model (FFM), also known as the big 5 personality traits.

Taple et al. (US 2018/0315429 A1) teaches, Techniques for accurately recording sworn deposition testimony without use of a court reporter are described herein. According to these techniques, participants in a deposition or other legal proceeding are identified in such a manner that speech in one or more audio files representing the deposition can be associated with the respective participants. The association of participants with recorded speech is used to automatically generate an accurate transcript sequentially reflecting what was said at the deposition proceeding and by which of the respective participants.
Raanani et al. (US 2018/0046710 A1) teaches, automatically generating a playlist of conversations having a specified moment. A moment can be occurrence of a specific event or a specific characteristic in a conversation, or any event that is of specific interest for an application for which the playlist is being generated. For example, a moment can include laughter, fast-talking, objections, response to questions, a discussion on a particular topic such as budget, behavior of a speaker, intent to buy, etc., in a conversation. A moment identification system analyzes each of the conversations to determine if one or more features of a conversation correspond to a specified moment, and includes those of the conversations in the playlist having one or more features that correspond to the specified moment. The playlist may include a portion of a conversation that has the specified moment rather than the entire conversation.

Yoshioka et al. (US 2020/0349950 A1) teaches, A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices. Operations include performing speech recognition on each audio stream by a corresponding speech recognition system to generate utterance-level posterior probabilities as hypotheses for each audio stream, aligning the hypotheses and formatting them as word confusion networks with associated word-level posterior probabilities, performing speaker recognition on each audio stream by a speaker identification algorithm that generates a stream of speaker-attributed word hypotheses, formatting speaker hypotheses with associated speaker label posterior probabilities and speaker-attributed hypotheses for each audio stream as a speaker confusion network, aligning the word and speaker confusion networks from all audio streams to each other to merge the posterior probabilities and align word and speaker labels, and creating a best speaker-attributed word transcript by selecting the sequence of word and speaker labels with the highest posterior probabilities. Specifically, para. 122 indicates that analysis of video data may indicate an eye gaze or track eye movements to infer where a user is looking. Eye gaze analysis may result in control commands for the AI application, and may differ based on fusion with audio data.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN E AMAYA HERNANDEZ whose telephone number is (571)272-2484. The examiner can normally be reached Monday - Friday 9:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.E.A./             Examiner, Art Unit 2655   

/JONATHAN C KIM/             Primary Examiner, Art Unit 2655