DETAILED ACTION
Applicant’s arguments filed in reply on 4/26/2022 were received and fully considered. Claims 1, 12, 19, and 20 were amended. Claim 14 was cancelled. Claims 15-17 contained allowable subject matter as identified in the previous Office Action. Therefore, claims 1-13 and 18-20 are examined. Please see below for more detail.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 5/17/2022 has been considered by the examiner.

Response to Arguments
Applicant’s arguments filed in the Amendment filed 4/26/2022 (herein “Amendment”) with respect to the 35 U.S.C. §101 rejection raised in the previous office action have been fully considered, and they are persuasive. Therefore, the rejection of claim 20 under 35 U.S.C. §101 is withdrawn.
Applicant’s arguments and amendments in the Amendment with respect to the 35 U.S.C. §112(a) claim objection raised in the previous office action have been fully considered, and they are persuasive. Therefore, the claim objection of claim 20 under 35 U.S.C. §112(a) is withdrawn.
Applicant’s arguments and amendments in the Amendment with respect to the prior art rejections of claims 1, 19, and 20, and therefore the claims depending therefrom, under 35 U.S.C. 103 over Joller in view of Helmbro raised in the previous office action have been considered, but are persuasive only to the extent that the amendments have changed the broadest reasonable interpretation, thus necessitating a new ground of rejection in view of newly cited Ramamurthy (US20200311122A1).
Specifically, Applicant’s amendments partially incorporate allowable subject matter identified in the previous Office Action (from canceled claim 14). While the amended language intended to overcome the 35 U.S.C. §103 rejection has been fully considered, it is not persuasive, thus necessitating the new ground of rejection.
Please see prior art section below for more detail including updated citations and obviousness rationale.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Joller (US20200105274A1), Ramamurthy et al. (US20200311122A1) (hereinafter "Ramamurthy"), and Helmbro (US20200302112A1).

Joller and Helmbro were applied in the previous Office Action.
Regarding claims 1, 19, and 20, Joller teaches a data processing system, method and computer program product comprising a computing system for automatically processing electronic content and for generating corresponding output, the computing system comprises: (Joller, Par. 0003:” The present disclosure relates generally to systems and methods for processing content. More specifically, but not exclusively, the present disclosure relates to systems and methods for analyzing and processing audio content to generate shorter-form audio content and/or associated content information and/or to improve content search operations.”).
one or more processors; and one or more computer readable hardware storage devices having stored computer-executable instructions that are executable by the one or more processors to cause the computing system to at least: (Joller, Par. 0113:” In certain embodiments, the content processing system 802, the content generation system 800, and/or the querying system 804 may comprise at least one processor system configured to execute instructions stored on an associated non-transitory computer-readable storage medium. For example, the content processing system 804 may include excitable instructions configured to implement content processing 808 and/or content query processing 810 methods consistent with the disclosed embodiments. The content processing system 802, the content generation system 800, and/or the querying system 804 may further comprise software and/or hardware configured to enable electronic communication of information between the devices and/or systems 800-804 via the network using any suitable communication technology and/or standard.”).
identify electronic content associated with a meeting, the electronic content including audio speech; (Joller, Par. 0007:” Audio content that may be processed, analyzed, and/or structured in connection with aspects of the disclosed embodiments may comprise a variety of types of audio content including, for example and without limitation, one or more of podcasts, radio shows and/or broadcasts, interviews, phone and/or meeting recordings, and/or the like. In addition, although various embodiments are described in connection with processing, analyzing, and/or structuring audio content, it will be appreciated that aspects of the disclosed systems and methods may be used in connection with a variety of content types and/or formats.”).
create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; (Joller, Par. 0011:” ... a method of processing audio content may include receiving a first audio content file and generating, based on the first audio content file, a text file comprising transcribed text corresponding to the first audio content file using any suitable audio-to-text transcription process(es). One or more words, multiple word phrases, entities, and/or the like may be extracted from the generated text file, providing parameters for analyzing the content of the file, identifying the most relevant and/or interesting segments, enhancing searchability of the file, and/or the like.", and Par.  0035:" Audio-to-text transcription 110 methods may be used to transcribe the longer-form content 102-108 into associated text. For example, in various embodiments, automatic speech recognition [“ASR”] methods may be used to transcribe audio content into associated text. Diarization 112 may be used to identify and/or otherwise label distinct speakers in the longer-form content 102-108. For example, diarization 112 may label speakers in transcribed audio text with distinct speakers labels independent of a specific name and/or identity. In addition, speaker identification may be used to label distinct speakers with specific identities and/or names.")
perform post-processing on the transcription, generating a post-processed transcription, by using a machine learning model trained on post-processing training data for modifying text included in the transcription, (Joller, Par. 0052:” A set of rules and/or filters may be applied to eliminate less informative, less relevant, and/or otherwise noisy candidate tags. In certain embodiments, tags may be filtered using a variety of techniques and/or based on a variety of criteria. For example, in some embodiments, candidate tags may be filtered by identifying whether a candidate tag includes a certain part-of-speech and/or character pattern(s), whether a candidate tag is included in a defined list of uninformative, less relevant, and/or noisy tags [e.g., a black list] and/or a controlled vocabulary, whether a candidate tag is semantically related to surrounding context, whether a candidate tag is used at a threshold or below a threshold level of frequency, and/or the like. Candidate tags included in less relevant content sections and/or segments such as, for example and without limitation, advertisements, non-substantive dialogues, announcements, and/or the like, may also be filtered.”, and Par. 0101:” The punctuated text 710 and/or labeled audio segments 718 may be processed as part of an annotation process 720 to generate annotated text 722. In some embodiments, the annotation process 720 may comprise, for example, annotation of the text with speakers and/or segments corresponding to diarized speech segments. The annotated text 722 may be post-processed 724 and/or otherwise filtered to remove less informative and/or less relevant content such as, for example and without limitation, advertisements, introductions, pauses, conclusions, and/or the like.”, and Par. 0006:"In some embodiments, artificial intelligence methods may be used in connection with content processing, analyzing, and/or structuring including, without limitation, one or more machine learning and/or natural language processing techniques.").
generate output based from the post-processed transcription, the output comprising at least one of a meeting summary generated by a machine learning summarization model that summarizes content of the post-processed transcription [[by at least breaking the post-processed transcription into a plurality of turns corresponding to a plurality of speakers, each turn being based on a role vector of a speaker corresponding to the turn, and wherein the summarization model selectively applies rules during analysis of each  turn, with each of the rules being selectively applied based on one or more corresponding roles from which the role vector is determined, or a template that is generated at least in part from the post-processed transcription, the template comprising a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcript and which is automatically populated with content from the post-processed transcript]]. (Joller, Par. 0103:” Individual phrases, sentences, and/or segments may be scored 736 [e.g., ranked in order of importance, relevance, and/or interest in the context of the long form audio content 702] and provided to content and/or segment summarization processes 738 to generate one or more cohesive shorter-form summaries 740. Scoring information may be further provided to content search indexing processes 744 as described herein.”, and Par. 0042:” Certain embodiments may employ artificial intelligence methods including, for example, machine learning and/or natural language processing techniques, to enhance the searchability of content, extract metadata [e.g., keywords, key phrases, named entities, topics, and/or the like], and/or the generation of shorter-form content [e.g., segments, summaries, highlights, and/or the like].”).
Joller fails to explicitly disclose, however, Ramamurthy teaches [[(i) a meeting summary generated by a machine learning summarization model that summarizes content of the post-processed transcription]] by at least breaking the post-processed transcription into a plurality of turns corresponding to a plurality of speakers, each turn being based on a role vector of a speaker corresponding to the turn, and wherein the summarization model selectively applies rules during analysis of each  turn, with each of the rules being selectively applied based on one or more corresponding roles from which the role vector is determined, (Ramamurthy, Par. 0041:” Rules engine 228 applies one or more rules 230 configured by an administrator or one or more users to determine relevancy of meeting item summaries 231 for users. …  Rules 230 may specify relevance for meeting item summaries 231 by positively specifying that a characteristic of a meeting item summary is relevant to particular users, or by negatively specifying that a characteristic of a meeting item summary is not relevant to particular users. A negative rule reduces the likelihood that rules engine 228 will cause summary personalization unit 110 to include a meeting item summary meeting the condition of the negative rule in a personalized meeting summary.”, and Par. 0044:” For example, one rule 230 may be based on the content of the meeting items. In this case, rules engine 228 may analyze meeting item summaries 231 for certain terms or phrases and these terms or phrases may be associated with certain participants [i.e., turns] according to their roles. As an example, terms or phrases that are related to finance may be associated with the financial officer or the person responsible for the finances.”, and Par. 0046 For example, rules 230 may be set up based on the meeting type. 
… Meeting types may include but not be limited to daily standup meetings, all-hands meetings, sales meetings, one-on-one meetings or cross-company meetings [where more than one company is involved]. … Within these presets, rules 230 describing the relevance of the information to various roles or participants may be defined.”). Note that, because certain terms or phrases may be associated with certain participants, the passage implies turns based on the given speaker.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller in view of Ramamurthy by at least breaking the post-processed transcription into a plurality of turns corresponding to a plurality of speakers, each turn being based on a role vector of a speaker corresponding to the turn, and wherein the summarization model selectively applies rules during analysis of each turn, with each of the rules being selectively applied based on one or more corresponding roles from which the role vector is determined, in order to automatically generate personalized summaries of a meeting, where each personalized summary is tailored to the interest and responsibilities of a user or application for which the personalized summary has been generated, as evidenced by Ramamurthy (See Par. 0005).
Joller and Ramamurthy fail to explicitly disclose, however, Helmbro teaches wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription. (Helmbro, Par. 0042:” Optionally at block 92, text corresponding to filler words is omitted. Filler words may be predefined words, such as, for example, “umm”, “ah”, “mmm”, or any other word that is typically used to bridge a gap in time as a user speaks. As will be appreciated, the filler words may vary based on language and/or dialect [e.g., filler words for the English language may be different from filler words in the Japanese language]. As filler words are identified in the selected text, they are automatically removed.”, and Par. 0047:” Optionally at block 102, a summarizing algorithm may be applied to the text in order to reduce the number of words in the text container. For example, certain words, such as “a”, “the”, etc. may be omitted in order to reduce the length of the audio clip without altering the message conveyed by the audio clip. Such summarizing algorithms are known in the art and therefore not described herein.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Ramamurthy in view of Helmbro to employ wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription, in order to provide a simple user interface for editing the audio file, as evidenced by Helmbro (See Par. 0036).


Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, and Helmbro, as applied to claim 1, and further in view of Crigler (US20070288518A1).

Crigler was applied in the previous Office Action.
Regarding claim 2, Joller, Ramamurthy and Helmbro fail to explicitly disclose, however, Crigler teaches wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content. (Crigler, Par. 0101:” Content Enhancement components 132, 134, 136 may be used individually, in sequence, or in parallel to identify the audio track of a newscast and to provide a transcript of the news cast as metadata associated with the content comprising the original newscast. For example, in various implementations, outputs from a plurality of similar Content Enhancement components 132, 134, 136 may be compared to produce metadata that describes the content in the aggregate. An example of such an implementation may include the use of a plurality of speech recognition components from different vendors, with the outputs of each Content Enhancement component processed in relation to the output of other components. The output of the Content Enhancement components 132, 134, 136 may be provided to MMS 114 for association with original content in order to improve the final results stored in MMS 114.”, and Par. 0149:” The static page contains a preview image, title, description, first couple of paragraphs of the transcript or STT, a set of keyword metadata encoded as meta tags, and a list of links to related pieces of content. The static pages may also have links other static pages that allow the search engines to follow the links in order to crawl all of our static pages. The MMS 114 provides the pieces of content, associated content, metadata, and tags for each piece of content.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy and Helmbro in view of Crigler to employ wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content, in order to improve search engine ranking, as evidenced by Crigler (see Par. 0153).

Regarding claim 3, Joller teaches the computing system of claim 2, wherein the plurality of links point to data related to the electronic content, but wherein the data related to the electronic content is external to the electronic content. (Joller, Par. 0056:” In various embodiments, entities, keywords, and/or key phrases may be mapped/linked to unique identities in one or more knowledge bases and/or other available information repositories. In some embodiments, mapping keywords, key phrases, and/or entities to identities may disambiguate their meaning and/or otherwise enrich them with relationships, properties, metadata, and/or other information. In certain embodiments, keyword, key phrase, and/or entity linking may use context and/or other features from the content where entities, keywords, and/or key phrases occur and compare them to the context and/or other features from external sources where the knowledge base identities occur. Linking determinations may, in some implementations, use machine learning models and/or algorithms.”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, Helmbro, and Crigler as applied to claim 2, and further in view of Georges (US20190266240A1).

Georges was applied in the previous Office Action.
Regarding claim 4, Joller, Ramamurthy, Helmbro, and Crigler fail to explicitly disclose, however, Georges teaches wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and (Georges, Par. 0020:” With reference to FIG. 1, an embodiment of an electronic system 10 may include memory 12 to store an electronic representation of an audio stream, a processor 11 coupled to the memory 12, and logic 13 coupled to the processor 11 and the memory 12. The logic 13 may be configured to detect a phrase in the audio stream based on a pre-defined vocabulary, associate a time stamp with the detected phrase, and classify a spoken intent based on a sequence of detected phrases and the respective associated time stamps. For example, the logic 13 may be further configured to monitor a continuous audio stream, detect the phrase in the continuous audio stream, and compute a quantized time stamp for the detected phrase which is relative to previously detected phrase. In some embodiments, the logic 13 may include a first neural network with an acoustic model and a hidden Markov model [HMM] to detect the phrase in the audio stream. For example, the acoustic model may be configured to automatically add time stamp information to text data for the detected phrase.”).
wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending (Georges, Par. 0021:” In some embodiments, the logic 13 may additionally, or alternatively, include a second neural network trained to return a probability for each of two or more intent classifications based on detected phrases and time stamps as input features to the second neural network. For example, the logic 13 may be further configured to classify the spoken intent in accordance with a highest probability of the two or more intent classifications. In some embodiments, the logic 13 may be configured to asynchronously trigger the second neural network when a sequence of detected phrases is ready for classification.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, Helmbro, and Crigler in view of Georges to employ wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending, in order to improve the intent detection accuracy, as evidenced by Georges (See Par. 0056).


Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, Helmbro, Crigler, and Georges as applied to claim 4, and further in view of De (US8688447B1).

De was applied in the previous Office Action.
Regarding claim 5, Joller, Ramamurthy, Helmbro, Crigler and Georges fail to explicitly disclose, however, De teaches wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks. (De, Col. 5, lines 36 – 43:” As shown in FIG. 2D, in one embodiment of the invention, each action [214] is associated with one or more transcription-level tags [206]. Actions may be performed using entities [200] that are related to transcription-level tags. Following the above example, the transcription-level tag BOOK-HOTEL may be associated with actions such as booking a reservation with a hotel using entities tagged by HOTEL-NAME and CITY.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, Helmbro, Crigler, and Georges in view of De to employ wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks, in order to provide information on how the tagged entities relate to each other as a whole, as evidenced by De (See Col. 5, lines 12-14).

Regarding claim 6, Joller, Ramamurthy, Helmbro, Georges and De fail to explicitly disclose, however, Crigler teaches wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task. (Crigler, Par. 0149:” The static page contains a preview image, title, description, first couple of paragraphs of the transcript or STT, a set of keyword metadata encoded as meta tags, and a list of links to related pieces of content. The static pages may also have links other static pages that allow the search engines to follow the links in order to crawl all of our static pages. The MMS 114 provides the pieces of content, associated content, metadata, and tags for each piece of content.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, Helmbro, Georges and De in view of Crigler to employ wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task, in order to improve search engine ranking, as evidenced by Crigler (see Par. 0153).

Claims 7, 8, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, and Helmbro, as applied to claims 1, 7, and 1 respectively, and further in view of Chen (US20200394258A1).

Regarding claim 7, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Chen teaches wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style. (Chen, Par. 0020:” In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or “edit” the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an “edited” transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Chen to employ wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style, in order to provide a good user experience, as evidenced by Chen (see Par. 0003).

Regarding claim 8, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Chen teaches wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, (Chen, Par. 0020:” In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or “edit” the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an “edited” transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.”).
wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription. (Chen, Par. 0077:” In another implementation, the replacement token is an abbreviation of the token of interest; the token of interest is a textual expression of a number and the replacement token is the number; the token of interest has a vulgar meaning and the replacement token is a polite synonym of the token of interest; the token of interest is a loan word and the replacement token is a synonym native to a language of the speech audio; or the replacement token has a same pronunciation as the token of interest and a more proper written form than the token of interest in the identified natural language domain.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Chen to employ wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription, in order to provide a good user experience, as evidenced by Chen (see Par. 0003).


Regarding claim 18, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Chen teaches wherein the output generated from the post-processed transcript is further post-processed remove errors and (Chen, Par. 0020:” In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or “edit” the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an “edited” transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.”).
modify text to improve the readability and accuracy of the output.  (Chen, Par. 0026:” As another example, when a user says “when is the pink concert”, the SR-NLU system should understand that the user is asking when is the concert of the singer “P!nk”, so a more appropriate transcription to be rendered to the user should be “when is the P!nk concert” instead of “when is the pink concert”. This means that the “pink” is to be replaced by the “P!nk” which is a more proper written form of the singer's name and has the same pronunciation as that of the “pink”. Likewise, the replacement from the “pink” to the “P!nk” is specific to the Music domain. It is obviously not proper to implement such a replacement in other domains such as a Geography domain to answer the query “show me a picture of the pink poodle motel”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Chen to employ wherein the output generated from the post-processed transcript is further post-processed remove errors and modify text to improve the readability and accuracy of the output, in order to provide a good user experience, as evidenced by Chen (see Par. 0003).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, and Helmbro, as applied to claim 1, and in further view of Chen2 (US20210350786A1).

Chen2 was applied in the previous Office Action.
Regarding claim 9, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Chen2 teaches wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; (Chen2, Par. 0003:” The method includes obtaining, by data processing hardware, a plurality of training text utterances, wherein a first portion of the plurality of training text utterances includes unspoken text utterances and a remaining second portion of the plurality of training text utterances comprises transcriptions in a set of spoken training utterances [ungrammatical]. Each unspoken text utterance is not paired with any corresponding spoken utterance.”).
generating text-to-speech (TTS) data from the text; (Chen2, Par. 0003:” Each spoken training utterance comprising a corresponding transcription paired with a corresponding non-synthetic speech representation of the corresponding spoken training utterance. For each of a plurality of output steps for each training text utterance of the plurality of training text utterances, the method also includes: generating, by the data processing hardware, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, by the data processing hardware, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance.”).
transcribing the TTS data using an automatic speech recognition model; and (Chen2, Par. 0003:” The method also includes training, by the data processing hardware, the speech recognition model on the synthetic speech representation generated at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.”).
pairing the transcribed TTS data with the corresponding ungrammatical sentences. (Chen2, Par. 0007:” In some examples, at each of the plurality of output steps for each training text utterance of the plurality of training text utterances, the one of the non-synthetic speech representations selected from the set of spoken training utterances includes: a randomly selected non-synthetic speech representation from the set of spoken training utterances when the corresponding training text utterance comprises one of the unspoken text utterances in the first portion of the plurality of training text utterances; or a non-synthetic speech representation  from the set of spoken training utterances that is paired with the corresponding one of the transcriptions when the corresponding training text utterance comprises one of the transcriptions in the second portion of the plurality of training text utterances.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Chen2 to employ wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; generating text-to-speech (TTS) data from the text; transcribing the TTS data using an automatic speech recognition model; and pairing the transcribed TTS data with the corresponding ungrammatical sentences, in order to improve the accuracy of the ASR model, as evidenced by Chen2 (See Par. 0030).


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, and Helmbro, as applied to claim 1, and in further view of Niekrasz (US20190327103A1).

Niekrasz was applied in the previous Office Action.
Regarding claim 10, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Niekrasz teaches wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription. (Niekrasz, Par. 0008:” In some examples, this disclosure describes a computing system for automatically generating abstractive summaries of meetings, the computing system comprising: a memory configured to store a transcript of a meeting”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Niekrasz to employ wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription, in order to provide technical improvements to voice recognition and dictation-related technologies that provide at least one practical application, as evidenced by Niekrasz (See Par. 0005).

Claims 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Joller, Ramamurthy, and Helmbro, as applied to claims 10, 11, and 12, respectively, and in further view of Karn (US20200285663A1).

Karn was applied in the previous Office Action.
Regarding claim 11, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Karn teaches wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention. (Karn, Par. 0026:” Further, in some example implementations, attention networks may be trained as part of an end-to-end abstractive summarization system. Further, in some example implementations, the attention model may be trained end-to end on regularly labeled data, instead of training attention parameters and other parameters on different training data.”, and Par. 0028:” FIG. 3 illustrates a conceptual schematic of a hierarchical encoder-decoder architecture 300 in accordance with example implementations of the present application. The architecture may be implemented using one or more neural networks including one or more computing devices such as computing device 805 illustrated in FIG. 8 discussed below. As illustrated, the encoder side 302 of the architecture includes first, a low-level, word-to-word encoder 305 that converts a sequence words in a post, Pj [215-245 from FIG. 2], to a sequence of representations, Hj=<hj0, . . ., hj|pj|>. Subsequently, a top-level, post-to-post encoder 310 converts those representations, <H0, . . ., H|C|> to a sequence of top-level post representations, <m1, . . ., m|C|>. These encoded representations are then passed to the decoder 304, which utilizes a top-level, thread-to-thread, decoder to disentangle them into a sequence of thread representations, <s1, . . ., s|T|>. In some example implementations, the thread-to-thread decoder may be a unidirectional LSTM [fDt2t] with initial state h0Dt2t set with a feedforward-mapped conversation vector C′. Finally, a low-level, word-to-word, decoder takes each thread representation Si and generates a sequence of summary words [265-275 from FIG. 2].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Karn to employ wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention, in order to identify three different conversational threads represented in the posts, as evidenced by Karn (See Par. 0027).

Regarding claim 12, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Karn teaches wherein the summarization model is further configured to summarize the post-processed transcription based on both hierarchical attention at a turn-level and at a word-level. (Karn, Par. 0003:” Interleaved texts are becoming more common with new ways of working and new forms of communication, starting with multi-author entries for activity reports, and later for meeting texts and social media conversations. In these types of multi-participant postings, [e.g., online chat conversations or social media posting boards], several conversations or topic threads may be occurring concurrently.”, and Par. 0023:” Additionally, in some example in some example implementations, the interleaved posts may be encoded hierarchically, [e.g., word-to-word [words in a post] followed by post-to-post [posts in a channel]]. Further, in some example implementations, the decoder may also generates summaries hierarchically, [e.g., thread-to-thread [generate thread representations] followed by word-to-word [e.g., generate summary words]. Additionally, in some example implementations, a hierarchical attention mechanism [discussed in greater detail below with respect to FIG. 5 below] for interleaved [turn] text may be used. As discussed herein, some example implementations of an end-to-end trainable hierarchical framework may enhances performance over a sequence to sequence framework by 8% on a synthetic interleaved texts dataset”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Karn to employ wherein the summarization model is configured to break the post-processed transcription into a plurality of turns comprising a plurality of words, the turns corresponding to a plurality of speakers, and summarize the post-processed transcription based on hierarchical attention at a turn-level and a word-level, in order to identify three different conversational threads represented in the posts, as evidenced by Karn (See Par. 0027).

Regarding claim 13, Joller, Ramamurthy, and Helmbro fail to explicitly disclose, however, Karn teaches wherein each turn is analyzed in context with a determined relationship between one or more of the turns of the plurality of turns. (Karn, Par. 0008:” Further aspects of the present application may include a computing device having a storage means for storing a plurality of posts of interleaved text, means for embedding each post through word-to-word encoding, means for embedding overall content of the plurality of posts through post-to-post encoding based on the word-to-word encoding of each post, means for generating a summary of the at least one thread through word-to-word decoding based on the overall content embedding of the plurality of posts, and means for displaying the summary of the at least one thread to a user.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Ramamurthy, and Helmbro in view of Karn to employ wherein each turn is analyzed in context with a determined relationship between one or more of the turns of the plurality of turns, in order to identify three different conversational threads represented in the posts, as evidenced by Karn (See Par. 0027).


Allowable Subject Matter
Claims 15 - 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 15 recites “The computing system of claim 1, wherein the output comprises the template which is a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcript and which is automatically populated with content from the post-processed transcript.” This subject matter is allowable over the prior art. The closest teachings to the indicated allowable subject matter are the references cited in the current Office Action. One such prior art reference of record is Radner et al. (US Patent No: 9235862B1), which teaches at Col. 12, lines 57 – 64: “FIG. 15 illustrates a screen 1240 which may be reached through the documents tab or other tab or folder such as the board book summary or meeting summary, which provides a tool for quickly creating a minutes template and recording meeting meetings for, for example, a board meeting. For example, the system may be configured to create meeting minutes directly form a board book summary, with the board book agenda and attached files included in the minutes.” Also, Kurstak et al. (US patent application No: 20160065731A1) teaches at Par. 0059:” Metadata associated with call information may be, for example, a history of communication with persons in a contact list, a frequency of calls, or a timestamp of a call. For example, if the user B of other electronic device makes a call to the user A of the electronic device 101 so as to fix a time for a meeting almost every day, the electronic device 101 may determine first a topic of a meeting type on the basis of integrated data including statistic metadata when creating summary data about a history of calls with the user B. Once a meeting type template is selected, the electronic device 101 may create words associated with a meeting (i.e., summary data) in a structured form by using a summarizing algorithm of the meeting type template. Then the structured summary data and the template may be stored in the memory 130 or displayed through the display module 150.” However, none of the prior art of record teaches the limitation as stated in the applicant’s claim, specifically wherein the output comprises the template which is a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcript.
Dependent claims 16 and 17 further limit claim 15. Therefore, said claims are found allowable over the prior art of record by virtue of their dependency. As such, claims 15 - 17 are allowable.

Any comments considered necessary by the applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submission should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Nelson et al. (US Patent Application No: 20190108834A1) teaches (Par. 0223):” Meeting content may be analyzed at any time relative to an electronic meeting, i.e., before, during or after an electronic meeting, as soon as at least some meeting content is available. Certain types of processing, such as participant analysis described in more detail hereinafter, may benefit from being able to process meeting data from one or more completed electronic meetings. For example, after an electronic meeting ends, meeting intelligence apparatus 102 may analyze stored meeting content data and generate reports based on analyzed meeting content data. Alternatively, meeting intelligence apparatus 102 may analyze meeting content data during an electronic meeting and may generate, after the electronic meeting ends, a report based on analyzed meeting content data. Reports may include any type and amount of data, including any number of documents, a meeting agenda, a meeting summary, a meeting transcript, a meeting participant analysis, a slideshow presentation, etc. As previously described herein, post meeting processing results may be used for other electronic meetings. For example, post meeting processing results may be used to determine suggested meeting participants for other electronic meetings. This may be repeated to improve the quality of suggested meeting participants over time.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/Examiner, Art Unit 2656
/MICHELLE M KOETH/Primary Examiner, Art Unit 2656