DETAILED ACTION
This office action is in response to Applicant’s submission filed on 5/29/2020. Claims 1-20 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 5/29/2020, 6/18/2020, 6/30/2021, 9/2/2021, and 1/4/2022 have been considered by the examiner.

Drawings
The drawings filed on 5/29/2020 have been accepted and considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

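The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
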
Claim 20 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Specifically, the "computer program product" recited in the preamble of claim 20 is not supported or described in the specification as filed.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 20 is drawn to a "program" per se, as recited in the preamble ("computer program product" configured to automatically process electronic content as defined in the disclosure), and as such is directed to non-statutory subject matter. See MPEP § 2106.IV.B.1.a. Data structures not claimed as embodied in computer-readable media are descriptive material per se and are nonstatutory. Such claimed data structures do not define any structural and functional interrelationships between the data structure and other claimed aspects of the invention which permit the data structure's functionality to be realized. In contrast, a claimed computer-readable medium encoded with a data structure defines structural and functional interrelationships between the data structure and the computer software and hardware components which permit the data structure's functionality to be realized, and is thus statutory. Similarly, computer programs claimed as computer listings per se, i.e., the descriptions or expressions of the programs, are not physical "things." They are neither computer components nor statutory processes, as they are not "acts" being performed. Such claimed computer programs do not define any structural and functional interrelationships between the computer program and other claimed elements of a computer which permit the computer program's functionality to be realized.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Joller et al. (US20200105274A1) (hereinafter "Joller") in view of Helmbro et al. (US20200302112A1) (hereinafter "Helmbro").

Regarding claims 1, 19, and 20, Joller teaches a data processing system, method and computer program product comprising a computing system for automatically processing electronic content and for generating corresponding output, the computing system comprises: (Joller, Par. 0003: "The present disclosure relates generally to systems and methods for processing content. More specifically, but not exclusively, the present disclosure relates to processing audio content to generate shorter-form audio content and/or associated content information and/or to improve content search operations.").
one or more processors; and one or more computer readable hardware storage devices having stored computer-executable instructions that are executable by the one or more processors to cause the computing system to at least: (Joller, Par. 0113: "In certain embodiments, the content processing system 802, the content generation system 800, and/or the querying system 804 may comprise at least one processor system configured to execute instructions stored on an associated non-transitory computer-readable storage medium. For example, the content processing system 804 may include excitable instructions configured to implement content processing 808 and/or content query processing 810 methods consistent with the disclosed embodiments. The content processing system 802, the content generation system 800, and/or the querying system 804 may further comprise software and/or hardware configured to enable electronic communication of information between the devices and/or systems 800-804 via the network using any suitable communication technology and/or standard.").
identify electronic content associated with a meeting, the electronic content including audio speech; (Joller, Par. 0007: "Audio content that may be processed, analyzed, and/or structured in connection with aspects of the disclosed embodiments may comprise a variety of types of audio content including, for example and without limitation, one or more of podcasts, radio shows and/or broadcasts, interviews, phone and/or meeting recordings, and/or the like. In addition, although various embodiments are described in connection with processing, audio content, it will be appreciated that aspects of the disclosed systems and methods may be used in connection with a variety of content types and/or formats.").
create a transcription of the audio speech with an automatic speech recognition (ASR) model trained on speech-to-text training data, the transcription being a text-based transcription; (Joller, Par. 0011: "... a method of processing audio content may include receiving a first audio content file and generating, based on the first audio content file, a text file comprising transcribed text corresponding to the first audio content file using any suitable audio-to-text transcription process(es). One or more words, multiple word phrases, entities, and/or the like may be extracted from the generated text file, providing parameters for analyzing the content of the file, identifying the most relevant and/or interesting segments, enhancing searchability of the file, and/or the like.", and Par. 0035: "Audio-to-text transcription 110 methods may be used to transcribe the longer-form content 102-108 into associated text. For example, in various embodiments, automatic speech recognition ["ASR"] methods may be used to transcribe audio content into associated text. Diarization 112 may be used to identify and/or otherwise label distinct speakers in the longer-form content 102-108. For example, diarization 112 may label speakers in transcribed audio text with distinct speakers labels independent of a specific name and/or identity. In addition, speaker identification may be used to label distinct speakers with specific identities and/or names.").
perform post-processing on the transcription, generating a post-processed transcription, by using a machine learning model trained on post-processing training data for modifying text included in the transcription, (Joller, Par. 0052: "A set of rules and/or filters may eliminate less informative, less relevant, and/or otherwise noisy candidate tags. In certain embodiments, tags may be filtered using a variety of techniques and/or based on a variety of criteria. For example, in some embodiments, candidate tags may be filtered by identifying whether a candidate tag includes a certain part-of-speech and/or character pattern(s), whether a candidate tag is included in a defined list of uninformative, less relevant, and/or noisy tags [e.g., a black list] and/or a controlled vocabulary, whether a candidate tag is semantically related to surrounding context, whether a candidate tag is used at a threshold or below a threshold level of frequency, and/or the like. Candidate tags included in less relevant content sections and/or segments such as, for example and without limitation, advertisements, non-substantive dialogues, announcements, and/or the like, may also be filtered.", and Par. 0101: "The punctuated text 710 and/or labeled audio segments 718 may be processed as part of an annotation process 720 to generate annotated text 722. In some embodiments, the annotation process 720 may comprise, for example, annotation of the text with speakers and/or segments corresponding to diarized speech segments. The annotated text 722 may be post-processed 724 and/or otherwise filtered to remove less informative and/or less relevant content such as, for example and without limitation, advertisements, introductions, pauses, conclusions, and/or the like.", and Par. 0006: "In some embodiments, artificial intelligence methods may be used in connection with content processing, analyzing, and/or structuring including, without limitation, one or more machine learning and/or natural language processing techniques.").
generate output based from the post-processed transcription, the output comprising at least one of a meeting summary generated by a machine learning summarization model that phrases, sentences, and/or segments may be scored 736 [e.g., ranked in order of importance, relevance, and/or interest in the context of the long form audio content 702] and provided to content and/or segment summarization processes 738 to generate one or more cohesive shorter-form summaries 740. Scoring information may be further provided to content search indexing processes 744 as described herein.", and Par. 0042: "Certain embodiments may employ artificial intelligence methods including, for example, machine learning and/or natural language processing techniques, to enhance the searchability of content, extract metadata [e.g., keywords, key phrases, named entities, topics, and/or the like], and/or the generation of shorter-form content [e.g., segments, summaries, highlights, and/or the like].").

Joller does not teach wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription.
Helmbro teaches wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription. (Helmbro, Par. 0042: "Optionally at block 92, text corresponding to filler words is omitted. Filler words may be predefined words, such as, for example, 'umm', 'ah', 'mmm', or any other word that is typically used to bridge a gap in time filler words may vary based on language and/or dialect [e.g., filler words for the English language may be different from filler words in the Japanese language]. As filler words are identified in the selected text, they are automatically removed.", and Par. 0047: "Optionally at block 102, a summarizing algorithm may be applied to the text in order to reduce the number of words in the text container. For example, certain words, such as 'a', 'the', etc. may be omitted in order to reduce the length of the audio clip without altering the message conveyed by the audio clip. Such summarizing algorithms are known in the art and therefore not described herein.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller in view of Helmbro to employ wherein the post-processing includes both (1) modifying at least one of a punctuation, grammar or formatting of the transcription that was introduced by the ASR model and (2) changing or omitting one or more words in the transcription which were included in both the audio speech and the transcription, in order to provide a simple user interface for editing the audio file, as evidenced by Helmbro (see Par. 0036).


Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Joller and Helmbro, as applied to claims 1 and 2, respectively, and further in view of Crigler et al. (US20070288518A1) (hereinafter "Crigler").
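
Regarding claim 2, Joller and Helmbro do not teach the computing system of claim 1, wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content.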
Crigler teaches wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content. (Crigler, Par. 0101: "Content Enhancement components 132, 134, 136 may be used individually, in sequence, or in parallel to identify the audio track of a newscast and to provide a transcript of the news cast as metadata associated with the content comprising the original newscast. For example, in various implementations, outputs from a plurality of similar Content Enhancement components 132, 134, 136 may be compared to produce metadata that describes the content in the aggregate. An example of such an implementation may include the use of a plurality of speech recognition components from different vendors, with the outputs of each Content Enhancement component processed in relation to the output of other components. The output of the Content Enhancement components 132, 134, 136 may be provided to MMS 114 for association with original content in order to improve the final results stored in MMS 114.", and Par. 0149: "The static page contains a preview image, title, description, first couple of paragraphs of the transcript or STT, a set of keyword metadata encoded as meta tags, and a list of links to related pieces of content. The static pages may also have links other static pages that allow the search engines to follow the links in order to crawl all of our static pages. The MMS 114 provides the pieces of content, associated content, metadata, and tags for each piece of content.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Crigler to employ wherein the transcription includes a plurality of links corresponding to tags associated with the electronic content and wherein the computer-executable instructions are further executable by the one or more processors to cause the computing system to generate the tags from the electronic content, in order to improve search engine ranking, as evidenced by Crigler (see Par. 0153).

Regarding claim 3, Joller teaches the computing system of claim 2, wherein the plurality of links point to data related to the electronic content, but wherein the data related to the electronic content is external to the electronic content. (Joller, Par. 0056: "In various embodiments, entities, keywords, and/or key phrases may be mapped/linked to unique identities in one or more knowledge bases and/or other available information repositories. In some embodiments, mapping keywords, key phrases, and/or entities to identities may disambiguate their meaning and/or otherwise enrich them with relationships, properties, metadata, and/or other information. In certain embodiments, keyword, key phrase, and/or entity linking may use context and/or other features from the content where entities, keywords, and/or key phrases occur and compare them to the context and/or other features from external sources where the knowledge base identities occur. Linking determinations may, in some implementations, use machine learning models and/or algorithms.").

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Joller, Helmbro, and Crigler, as applied to claim 2, and further in view of Georges et al. (US20190266240A1) (hereinafter "Georges").

Regarding claim 4, Joller, Helmbro, and Crigler do not teach the computing system of claim 2, wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending.
Georges teaches wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and (Georges, Par. 0020: "With reference to FIG. 1, an embodiment of an electronic system 10 may include memory 12 to store an electronic representation of an audio stream, a processor 11 coupled to the memory 12, and logic 13 coupled to the processor 11 and the memory 12. The logic 13 may be configured to detect a phrase in the audio stream based on a pre-defined vocabulary, associate a time stamp with the detected phrase, and classify a spoken intent based on a sequence of detected phrases and the respective associated time stamps. For example, the logic 13 may be further configured to monitor a continuous audio ...").
wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending (Georges, Par. 0021: "In some embodiments, the logic 13 may additionally, or alternatively, include a second neural network trained to return a probability for each of two or more intent classifications based on detected phrases and time stamps as input features to the second neural network. For example, the logic 13 may be further configured to classify the spoken intent in accordance with a highest probability of the two or more intent classifications. In some embodiments, the logic 13 may be configured to asynchronously trigger the second neural network when a sequence of detected phrases is ready for classification.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Helmbro, and Crigler in view of Georges to employ wherein the computing system uses a machine learning speech tagging model to generate the tags, the machine learning speech tagging model generating at least one tag in response to identifying a spoken starting keyword and a spoken ending keyword in the audio speech, and wherein the generating of the at least one tag includes classifying content of the audio speech as a particular note type, selected from a plurality of note types, based on the content which occurs between the starting keyword and the ending.


Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Joller, Helmbro, Crigler, and Georges, as applied to claim 4, and further in view of De et al. (US8688447B1) (hereinafter "De").

Regarding claim 5, Joller, Helmbro, Crigler and Georges do not teach the computing system of claim 4, wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks.
De teaches wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks. (De, Col. 5, lines 36-43: "As shown in FIG. 2D, in one embodiment of the invention, each action [214] is associated with one or more transcription-level tags [206]. Actions may be performed using entities [200] that are related to transcription-level tags. Following the above example, the transcription-level tag BOOK-HOTEL may be associated with actions such as booking a reservation with a hotel using entities tagged by HOTEL-NAME and CITY.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Helmbro, Crigler, and Georges in view of De to employ wherein the at least one tag comprises an action item note type that identifies one or more tasks and one or more entities associated with the one or more tasks.

Regarding claim 6, Joller, Helmbro, Georges, and De do not teach the computing system of claim 5, wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task.
Crigler teaches wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task. (Crigler, Par. 0149: "The static page contains a preview image, title, description, first couple of paragraphs of the transcript or STT, a set of keyword metadata encoded as meta tags, and a list of links to related pieces of content. The static pages may also have links other static pages that allow the search engines to follow the links in order to crawl all of our static pages. The MMS 114 provides the pieces of content, associated content, metadata, and tags for each piece of content.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller, Helmbro, Georges, and De in view of Crigler to employ wherein the at least one tag further includes links to one or more of an assigning party, a responsible party, a deadline, a content, or a priority level associated with the task, in order to improve search engine ranking, as evidenced by Crigler (see Par. 0153).

Claims 7, 8, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Joller and Helmbro, as applied to claims 1, 7, and 1, respectively, and further in view of Chen et al. (US20200394258A1) (hereinafter "Chen").

Regarding claim 7, Joller and Helmbro do not teach the computing system of claim 1, wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style.  
Chen teaches wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style. (Chen, Par. 0020: "In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or 'edit' the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an 'edited' transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Chen to employ wherein the readability of the transcription is modified when generating the post-processed transcription by converting a spoken language style of the audio speech to a written language style.


Regarding claim 8, Joller and Helmbro do not teach the computing system of claim 7, wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription.  
Chen teaches wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, (Chen, Par. 0020: "In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or 'edit' the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an 'edited' transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.").
wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription. (Chen, Par. 0077: "In another implementation, the replacement token is an abbreviation of the token of interest; the token of interest is a textual expression of a number and the replacement token is the number; the token of interest has a vulgar meaning and the replacement token is a polite synonym of the token of interest; the token of interest is a loan word and the replacement token is a synonym native to a language of the speech audio; or the replacement token has a same pronunciation as the token of interest and a more proper written form than the token of interest in the identified natural language domain.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Chen to employ wherein the readability of the transcription is modified by determining a level of readability of individual words and phrases of the transcription and at least (1) removing words corresponding to a low level of readability, or (2) substituting words corresponding to a low level of readability with words corresponding to an increased level of readability, wherein the determining the level of readability is based on the individual words and phrases contributing to a semantic meaning and/or desired style inferred from the transcription, in order to provide a good user experience, as evidenced by Chen (see Par. 0003).

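Regarding claim 18, Joller and Helmbro do not teach the computing system of claim 1, wherein the output generated from the post-processed transcript is further post-processed to remove errors and modify text to improve the readability and accuracy of the output.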
Chen teaches wherein the output generated from the post-processed transcript is further post-processed to remove errors and (Chen, Par. 0020: "In the SR-NLU system presently disclosed herein, a transcription is generated from an ASR subsystem. The transcription may then be processed to refine or 'edit' the transcription to replace certain tokens within the transcription. The replacements may serve to, for example, remove vulgar words or expressions from a transcription, correct formatting of numbers or other terms, and correct titles and names of people or places. As such, references to an 'edited' transcription are intended refer to a transcription that has been modified by replacing certain words or tokens in the transcription to create a more refined or improved transcription.").
modify text to improve the readability and accuracy of the output. (Chen, Par. 0026: "As another example, when a user says 'when is the pink concert', the SR-NLU system should understand that the user is asking when is the concert of the singer 'P!nk', so a more appropriate transcription to be rendered to the user should be 'when is the P!nk concert' instead of 'when is the pink concert'. This means that the 'pink' is to be replaced by the 'P!nk' which is a more proper written form of the singer's name and has the same pronunciation as that of the 'pink'. Likewise, the replacement from the 'pink' to the 'P!nk' is specific to the Music domain. It is obviously not proper to implement such a replacement in other domains such as a Geography domain to answer the query 'show me a picture of the pink poodle motel'.").

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Joller and Helmbro, as applied to claim 1, and further in view of Chen et al. (US20210350786A1) (hereinafter "Chen2").

Regarding claim 9, Joller and Helmbro do not teach the computing system of claim 1, wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; generating text-to-speech (TTS) data from the text; transcribing the TTS data using an automatic speech recognition model; and pairing the transcribed TTS data with the corresponding ungrammatical sentences.
Chen2 teaches wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; (Chen2, Par. 0003: "The method includes obtaining, by data processing hardware, a plurality of training text utterances, wherein a first portion of the plurality of training text utterances includes unspoken text utterances and a remaining second portion of the plurality of training text utterances comprises transcriptions in a set of spoken training utterances [ungrammatical]. Each unspoken text utterance is not paired with any corresponding spoken utterance.").
generating text-to-speech (TTS) data from the text; (Chen2, Par. 0003: "... spoken training utterance comprising a corresponding transcription paired with a corresponding non-synthetic speech representation of the corresponding spoken training utterance. For each of a plurality of output steps for each training text utterance of the plurality of training text utterances, the method also includes: generating, by the data processing hardware, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, by the data processing hardware, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance.").
transcribing the TTS data using an automatic speech recognition model; and (Chen2, Par. 0003: “The method also includes training, by the data processing hardware, the speech recognition model on the synthetic speech representation generated at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.”).
pairing the transcribed TTS data with the corresponding ungrammatical sentences. (Chen2, Par. 0007: “In some examples, at each of the plurality of output steps for each training text utterance of the plurality of training text utterances, the one of the non-synthetic speech representations selected from the set of spoken training utterances includes: a randomly selected non-synthetic speech representation from the set of spoken training utterances when the corresponding training text utterance comprises one of the unspoken text utterances in the first portion of the plurality of training text utterances; or a non-synthetic speech representation from the set of spoken training utterances that is paired with the corresponding one of the transcriptions when the corresponding training text utterance comprises one of the transcriptions in the second portion of the plurality of training text utterances.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Chen2 to employ wherein the post-processing training data is created by: identifying ungrammatical sentences comprising text; generating text-to-speech (TTS) data from the text; transcribing the TTS data using an automatic speech recognition model; and pairing the transcribed TTS data with the corresponding ungrammatical sentences, in order to improve the accuracy of the ASR model, as evidenced by Chen2 (See Par. 0030).


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Joller and Helmbro, as applied to claim 1, and in further view of John Niekrasz (US20190327103A1) (hereinafter "Niekrasz").

Regarding claim 10, Joller and Helmbro do not teach the computing system of claim 1, wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription.  
Niekrasz teaches wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription. (Niekrasz: “[…] automatically generating abstractive summaries of meetings, the computing system comprising: a memory configured to store a transcript of a meeting”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Niekrasz to employ wherein the output comprises the meeting summary which is automatically generated based on abstractive summarization of the post-processed transcription, in order to provide technical improvements to voice recognition and dictation-related technologies that provide at least one practical application, as evidenced by Niekrasz (See Par. 0005).

Claims 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Joller and Helmbro, as applied to claims 10, 11, and 12, respectively, and in further view of Karn et al. (US20200285663A1) (hereinafter "Karn").

Regarding claim 11, Joller and Helmbro do not teach the computing system of claim 10, wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention.  
Karn teaches wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention. (Karn, Par. 0026: “Further, in some example implementations, attention networks may be trained as part of an end-to-end abstractive summarization system. Further, in some example implementations, the attention model may be trained end-to end on regularly labeled data, […] encoder-decoder architecture 300 in accordance with example implementations of the present application. The architecture may be implemented using one or more neural networks including one or more computing devices such as computing device 805 illustrated in FIG. 8 discussed below. As illustrated, the encoder side 302 of the architecture includes first, a low-level, word-to-word encoder 305 that converts a sequence words in a post, Pj [215-245 from FIG. 2], to a sequence of representations, Hj=<hj0, . . . , hj|pj|>. Subsequently, a top-level, post-to-post encoder 310 converts those representations, <H0, . . . , H|C|> to a sequence of top-level post representations, <m1, . . . , m|C|>. These encoded representations are then passed to the decoder 304, which utilizes a top-level, thread-to-thread, decoder to disentangle them into a sequence of thread representations, <s1, . . . , s|T|>. In some example implementations, the thread-to-thread decoder may be a unidirectional LSTM [fDt2t] with initial state h0Dt2t set with a feedforward-mapped conversation vector C′. Finally, a low-level, word-to-word, decoder takes each thread representation Si and generates a sequence of summary words [265-275 from FIG. 2].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Karn to employ wherein the abstractive summarization is performed by a summarization model configured as a multi-level encoding-decoding neural network with attention, in order to identify three different conversational threads represented in the posts, as evidenced by Karn (See Par. 0027).

Regarding claim 12, Joller and Helmbro do not teach the computing system of claim 11, wherein the summarization model is configured to break the post-processed transcription into a plurality of turns comprising a plurality of words, the turns corresponding to a plurality of speakers, and summarize the post-processed transcription based on hierarchical attention at a turn-level and a word-level.
Karn teaches wherein the summarization model is configured to break the post-processed transcription into a plurality of turns comprising a plurality of words, the turns corresponding to a plurality of speakers, and summarize the post-processed transcription based on hierarchical attention at a turn-level and a word-level. (Karn, Par. 0003: “Interleaved texts are becoming more common with new ways of working and new forms of communication, starting with multi-author entries for activity reports, and later for meeting texts and social media conversations. In these types of multi-participant postings, [e.g., online chat conversations or social media posting boards], several conversations or topic threads may be occurring concurrently.”, and Par. 0023: “Additionally, in some example in some example implementations, the interleaved posts may be encoded hierarchically, [e.g., word-to-word [words in a post] followed by post-to-post [posts in a channel]]. Further, in some example implementations, the decoder may also generates summaries hierarchically, [e.g., thread-to-thread [generate thread representations] followed by word-to-word [e.g., generate summary words]. Additionally, in some example implementations, a hierarchical attention mechanism [discussed in greater detail below with respect to FIG. 5 below] for interleaved [turn] text may be used. As discussed herein, some example implementations of an end-to-end trainable hierarchical framework may enhances performance over a sequence to sequence framework by 8% on a synthetic interleaved texts dataset”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Joller and Helmbro in view of Karn to employ wherein the summarization model is configured to break the post-processed transcription into a plurality of turns comprising a plurality of words, the turns corresponding to a plurality of speakers, and summarize the post-processed transcription based on hierarchical attention at a turn-level and a word-level, in order to identify three different conversational threads represented in the posts, as evidenced by Karn (See Par. 0027).

Regarding claim 13, Joller and Helmbro do not teach the computing system of claim 12, wherein each turn is analyzed in context with a determined relationship between one or more of the turns of the plurality of turns.  
Karn teaches wherein each turn is analyzed in context with a determined relationship between one or more of the turns of the plurality of turns. (Karn, Par. 0008: “Further aspects of the present application may include a computing device having a storage means for storing a plurality of posts of interleaved text, means for embedding each post through word-to-word encoding, means for embedding overall content of the plurality of posts through post-to-post encoding based on the word-to-word encoding of each post, means for generating a summary of the at least one thread through word-to-word decoding based on the overall content embedding of the plurality of posts, and means for displaying the summary of the at least one thread to a user.”).


Allowable Subject Matter
Claims 14-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 14 recites “The computing system of claim 13, wherein the summarization model analyzes each turn based on a role vector of a speaker corresponding to the turn, the speaker being one of the plurality of speakers, and wherein the summarization model selectively applies one or more different rules during the analyzing, each of the one or more different rules being based on one or more correspondingly different roles from which the role vector is determined,” which is allowable over the prior art. The closest teachings to the indicated allowable subject matter are found in the references cited in the current office action. One such prior art reference of record is Vaquero Aviles-Casco et al. (US Patent Application No: 20200279568A1), which teaches (Par. 0183): “The processes of clustering and speaker tracking, described above, amount to a preliminary form of diarisation, i.e. partitioning the […]”).

Claim 15 recites “The computing system of claim 1, wherein the output comprises the template which is a meeting template that is automatically selected from a plurality of different templates based on a meeting type that is determined from analyzing the post-processed transcript and which is automatically populated with content from the post-processed transcript,” which is allowable over the prior art. The closest teachings to the indicated allowable subject matter are found in the references cited in the current office action. One such prior art reference of record is Radner et al. (US Patent No: 9235862B1), which teaches (Col. 12, lines 57-64): “FIG. 15 illustrates a screen 1240 which may be reached through the documents tab or other tab or folder such as the board book summary or meeting summary, which provides a tool for quickly creating a minutes template and recording meeting meetings for, for […]”).
Dependent claims 16 and 17 further limit claim 15, which recites allowable subject matter. Therefore, said claims are found allowable over the prior art of record by virtue of their dependency. As such, claims 14-17 contain allowable subject matter.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Nelson et al. (US Patent Application No: 20190108834A1) teaches (Par. 0223): “Meeting content may be analyzed at any time relative to an electronic meeting, i.e., before, during or after an electronic meeting, as soon as at least some meeting content is available. Certain types of processing, such as participant analysis described in more detail hereinafter, may benefit from being able to process meeting data from one or more completed electronic meetings. For example, after an electronic meeting ends, meeting intelligence apparatus 102 may analyze stored meeting content data and generate reports based on analyzed meeting content data. Alternatively, meeting intelligence apparatus 102 may analyze meeting content data during an electronic meeting and may generate, after the electronic meeting ends, a report based on analyzed meeting content data. Reports may include any type and amount of data, including any number of documents, a meeting agenda, a meeting summary, a meeting transcript, a meeting participant analysis, a slideshow presentation, etc. As previously described herein, post meeting processing results may be used for other electronic […]”).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI, whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DARIOUSH AGAHI/             Examiner, Art Unit 2656
/HUYEN X VO/             Primary Examiner, Art Unit 2656