DETAILED ACTION
	This action is responsive to Applicant’s Request for Continued Examination (“Response”), received on December 23, 2021, in reply to the Office Action dated August 25, 2021. This action is made Non-Final.
Claims 1-12 and 14-19 are pending in the case. 
Claims 1, 8, and 16 are independent claims.
Claims 1-12 and 14-19 are rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant’s Response
	In Applicant’s Response, Applicant amended claims 8 and 16, cancelled claims 13 and 20, and submitted arguments against the prior art in the Office Action dated August 25, 2021.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Snover et al., "A lexically-driven algorithm for disfluency detection," Proceedings of HLT-NAACL 2004: Short Papers, 2004, pp. 1-4 (“Snover”), and further in view of Kalia et al., US Patent Application Publication no. US 2020/0074002 (“Kalia”).
Claim 1:
	Snover teaches or suggests a non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations:
	tokenizing text data to produce a set of tokens, wherein each token is representative of a corresponding word in the text data (see § 3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.);
	performing grammatical tagging so that each token in the set of tokens is labeled as corresponding to a part of speech (see §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; § 3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.);
	applying a rule associated with a filler word that is representative of a non-lexical utterance to the labeled set of tokens (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers; § 3, p2 - output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.);
	wherein the rule is representative of a data structure that specifies (i) the filler word and (ii) a contextual parameter indicative of criterion that must be satisfied for the rule to indicate that a given labeled token represents an instance of the filler word (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers. set of possible rules is found by expanding rule templates. output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.).
	Snover appears to fail to explicitly disclose arranged in sequential order.
	Kalia teaches or suggests arranged in a sequential order (see para. 0034 - facilitate constructing a knowledge graph (e.g., a domain-specific knowledge graph) that can be based on unstructured data of a description. tokenize (parse) one or more sections. tokenize can refer to a process of classifying and/or demarcating sections of a sequence of characters (e.g., words of a sentence). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented in Extensible Markup Language (XML). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented as symbolic expressions (s-expression), where such s-expressions can comprise nested list (tree-structured) data.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include arranged in a sequential order for the purpose of efficiently generating a classification driven system to improve actions based on rules, as taught by Kalia (0031 and 0034).
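For illustration only, the limitations discussed for claim 1 (sequentially ordered tokens, part-of-speech labels, and a rule that pairs a filler word with a contextual criterion) can be sketched as follows. This sketch is not drawn from Snover, Kalia, or the application as filed. The names `TOY_POS`, `FillerRule`, and `not_after_pronoun` are hypothetical, and the lookup table merely stands in for a real part-of-speech tagger.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech label)

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {"i": "PRON", "like": "VERB", "tea": "NOUN",
           "was": "AUX", "tired": "ADJ"}


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, preserving sequential order."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def tag(words: List[str]) -> List[Token]:
    """Label each token with a part of speech ('X' when unknown)."""
    return [(w, TOY_POS.get(w, "X")) for w in words]


@dataclass
class FillerRule:
    """A rule holding (i) a filler word and (ii) a contextual criterion."""
    filler: str
    criterion: Callable[[List[Token], int], bool]

    def apply(self, tokens: List[Token]) -> List[int]:
        """Return indices of tokens the rule marks as filler instances."""
        return [i for i, (word, _) in enumerate(tokens)
                if word == self.filler and self.criterion(tokens, i)]


def not_after_pronoun(tokens: List[Token], i: int) -> bool:
    """Assumed criterion: 'like' after a pronoun is treated as a verb."""
    return i == 0 or tokens[i - 1][1] != "PRON"


like_rule = FillerRule("like", not_after_pronoun)
tokens = tag(tokenize("I like tea. I was, like, tired."))
filler_indices = like_rule.apply(tokens)  # only the second "like" matches
```

Under these assumptions, the first "like" follows a pronoun and is left alone, while the second is flagged. A learned rule set of the kind described in Snover would replace the single hand-written criterion.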

Claim 2:
	Snover fails to explicitly disclose wherein the set of tokens is represented in Extensible Markup Language (XML).
	Kalia teaches or suggests wherein the set of tokens is represented in Extensible Markup Language (XML) (see para. 0034 - facilitate constructing a knowledge graph (e.g., a domain-specific knowledge graph) that can be based on unstructured data of a description. tokenize (parse) one or more sections. tokenize can refer to a process of classifying and/or demarcating sections of a sequence of characters (e.g., words of a sentence). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented in Extensible Markup Language (XML). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented as symbolic expressions (s-expression), where such s-expressions can comprise nested list (tree-structured) data.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the set of tokens is represented in Extensible Markup Language (XML) for the purpose of efficiently generating a classification driven system to improve actions based on rules, as taught by Kalia (0031 and 0034).

Claim 3:
	Snover fails to explicitly disclose wherein the set of tokens is represented as an s-expression.
	Kalia teaches or suggests wherein the set of tokens is represented as an s-expression (see para. 0034 - facilitate constructing a knowledge graph (e.g., a domain-specific knowledge graph) that can be based on unstructured data of a description. tokenize (parse) one or more sections. tokenize can refer to a process of classifying and/or demarcating sections of a sequence of characters (e.g., words of a sentence). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented in Extensible Markup Language (XML). In some embodiments, ontology component 108 can tokenize one or more sentences of a description of an API using tokens represented as symbolic expressions (s-expression), where such s-expressions can comprise nested list (tree-structured) data.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the set of tokens is represented as an s-expression for the purpose of efficiently generating a classification driven system to improve actions based on rules, as taught by Kalia (0031 and 0034).
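For illustration only, the two token representations discussed for claims 2 and 3 can be sketched as follows. The element name `token`, the attribute name `pos`, and the s-expression layout are assumptions made for this sketch. They are not taken from Kalia or from the application as filed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech label)


def to_xml(tokens: List[Token]) -> str:
    """Serialize tokens as XML, one <token> element per word, in order."""
    root = ET.Element("tokens")
    for word, pos in tokens:
        element = ET.SubElement(root, "token", pos=pos)
        element.text = word
    return ET.tostring(root, encoding="unicode")


def to_sexpr(tokens: List[Token]) -> str:
    """Serialize tokens as a nested-list s-expression: ((POS word) ...)."""
    return "(" + " ".join(f"({pos} {word})" for word, pos in tokens) + ")"


sample = [("um", "INTJ"), ("hello", "NOUN")]
xml_text = to_xml(sample)
sexpr_text = to_sexpr(sample)
```

Both forms carry the same word and label pairs in the same order, so a downstream rule could, in principle, operate on either.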

Claim 5:
	Snover further teaches or suggests wherein each token is representative of a tuple that comprises the corresponding word and the corresponding part of speech (see § 3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.).

Claim 6:
	Snover further teaches or suggests wherein the text data is representative of a transcript (see Abstract - disfluency detection in speech transcripts using primarily lexical features; § 3, p2 – task is to identify which words in the transcript are fillers, edits, or fluent; § 4.2, p3 - the number of disfluent tokens in the reference transcript.).

Claim 7:
	Snover further teaches or suggests wherein the operations further comprise: identifying, based on an outcome of said applying, a word in the text data that is representative of an instance of the filler word (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers. set of possible rules is found by expanding rule templates. output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.).




Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Snover, in view of Kalia, and further in view of Yao, US Patent Application Publication no. US 2011/0276577 (“Yao”).
Claim 4:
Yao further teaches or suggests performing language identification based on classifiers that use short-character subsequences as features; and determining, based on an outcome of said performing, that the text data is in a default language for which tokenization is possible (see para. 0008 – language identifier for receiving the document and identifying the language of the document as one of the base language or the second language; para. 0020 - plurality of keyword sets comprising a base language keyword set.  second language keyword set comprising a plurality of second language keywords; plurality of tokenizers, each tokenizer associated with a language and a respective keyword set of the plurality of keyword sets; para. 0036 - portion of the document, that is in the language associated with the tokenizer; para. 0037 - documents 214 are processed by a language identifier 224 in order to identify a language of the document 215. A tokenizer selector 226 receives the document and the indication of the language of the document, selects the tokenizer 220a, 220b, 220c for processing the document 214, which processes the document to produce a feature vector 208 that can be used by the profiling. language identifier 224 may determine a language of the entire document or portions thereof. The appropriate tokenizer may be selected for processing the entire document or portion thereof.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include performing language identification based on classifiers that use short-character subsequences as features; and determining, based on an outcome of said performing, that the text data is in a default language for which tokenization is possible for the purpose of efficiently selecting an appropriate tokenizer, thus improving the effectiveness of tokenization, as taught by Yao (0037 and 0056).
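For illustration only, language identification that uses short character subsequences as features, followed by a check that the text is in a default language for which tokenization is possible, can be sketched as follows. The trigram-overlap scorer, the two sample sentences, and the names `PROFILES` and `DEFAULT_LANGUAGE` are assumptions made for this sketch. Yao does not disclose this particular scorer, and a practical classifier would be trained on far more data.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (short character subsequences)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Hypothetical one-sentence "training" samples per language.
SAMPLES = {
    "en": "the quick brown fox and the lazy dog went to the park with them",
    "es": "el perro y el gato fueron al parque con ellos para jugar que",
}
PROFILES = {lang: char_ngrams(sample) for lang, sample in SAMPLES.items()}

DEFAULT_LANGUAGE = "en"  # assumed language the tokenizer supports


def identify_language(text: str) -> str:
    """Pick the language whose n-gram profile overlaps the text the most."""
    grams = char_ngrams(text)

    def overlap(lang: str) -> int:
        profile = PROFILES[lang]
        return sum(min(count, profile[gram]) for gram, count in grams.items())

    return max(PROFILES, key=overlap)


def can_tokenize(text: str) -> bool:
    """True when the text is identified as the default language."""
    return identify_language(text) == DEFAULT_LANGUAGE
```

In this sketch, tokenization would proceed only when `can_tokenize` returns `True`. Text in another language would be routed elsewhere, for example to a different tokenizer as in Yao's tokenizer selector.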

Claims 8-10, 12, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Snover, in view of Chotimongkol et al., US Patent Application Publication no. US 2016/0086601 (“Chotimongkol”), and further in view of Scholz et al., US Patent Application Publication no. US 2020/0111386 (“Scholz”).
Claim 8:
	Snover teaches or suggests a non-transitory computer readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising:
	tokenizing text data to produce a set of tokens, wherein each token is representative of a corresponding word in the text data (see §3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.);
	applying a rule associated with a filler word that is representative of a non-lexical utterance to the labeled set of tokens (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers; § 3, p2 - output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.);
	wherein the rule is representative of a data structure that specifies (i) the filler word and (ii) a contextual parameter indicative of criterion that must be satisfied for the rule to indicate that a given labeled token represents an instance of the filler word (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers. set of possible rules is found by expanding rule templates. output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.).
	Snover appears to fail to explicitly disclose arranged in sequential order; performing dependency parsing so that for each sentence included in the text data, a dependency parse is extracted and a grammatical structure is defined; for each token in the set of tokens, labeling the token based on the dependency parse and the grammatical structure for the sentence with which the token is associated; and causing display of the text data on an interface in such a manner that each word that represents an instance of the filler word is visually distinguishable from other words in the text data.
	Chotimongkol teaches or suggests arranged in sequential order; performing dependency parsing so that for each sentence included in the text data, a dependency parse is extracted and a grammatical structure is defined; for each token in the set of tokens, labeling the token based on the dependency parse and the grammatical structure for the sentence with which the token is associated (see Fig. 3-8 – para. 0013 - generating a semantic and syntactic graph associated with the received utterance, extracting all n-grams as features from the generated semantic and syntactic graph and classifying the utterance; para. 0038 - formed by adding transitions encoding semantic and syntactic categories of words or word sequences to the word graph. The first additional piece of information is the part of speech tags of the words; para. 0039 - other type of information that is encoded in these graphs is the syntactic parse of each utterance, namely the syntactic phrases with their head words. labels of the transitions for syntactic phrases are prefixed by the token "PHRASE:"; para. 0042 - semantic and syntactic information utilized in this example may comprise part of speech tags, syntactic parses, named entity tags, and semantic role labels- it is anticipated that insertion of further information such as supertags or word stems can also be beneficial for further processing using semantic and syntactic graphs; para. 0046 - The head words of the syntactic phrases and predicate of the arguments are included in the SSGs. This enables the classifier to handle long distance dependences better than using other simpler methods; para. 0062 - features include token-level features (such as the current (head) word, its part of speech tag, base phrase type and position, etc.).).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include arranged in sequential order; performing dependency parsing so that for each sentence included in the text data, a dependency parse is extracted and a grammatical structure is defined; for each token in the set of tokens, labeling the token based on the dependency parse and the grammatical structure for the sentence with which the token is associated for the purpose of efficiently incorporating lexical, semantic, and syntactic information to improve understanding of spoken language, as taught by Chotimongkol (0012).
Scholz further teaches or suggests causing display of the text data on an interface in such a manner that each word that represents an instance of the filler word is visually distinguishable from other words in the text data (see Fig. 8; para. 0105 - scoring module (60) can further function to identify and highlight (132) filler words (55) in the formatted text (98). The highlight (132) can be depicted by under lineation of the filler words (55); however, this example does not preclude any manner of visually viewable highlight of filler words (55), such as shading, colored shading, encircling, dots, bold lines, or the like. use of different highlight (132) between unclear words (130) and filler words (55); para. 0106 - associate a trigger area (133), as described in working example 2 with each filler word (55). When the user (26) moves the pointer (28) over the trigger area (133) associated with the filler words (55) in the formatted text (98) the presentation scoring module (60) further operates to depict the filler score image (135). In the instant example, the filler score image (135) indicates that "this word was identified as a filler word.").
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include causing display of the text data on an interface in such a manner that each word that represents an instance of the filler word is visually distinguishable from other words in the text data for the purpose of efficiently emphasizing certain types of words in text to improve a user's ability to identify and interact, as taught by Scholz (0105 and 0106).
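For illustration only, displaying text so that filler words are visually distinguishable can be sketched as follows. Underlining is one of the highlight styles mentioned in Scholz. The use of HTML `<u>` tags and the function name `render_highlighted` are assumptions made for this sketch and are not taken from any cited reference.

```python
import html
from typing import Iterable, List


def render_highlighted(words: List[str], filler_indices: Iterable[int]) -> str:
    """Return an HTML string in which filler words are underlined."""
    fillers = set(filler_indices)
    rendered = []
    for i, word in enumerate(words):
        safe = html.escape(word)  # avoid injecting markup from the transcript
        rendered.append(f"<u>{safe}</u>" if i in fillers else safe)
    return " ".join(rendered)


markup = render_highlighted(["so", "um", "yes"], [1])
```

Here the index list would come from whatever rule application identified the filler instances. Shading or color could be substituted for underlining by changing the wrapping tag.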

Claim 9:
	Snover further teaches or suggests wherein the text data is representative of a transcript (see Abstract - disfluency detection in speech transcripts using primarily lexical features; § 3, p2 – task is to identify which words in the transcript are fillers, edits, or fluent; § 4.2, p3 - the number of disfluent tokens in the reference transcript.).

Claim 10:
	Snover further teaches or suggests wherein the operations further comprise: generating the transcript by performing a speech-to-text (STT) operation on an audio file (see Abstract - disfluency detection in speech transcripts using primarily lexical features; § - provide a rich transcription of speech recognition output, including speaker identification, sentence boundary detection and the annotation of disfluencies in the transcript. production of an annotation specification for disfluencies in speech transcripts and the transcription of sizable amounts of speech data; § 3, p2 – time aligned reference speech transcripts. task is to identify which words in the transcript are fillers, edits, or fluent; § 4.2, p3 - the number of disfluent tokens in the reference transcript; §5 – STT (Speech-To-Text).).

Claim 12:
	Chotimongkol further teaches or suggests wherein the grammatical structure defined for each sentence represents relation using directed arcs (see Fig. 3-7; para. 0035 - contains a sequence of nodes and arcs representing states and transitions between states; para. 0038 - adding transitions encoding semantic and syntactic categories of words or word sequences to the word graph; para. 0040 - "six dollars" is a monetary amount, so the arc "NE:m" 416E is inserted parallel to that sequence; para. 0043 - transition of F; has the labels of the arcs on the SSG as input and output.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the grammatical structure defined for each sentence represents relation using directed arcs for the purpose of efficiently incorporating lexical, semantic, and syntactic information to improve understanding of spoken language, as taught by Chotimongkol (0012).

Claim 14:
	Chotimongkol further teaches or suggests wherein the operations further comprise: loading the set of tokens into corresponding entries of a data structure (see Fig. 3-8 – para. 0013 - generating a semantic and syntactic graph associated with the received utterance, extracting all n-grams as features from the generated semantic and syntactic graph and classifying the utterance; para. 0038 - formed by adding transitions encoding semantic and syntactic categories of words or word sequences to the word graph. The first additional piece of information is the part of speech tags of the words; para. 0039 - other type of information that is encoded in these graphs is the syntactic parse of each utterance, namely the syntactic phrases with their head words. labels of the transitions for syntactic phrases are prefixed by the token "PHRASE:"; para. 0042 - semantic and syntactic information utilized in this example may comprise part of speech tags, syntactic parses, named entity tags, and semantic role labels- it is anticipated that insertion of further information such as supertags or word stems can also be beneficial for further processing using semantic and syntactic graphs; para. 0046 - The head words of the syntactic phrases and predicate of the arguments are included in the SSGs. This enables the classifier to handle long distance dependences better than using other simpler methods; para. 0062 - features include token-level features (such as the current (head) word, its part of speech tag, base phrase type and position, etc.).).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the operations further comprise: loading the set of tokens into corresponding entries of a data structure for the purpose of efficiently incorporating lexical, semantic, and syntactic information to improve understanding of spoken language, as taught by Chotimongkol (0012).

Claim 15:
	Chotimongkol further teaches or suggests wherein said labeling comprises populating each entry in the data structure with information related to, or derived from, the dependency parse and the grammatical structure for the sentence with which the corresponding token is associated (see Fig. 3-8 – para. 0013 - generating a semantic and syntactic graph associated with the received utterance, extracting all n-grams as features from the generated semantic and syntactic graph and classifying the utterance; para. 0038 - formed by adding transitions encoding semantic and syntactic categories of words or word sequences to the word graph. The first additional piece of information is the part of speech tags of the words; para. 0039 - other type of information that is encoded in these graphs is the syntactic parse of each utterance, namely the syntactic phrases with their head words. labels of the transitions for syntactic phrases are prefixed by the token "PHRASE:"; para. 0042 - semantic and syntactic information utilized in this example may comprise part of speech tags, syntactic parses, named entity tags, and semantic role labels- it is anticipated that insertion of further information such as supertags or word stems can also be beneficial for further processing using semantic and syntactic graphs; para. 0046 - The head words of the syntactic phrases and predicate of the arguments are included in the SSGs. This enables the classifier to handle long distance dependences better than using other simpler methods; para. 0062 - features include token-level features (such as the current (head) word, its part of speech tag, base phrase type and position, etc.).).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein said labeling comprises populating each entry in the data structure with information related to, or derived from, the dependency parse and the grammatical structure for the sentence with which the corresponding token is associated for the purpose of efficiently incorporating lexical, semantic, and syntactic information to improve understanding of spoken language, as taught by Chotimongkol (0012).
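For illustration only, the limitations discussed for claims 12, 14, and 15 (loading tokens into entries of a data structure, then populating each entry from a dependency parse expressed as directed arcs) can be sketched as follows. The `TokenEntry` type, the arc tuple layout, the use of -1 for the root, and the hand-written parse are all assumptions made for this sketch. No real parser is invoked, and nothing here is drawn from Chotimongkol's semantic and syntactic graphs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A directed arc: (head index, dependent index, relation); head -1 = root.
Arc = Tuple[int, int, str]


@dataclass
class TokenEntry:
    index: int
    word: str
    head: Optional[int] = None      # index of the governing token, if any
    relation: Optional[str] = None  # label on the arc from head to token


def load_tokens(words: List[str]) -> List[TokenEntry]:
    """Load each token into its own entry, preserving sequential order."""
    return [TokenEntry(i, word) for i, word in enumerate(words)]


def label_from_parse(entries: List[TokenEntry],
                     arcs: List[Arc]) -> List[TokenEntry]:
    """Populate each entry with its head and relation from the arcs."""
    for head, dependent, relation in arcs:
        entries[dependent].head = None if head < 0 else head
        entries[dependent].relation = relation
    return entries


words = ["I", "um", "agree"]
# Hand-written parse (an assumption), standing in for parser output.
arcs = [(-1, 2, "root"), (2, 0, "nsubj"), (2, 1, "discourse")]
entries = label_from_parse(load_tokens(words), arcs)
```

With entries labeled this way, a rule could test a token's relation (for example, "discourse") rather than only its neighboring words.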
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Snover, in view of Chotimongkol, in view of Scholz, and further in view of Kahn et al., US Patent Application Publication no. US 2007/0244702 (“Kahn”).
Claim 11:
	Kahn further teaches or suggests forwarding an audio file to a transcription service via an application programming interface; and receiving the transcript from the transcription service via the application programming interface (see para. 0044 – Source input 201 may represent real-time, audio file, or streaming speech input processed by a speech recognition plug-in/ program of session file editor; para. 0046 - input segmented into utterances, may be processed manually or automatically by a pattern recognition program, or both, to produce bounded output data. The result may be one or more session files; para. 0047 - The transcribed session file 205 from transcribe mode may represent audio-aligned text, such as with freeform dictation or structured dictation for data entry, using a speech recognition application that integrates boundary definition 202 and automatic processing 203/204. Speech recognition engine may produce real-time output text in the main read/write window that may be saved as a transcribed session file.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include forwarding an audio file to a transcription service via an application programming interface; and receiving the transcript from the transcription service via the application programming interface for the purpose of efficiently transcribing audio data for visualization, processing, and manipulation, improving user identification and interaction, as taught by Kahn (0017 and 0039).

Claims 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Snover, in view of Maskey et al., US Patent Application Publication no. US 2008/0046229 (“Maskey”), and further in view of Lewis et al., US Patent no. US 7,010,489 (“Lewis”).
Claim 16:
	Snover teaches or suggests a system comprising: 
	a memory that includes instructions for identifying and then addressing repetitions of a word (see §2, p2 - disfluencies: (i) edits—words that were not intended to be said and that are normally replaced with the intended words, such as repeats, restarts, and revisions.);
	a processor that, upon executing instructions, is configured to:
tokenize text data to produce a set of tokens, wherein each token is representative of a corresponding word in the text data (see §3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.);
label the set of tokens using a Natural Language Processing (NLP) library (see §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §3, p2 - features of each of the words including the lexeme (the word itself), a POS tag for the word; § 4.2, p3 - the number of disfluent tokens in the reference transcript.);
apply a rule associated with the word to the labeled set of tokens (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluency detection is to distinguish fluent from disfluent words. fillers—words with no meaning that are used as discourse markers and pauses, such as “you know” and “um”; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers; § 3, p2 - output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 1-10. rules were responsible for correcting.);
wherein the rule is representative of a data structure that includes (i) the word; (ii) a first criterion that specifies a first occurrence of the word must be immediately followed by a second occurrence of the word for the rule to indicate that a repetition exists; and (iii) a second criterion that specifies a certain form of pause/silence must occur before the first occurrence of the word, after the second occurrence of the word, or between the first and second occurrences of the word (see Abstract - disfluency detection in speech transcripts using primarily lexical features; §1, p1 - disfluencies can be detected using primarily lexical features—specifically the words themselves and part-of-speech (POS) labels; §2, p2 - disfluencies: (i) edits—words that were not intended to be said and that are normally replaced with the intended words, such as repeats, restarts, and revisions; §3, p2 - identify which words in the transcript are fillers, edits, or fluent. rules to relabel words as edits or fillers. set of possible rules is found by expanding rule templates. output of the system is an ordered set of rules, which can then be applied to the test data to annotate it for disfluencies; §3.1, p2 – whether the word is followed by a silence; § 3.2, p2 - representative subset of rule templates chosen by the system. Change the label of: rules 1-10. 9. word with POS X from L1 to L2 if followed by silence and followed by word with POS Y; §4, p2 - whether the word is commonly used as a filler, edit, back-channel word, or is part of a short repeat. three labels (filler, edit and fluent). Learn many subtypes of disfluencies; § 4.2, p3 - 10 rules learned are: rules 2. Label the left side of a simple repeat as an edit. 7. Label the left side of a simple repeat separated by a filled pause as an edit. rules were responsible for correcting.).
	Snover appears to fail to explicitly disclose repetitions of a phrase; arranged in sequential order; rule associated with the phrase; (i) the phrase, (ii) a first criterion that specifies a first occurrence of the phrase must be immediately followed by a second occurrence of the phrase for the rule to indicate that a repetition exists; and that the pause/silence is punctuation in relation to the phrase.
Maskey further teaches or suggests repetitions of a phrase; arranged in sequential order; rule associated with the phrase; (i) the phrase, (ii) a first criterion that specifies a first occurrence of the phrase must be immediately followed by a second occurrence of the phrase for the rule to indicate that a repetition exists, in relation to the phrase; perform a remediation action for each repetition of the phrase in the text data (see para. 0027 - Three types of disfluencies are detected: repeats (reparandum edited with the same sequence of words); Table 1; para. 0028 - example in Table 1, 'three glasses' is a repeat. Any such occurrences of verbatim repetition of a portion of a spoken utterance are 'repeats'; para. 0029 – Disfluency removal may be viewed as a process that transforms the "noisy" disfluent transcript into a "clean" one; para. 0036-0037 - retokenization produces c/ with the same number of words as such that n/. The retokenization of the previous example of repair in Table 1 produces the following parallel text. Noisy Data: I want to buy three glasses no five cups of tea Clean Data: I want to buy REPAIR0 REPAIR1 FP0 five cups of tea; para. 0041 - example in Table 1 the reparandum is "three glasses" -a two word phrase. Reparandum phrase can be of any length; para. 0041 - defining a phrase as a sequence of one or more words; para. 0050 - disfluency translation lattice (305) that translates sequence of noisy words to a clean phrase; Claim 4 - clean speech is output comprising disfluent class label tokens replacing disfluent speech.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include repetitions of a phrase; arranged in sequential order; rule associated with the phrase; (i) the phrase, (ii) a first criterion that specifies a first occurrence of the phrase must be immediately followed by a second occurrence of the phrase for the rule to indicate that a repetition exists, in relation to the phrase; perform a remediation action for each repetition of the phrase in the text data for the purpose of efficiently identifying disfluencies and improving the process of cleaning disfluent speech, as taught by Maskey (0012 and 0013).
Lewis further teaches or suggests that the pause/silence is punctuation in relation to the phrase (see col. 2, lines 53-65 - speech recognition markers in accordance with the inventive arrangement can integrate phrase markers embedded in dictated text. Tokens can include words, phrase markers, punctuation marks and meta-tags; col. 3, lines 39-43 - pausing for a programmatically determined length of time can comprise the step of pausing for a period of time corresponding to a punctuation class selected from the group consisting of: sentence internal markers and sentence final markers; col. 4, lines 6-28 - timing information, specifically, "phrase markers", can be inserted by a speech dictation system during speech dictation. The phrase markers can support ancillary speech dictation features. markers can be inserted when, during a speech dictation session, a speaker pauses at a syntactically appropriate place. identify an appropriate position in the dictated text to insert a pause; col. 6, lines 12-20 - using speech recognition markers. commas, semicolons or periods, then prosody control 32 can interject a pause during prosodic phrasing at each punctuation mark; col. 7, lines 1-15 - time can be linked to both sentence internal markers, like commas and semicolons, and final markers, like periods, exclamation points and question marks.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include that the pause/silence is punctuation in relation to the phrase for the purpose of efficiently identifying silence/pause areas in dictated text and improving the insertion of punctuation in relation to words, as taught by Lewis (cols. 2 and 6).
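For illustration only, the rule data structure recited in claim 16 (a phrase, a first criterion requiring immediate repetition, and a second criterion requiring punctuation before, between, or after the occurrences) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Snover, Maskey, Lewis, or the application: the names `RepetitionRule`, `find_repetitions`, and `PUNCTUATION`, and the choice of punctuation marks, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed punctuation set; the claim does not enumerate specific marks.
PUNCTUATION = {",", ";", ".", "!", "?"}


@dataclass(frozen=True)
class RepetitionRule:
    """Hypothetical rule: (i) the phrase, (ii) immediate repetition,
    (iii) punctuation before, between, or after the two occurrences."""
    phrase: Tuple[str, ...]
    require_punctuation: bool = True


def find_repetitions(tokens: List[str], rule: RepetitionRule) -> List[int]:
    """Return the start index of each second occurrence that satisfies the rule."""
    target = tuple(w.lower() for w in rule.phrase)
    words = [t.lower() for t in tokens]
    n = len(target)
    hits: List[int] = []
    if n == 0:
        return hits
    i = 0
    while i + n <= len(words):
        if tuple(words[i:i + n]) != target:
            i += 1
            continue
        j = i + n
        # A single punctuation token may sit between the two occurrences.
        between = j < len(words) and words[j] in PUNCTUATION
        second = j + 1 if between else j
        if tuple(words[second:second + n]) != target:
            i += 1
            continue
        before = i > 0 and words[i - 1] in PUNCTUATION
        end = second + n
        after = end < len(words) and words[end] in PUNCTUATION
        if not rule.require_punctuation or before or between or after:
            hits.append(second)
        # Continue from the second occurrence so chained repeats are seen.
        i = second
    return hits
```

With the token list `["I", "think", ",", "you", "know", ",", "you", "know", ",", "it", "works"]` and the phrase `("you", "know")`, the sketch reports the second occurrence at index 6. The same repeat with no adjacent punctuation is reported only when `require_punctuation` is set to `False`.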

Claim 17:
Maskey further teaches or suggests wherein the phrase comprises a single word (see para. 0041 - Reparandum phrase can be of any length; para. 0041 - defining a phrase as a sequence of one or more words, single word disfluency are also addressed with such phrase assumption.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the phrase comprises a single word for the purpose of efficiently identifying disfluencies, including single-word disfluencies, as taught by Maskey (0041).


Claim 18:
Maskey further teaches or suggests wherein the remediation action comprises deleting the second occurrence of the phrase from the text data on behalf of an administrator (see para. 0027 - Three types of disfluencies are detected: repeats (reparandum edited with the same sequence of words); Table 1; para. 0028 - example in Table 1, 'three glasses' is a repeat. Any such occurrences of verbatim repetition of a portion of a spoken utterance are 'repeats'; para. 0029 – Disfluency removal may be viewed as a process that transforms the "noisy" disfluent transcript into a "clean" one; para. 0036-0037 - retokenization produces c/ with the same number of words as such that n/. The retokenization of the previous example of repair in Table 1 produces the following parallel text. Noisy Data: I want to buy three glasses no five cups of tea Clean Data: I want to buy REPAIR0 REPAIR1 FP0 five cups of tea; para. 0041 - example in Table 1 the reparandum is "three glasses" -a two word phrase. Reparandum phrase can be of any length; para. 0041 - defining a phrase as a sequence of one or more words; para. 0050 - disfluency translation lattice (305) that translates sequence of noisy words to a clean phrase; Claim 4 - clean speech is output comprising disfluent class label tokens replacing disfluent speech.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the remediation action comprises deleting the second occurrence of the phrase from the text data on behalf of an administrator for the purpose of efficiently identifying disfluencies and improving the process of cleaning disfluent speech, as taught by Maskey (0012 and 0013).
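For illustration only, the remediation action recited in claim 18 (deleting the second occurrence of an immediately repeated phrase) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `delete_repeated_phrase` is hypothetical, matching is case-insensitive by assumption, and the sketch uses a direct token comparison rather than the translation-based approach described in Maskey.

```python
from typing import List, Sequence


def delete_repeated_phrase(tokens: List[str], phrase: Sequence[str]) -> List[str]:
    """Drop each occurrence of `phrase` that immediately follows an identical one."""
    target = [w.lower() for w in phrase]
    n = len(target)
    if n == 0:
        # Nothing to match; return an unchanged copy.
        return list(tokens)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        window = [t.lower() for t in tokens[i:i + n]]
        kept_tail = [t.lower() for t in out[-n:]]
        if window == target and kept_tail == target:
            # The phrase was just kept; skip this repeated occurrence.
            i += n
            continue
        out.append(tokens[i])
        i += 1
    return out
```

Applied to `["I", "want", "three", "glasses", "three", "glasses", "of", "tea"]` with the phrase `["three", "glasses"]`, the sketch keeps the first occurrence and removes the second.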

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Snover, in view of Maskey, in view of Lewis, and further in view of Scholz.
Claim 19:
Maskey further teaches or suggests wherein the remediation action comprises causing in such a manner that the second occurrence of the phrase is distinguishable (see para. 0027 - Three types of disfluencies are detected: repeats (reparandum edited with the same sequence of words); Table 1; para. 0028 - example in Table 1, 'three glasses' is a repeat. Any such occurrences of verbatim repetition of a portion of a spoken utterance are 'repeats'; para. 0029 – Disfluency removal may be viewed as a process that transforms the "noisy" disfluent transcript into a "clean" one; para. 0036-0037 - retokenization produces c/ with the same number of words as such that n/. The retokenization of the previous example of repair in Table 1 produces the following parallel text. Noisy Data: I want to buy three glasses no five cups of tea Clean Data: I want to buy REPAIR0 REPAIR1 FP0 five cups of tea; para. 0041 - example in Table 1 the reparandum is "three glasses" -a two word phrase. Reparandum phrase can be of any length; para. 0041 - defining a phrase as a sequence of one or more words; para. 0050 - disfluency translation lattice (305) that translates sequence of noisy words to a clean phrase; Claim 4 - clean speech is output comprising disfluent class label tokens replacing disfluent speech.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include wherein the remediation action comprises causing in such a manner that the second occurrence of the phrase is distinguishable for the purpose of efficiently identifying disfluencies and improving the process of cleaning disfluent speech, as taught by Maskey (0012 and 0013).
	Scholz further teaches or suggests causing display of the text data on an interface in such a manner that the occurrence is visually distinguishable from other words in the text data (see Fig. 8; para. 0105 - scoring module (60) can further function to identify and highlight (132) filler words (55) in the formatted text (98). The highlight (132) can be depicted by under lineation of the filler words (55); however, this example does not preclude any manner of visually viewable highlight of filler words (55), such as shading, colored shading, encircling, dots, bold lines, or the like. use of different highlight (132) between unclear words (130) and filler words (55); para. 0106 - associate a trigger area (133), as described in working example 2 with each filler word (55). When the user (26) moves the pointer (28) over the trigger area (133) associated with the filler words (55) in the formatted text (98) the presentation scoring module (60) further operates to depict the filler score image (135). In the instant example, the filler score image (135) indicates that "this word was identified as a filler word.").
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Snover, to include causing display of the text data on an interface in such a manner that the occurrence is visually distinguishable from other words in the text data for the purpose of efficiently emphasizing certain types of words in text to improve a user's ability to identify and interact with those words, as taught by Scholz (0105 and 0106).

Response to Arguments
Applicant’s further arguments have been considered but are not persuasive because the arguments do not correspond to the rationales as used in the current rejection.

	Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew T McIntosh whose telephone number is (571)270-7790. The examiner can normally be reached M-Th 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached on 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW T MCINTOSH/Primary Examiner, Art Unit 2176