DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments (11/19/21 Remarks: page 7, line 17 – page 10, line 8) with respect to the rejection of claims 21-40 under 35 USC §103 have been fully considered but they are not persuasive.
With respect to claim 21, Applicant argues (11/19/21 Remarks: page 7, line 17 – page 10, line 5, particularly page 7, line 28 – page 9, line 13) that the art of record does not teach or suggest the amended recitation of a script model representative of a script text and the compliant word variations for the script text.
However, as noted in the claim mapping below, Tsai discloses (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options)) a script model representing a script text and a set of variations (i.e. different script texts).
With respect to claims 28 & 35, Applicant argues (11/19/21 Remarks: page 10, lines 5-6) that claims 28 & 35 are allowable for reasons similar to those advanced with respect to claim 21.

With respect to claims 22-27, 29-34, & 36-40, Applicant argues (11/19/21 Remarks: page 10, lines 5-6) that claims 22-27, 29-34, & 36-40 are allowable by their dependence from parent claims 21, 28, & 35.
Applicant’s arguments with respect to claims 21, 28, & 35 are addressed above.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 21, 25, 27-28, 31, 34-35 & 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Rtischev (US 5634086, cited in 2/25/20 Information Disclosure Statement) in view of Sherman (US 20100093319, cited in 2/25/20 Information Disclosure Statement) and Tsai (US 20100100383, cited in 6/30/21 Office Action).
With respect to claim 21, Rtischev discloses:
Claim 21: A method of script identification in audio data, the method comprising:
obtaining audio data (Rtischev Figure 1 item 12, microphone input);
segmenting the audio data into a plurality of utterances (Rtischev column 5, line 47 - column 6, line 5, individual word recognition);
obtaining a plurality of script models, wherein each script model is representative of a script text (Rtischev column 5, lines 31-41 and Figure 3, obtain script model in conjunction with a plurality of HMM models, preselected script Figure 3 item 114 implying plurality of script options) and compliant word variations of the script text (see secondary reference below);
decoding the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances), wherein decoding the plurality of utterances comprises applying each of the plurality of script models (Rtischev column 6, lines 25-44 and Figure 3, apply script models (various script step points) in response to user audio input to produce corresponding indications) each representing a script text and compliant word variations of the script text to each of the plurality of utterances and producing an indication of which of the script models are identified in the plurality of utterances (see secondary reference below);
for each of the identified script models, determining if the audio data is non-compliant (Rtischev column 7, lines 32-44, detect non-recognized scripted words); and
for each determined non-compliant audio data, initiating at least one remedial action (Rtischev column 7, lines 32-44, end tracking if sufficient scripted words are not recognized) that is selected from a graphical display to present on screen guidance to a customer service agent (see secondary reference below).
Concerning the combination of Rtischev in view of Sherman, Rtischev does not disclose an arrangement for selecting an action based on selection from a display in the case of a non-compliant case determination.

Sherman discloses:
…action that is selected from a graphical display to present on screen guidance to a customer service agent (Sherman paragraphs 0004-0005, system for customer service agents, paragraph 0050, graphic user interface selection of an option).
Rtischev and Sherman are combinable because they are from the field of speech and text processing.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to apply the Sherman graphic user interface to select the Rtischev remedial action.
The suggestion/motivation for doing so would have been to allow a range of selectable options for responding to a non-compliant case determination.
Concerning the combination of Rtischev in view of Tsai, Rtischev does not disclose decoding the plurality of utterances by applying each of the plurality of script models each representing a script text to each of the plurality of utterances, as noted in Applicant’s arguments (4/22/21 Remarks: page 7, lines 3-26, particularly lines 3-4 & 22-24). Specifically, Rtischev does not disclose the application of each script model where each script model represents an individual script text (as opposed to representing a different stop point in a script text).

Tsai discloses:
…each script model is representative of a script text and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options))…
…decoding the plurality of utterances comprises applying each of the plurality of script models each representing a script text and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options)) to each of the plurality of utterances and producing an indication of which of the script models are identified in the plurality of utterances (Tsai paragraphs 0027 & 0039 and Figure 4, application of each constructive concept script to voice data);
Rtischev and Tsai are combinable because they are from the field of speech and text processing.
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to apply each of a set of scripts each representing a script text to an utterance as taught by Tsai to the script and utterance processing of Rtischev.
The suggestion/motivation for doing so would have been to improve the speed of operation by parallel processing as suggested by Tsai (Tsai paragraph 0031).

Applying these teachings as they are applied to claim 21 above to claims 25, 27-28, 31, 34-35 & 38-39:
Claim 25: The method of claim 21 (see above), wherein the audio data is an instance of an exchange including at least one customer service agent (Sherman paragraphs 0004-0005, system for customer service agents).
Claim 27: The method of claim 21 (see above), wherein each script model includes compliant variations of the script text (Rtischev column 5, lines 41-44, multiple models for a given script, each being an accepted variation).
Claim 28: A non-transitory computer readable medium programmed with computer readable code that upon execution by a computer processor (Rtischev column 9, lines 8-10 and column 10, lines 9-12, implementation by computer program and computer workstation) causes the computer processor to:
obtain audio data (Rtischev Figure 1 item 12, microphone input);
segment the audio data into a plurality of utterances (Rtischev column 5, line 47 - column 6, line 5, individual word recognition);
obtain a plurality of script models, wherein each script model is representative of a script text (Rtischev column 5, lines 31-41 and Figure 3, obtain script model in conjunction with a plurality of HMM models, preselected script Figure 3 item 114 implying plurality of script options) and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options));
decode the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances), wherein decoding the plurality of utterances comprises applying each of the plurality of script models each representing a script text (Rtischev column 6, lines 25-44 and Figure 3, apply script models (various script step points) in response to user audio input to produce corresponding indications) and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options)) to each of the plurality of utterances and producing an indication of which of the script models are identified in the plurality of utterances (Tsai paragraphs 0027 & 0039 and Figure 4, application of each constructive concept script to voice data);
for each of the identified script models, determining if the audio data is non-compliant (Rtischev column 7, lines 32-44, detect non-recognized scripted words); and
for each determined non-compliant audio data, initiating at least one remedial action (Rtischev column 7, lines 32-44, end tracking if sufficient scripted words are not recognized) that is selected from a graphical display to present on screen guidance to a customer service agent (Sherman paragraphs 0004-0005, system for customer service agents, paragraph 0050, graphic user interface selection of an option).
Claim 31: The non-transitory computer readable medium of claim 28 (see above), further causing the processor to evaluate a compliance of the audio data with a script requirement threshold by comparing the determined script accuracy to the script requirement threshold (Rtischev column 7, lines 26-31, reject indicator threshold).
Claim 34: The non-transitory computer readable medium of claim 28 (see above), wherein each script model includes compliant variations of the script text (Rtischev column 5, lines 41-44, multiple models for a given script, each being an accepted variation).
Claim 35: A system for identification of a script in audio data, the system comprising:
an audio data source (Rtischev Figure 1 item 12, microphone input);
a script model database comprising a plurality of script models each script model of the plurality representative of at least one script text (Rtischev column 5, lines 31-41 and Figure 3, obtain script model in conjunction with a plurality of HMM models, preselected script Figure 3 item 114 implying plurality of script options) and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options)); and
a processing system communicatively connected to the script model database and the audio data source (Rtischev column 9, lines 8-10 and column 10, lines 9-12, implementation by computer program and computer workstation), the processing system:
obtains audio data (Rtischev Figure 1 item 12, microphone input),
segments the audio data into a plurality of utterances (Rtischev column 5, line 47 - column 6, line 5, individual word recognition),
obtains a plurality of script models (Rtischev column 5, lines 31-41 and Figure 3, obtain script model in conjunction with a plurality of HMM models, preselected script Figure 3 item 114 implying plurality of script options),
decodes the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances), wherein decoding the plurality of utterances comprises applying each of the plurality of script models each representing a script text (Rtischev column 6, lines 25-44 and Figure 3, apply script models (various script step points) in response to user audio input to produce corresponding indications) and compliant word variations of the script text (Tsai paragraph 0030, constructive concept scripts containing text; Tsai paragraph 0044, selection of best match constructive concept script (inherently from a plurality of options)) to each of the plurality of utterances and producing an indication of which of the script models are identified in the plurality of utterances (Tsai paragraphs 0027 & 0039 and Figure 4, application of each constructive concept script to voice data),
for each of the identified script models, determining if the audio data is non-compliant (Rtischev column 7, lines 32-44, detect non-recognized scripted words); and
for each determined non-compliant audio data, initiating at least one remedial action (Rtischev column 7, lines 32-44, end tracking if sufficient scripted words are not recognized) that is selected from a graphical display to present on screen guidance to a customer service agent (Sherman paragraphs 0004-0005, system for customer service agents, paragraph 0050, graphic user interface selection of an option).
Claim 38: The system of claim 35 (see above), wherein the processing system further evaluates a compliance of the audio data with a script requirement threshold by comparing the determined script accuracy to the script requirement threshold (Rtischev column 7, lines 26-31, reject indicator threshold).
Claim 39: The system of claim 35 (see above), wherein the audio data is an instance of an exchange including at least one customer service agent (Sherman paragraphs 0004-0005, system for customer service agents).
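For illustration only, the following sketch shows one way the recited step of applying each of a plurality of script models (each representing a script text and compliant word variations) to each of a plurality of utterances could be pictured. It is not taken from Rtischev, Sherman, Tsai, or the application. The names `ScriptModel`, `normalize`, and `identify_scripts` are hypothetical, and exact text matching on already-transcribed utterances is an assumed stand-in for acoustic decoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScriptModel:
    """Hypothetical script model: one script text plus accepted variations."""
    name: str
    script_text: str
    variations: List[str] = field(default_factory=list)

    def accepted_texts(self) -> List[str]:
        # The script text itself and every compliant variation are accepted.
        return [self.script_text, *self.variations]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores trivial differences."""
    return " ".join(text.lower().split())


def identify_scripts(utterances: List[str],
                     models: List[ScriptModel]) -> Dict[str, List[int]]:
    """Apply every model to every utterance.

    Returns a mapping from model name to the indices of the utterances in
    which that model was identified (the "indication" of identified models).
    """
    hits: Dict[str, List[int]] = {}
    for model in models:
        accepted = {normalize(t) for t in model.accepted_texts()}
        for index, utterance in enumerate(utterances):
            if normalize(utterance) in accepted:
                hits.setdefault(model.name, []).append(index)
    return hits


# Assumed example data, not drawn from any reference of record.
models = [
    ScriptModel("greeting", "thank you for calling", ["thanks for calling"]),
    ScriptModel("recording", "this call may be recorded",
                ["this call is being recorded"]),
]
utterances = ["Thanks for calling", "how can I help", "This call may be recorded"]
print(identify_scripts(utterances, models))
```

In this sketch each model is tried against each utterance independently, so a model that never matches is simply absent from the returned mapping.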
Claims 22-24, 29-30, 32, & 36-37 are rejected under 35 U.S.C. 103 as being unpatentable over Rtischev in view of Sherman and Tsai as applied to claims 21, 28, & 35 above, and further in view of Girardo (US 20020077819, cited in 2/25/20 Information Disclosure Statement).
With respect to claim 22, Rtischev in view of Sherman and Tsai teaches the invention of parent claim 21 as described above.
Rtischev in view of Sherman and Tsai does not expressly disclose the generation of a transcription.
Girardo discloses voice-to-text transcription.

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to apply the transcription of Girardo to the Rtischev in view of Sherman and Tsai user script response arrangement.
The suggestion/motivation for doing so would have been to enable the generation of transcribed records of script interactions for evaluation and reference.
Therefore, it would have been obvious to combine Rtischev in view of Sherman and Tsai with Girardo to obtain the invention as specified in claim 22.
Claim 22: The method of claim 21 (see above), further comprising:
filtering the plurality of utterances to include only utterances attributed to a customer service agent (Girardo paragraph 0060, filtering out extraneous sounds prior to speech processing; Sherman paragraphs 0004-0005, system for customer service agents);
extracting acoustic features from the filtered plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing); and
using the extracted acoustic features in decoding the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing).

Claim 23: The method of claim 21 (see above), wherein if any of the plurality of script models are determined to have occurred in the audio data, further comprising:
transcribing the utterance containing the script to produce an utterance transcription (Girardo Abstract, speech voice-to-text transcription);
comparing the script text to the utterance transcription (Girardo paragraph 0039, comparison/validation); and
determining a script accuracy (Girardo paragraph 0039, comparison/validation).
Claim 24: The method of claim 23 (see above), further comprising evaluating a compliance of the audio data with a script requirement threshold by comparing the determined script accuracy to the script requirement threshold (Rtischev column 7, lines 26-31, reject indicator threshold).
Claim 29: The non-transitory computer readable medium of claim 28 (see above), further causing the processor to:
filter the plurality of utterances to include only utterances attributed to a customer service agent (Girardo paragraph 0060, filtering out extraneous sounds prior to speech processing; Sherman paragraphs 0004-0005, system for customer service agents);
extract acoustic features from the filtered plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing); and
use the extracted acoustic features in decoding the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing).
Claim 30: The non-transitory computer readable medium of claim 28 (see above), wherein if any of the plurality of script models are determined to have occurred in the audio data, further causing the processor to:
transcribe the utterance containing the script to produce an utterance transcription (Girardo Abstract, speech voice-to-text transcription);
compare the script text to the utterance transcription (Girardo paragraph 0039, comparison/validation); and
determine a script accuracy (Girardo paragraph 0039, comparison/validation).
Claim 32: The non-transitory computer readable medium of claim 28 (see above), wherein the audio data is an instance of an exchange including at least one customer service agent (Sherman paragraphs 0004-0005, system for customer service agents).
Claim 36: The system of claim 35 (see above), wherein the processing system further:
filters the plurality of utterances to include only utterances attributed to a customer service agent (Girardo paragraph 0060, filtering out extraneous sounds prior to speech processing; Sherman paragraphs 0004-0005, system for customer service agents);
extracts acoustic features from the filtered plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing); and
uses the extracted acoustic features in decoding the plurality of utterances (Rtischev column 6, lines 12-44 and Figure 3, decoding utterances by speech sound processing).
Claim 37: The system of claim 35 (see above), wherein if any of the plurality of script models are determined to have occurred in the audio data, the processing system further:
transcribes the utterance containing the script to produce an utterance transcription (Girardo Abstract, speech voice-to-text transcription);
compares the script text to the utterance transcription (Girardo paragraph 0039, comparison/validation); and
determines a script accuracy (Girardo paragraph 0039, comparison/validation).
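For illustration only, the comparing, accuracy-determining, and threshold-evaluating steps recited in claims 23-24, 30-31, and 37-38 could be sketched as follows. This is not the method of Girardo, Rtischev, or the application. The function names `script_accuracy` and `is_compliant` are hypothetical, and the word-level similarity ratio from Python's standard `difflib.SequenceMatcher` is an assumed measure of script accuracy.

```python
from difflib import SequenceMatcher


def script_accuracy(script_text: str, transcription: str) -> float:
    """Compare a script text to an utterance transcription, word by word.

    Returns a similarity ratio between 0.0 and 1.0 (assumed accuracy metric).
    """
    script_words = script_text.lower().split()
    spoken_words = transcription.lower().split()
    return SequenceMatcher(None, script_words, spoken_words).ratio()


def is_compliant(accuracy: float, threshold: float) -> bool:
    """Evaluate compliance by comparing the accuracy to a requirement threshold."""
    return accuracy >= threshold


# Assumed example values, not drawn from any reference of record.
script = "this call may be recorded"
exact = script_accuracy(script, "This call may be recorded")
partial = script_accuracy(script, "this call is recorded")
print(exact, is_compliant(exact, 0.8))
print(partial, is_compliant(partial, 0.8))
```

The threshold value of 0.8 is arbitrary; the sketch shows only that a verbatim reading meets any threshold up to 1.0 while a reading that drops or changes words scores lower.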
Claims 26, 33, & 40 are rejected under 35 U.S.C. 103 as being unpatentable over Rtischev in view of Sherman and Tsai as applied to claims 21, 28, & 35 above, and further in view of Aleksic (US 8880398, cited in 2/25/20 Information Disclosure Statement).
With respect to claim 26, Rtischev in view of Sherman and Tsai teaches the invention of parent claim 21 as described above.
Rtischev in view of Sherman and Tsai does not expressly disclose an exchange including a multiword utterance.
Aleksic discloses the processing of multiword phrases (Aleksic column 15, lines 33-38 and column 16, lines 43-58, multiword utterances e.g. “cat and dog”).

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to apply the Rtischev in view of Sherman and Tsai arrangement to exchanges having multiword utterances.
The suggestion/motivation for doing so would have been to avoid limiting the operation of Rtischev in view of Sherman and Tsai to single-word utterances.
Therefore, it would have been obvious to combine Rtischev in view of Sherman and Tsai with Aleksic to obtain the invention as specified in claim 26.
Claim 26: The method of claim 21 (see above), wherein at least one of the plurality of utterances consists of more than a single word (Aleksic column 15, lines 33-38 and column 16, lines 43-58, multiword utterances e.g. “cat and dog”).
Applying these teachings as they are applied to claim 26 above to claims 33 & 40:
Claim 33: The non-transitory computer readable medium of claim 28 (see above), wherein at least one of the plurality of utterances consists of more than a single word (Aleksic column 15, lines 33-38 and column 16, lines 43-58, multiword utterances e.g. “cat and dog”).
Claim 40: The system of claim 35 (see above), wherein at least one of the plurality of utterances consists of more than a single word (Aleksic column 15, lines 33-38 and column 16, lines 43-58, multiword utterances e.g. “cat and dog”).
Conclusion
Any inquiry concerning the contents of this communication or earlier communications from the examiner should be directed to Stephen M. Brinich at 571-272-7430 (voice) or 571-273-7430 (fax).
Any inquiry relating to the status of this application, entry of papers into this application, or any other inquiries of a general nature concerning application processing should be directed to the Tech Center 2600 Customer Service center at 571-272-2600 or to the USPTO Contact Center at 800-786-9199 or 571-272-1000.
The examiner can normally be reached on weekdays 7:30-4:00 Eastern Time.
If attempts to contact the examiner and the Customer Service Center are unsuccessful, supervisor Claire Wang can be contacted at 571-270-1051.
Hand-carried correspondence may be delivered to the Customer Service Window, located at the Randolph Building, 401 Dulany Street, Alexandria, VA 22314.
/Stephen M Brinich/
Examiner, Art Unit 2663