Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on July 21, 2021 has been entered.
 
Remarks
	This Office Action is in response to applicant’s amendment filed on July 21, 2021, under which claims 1-9 and 11-20 are pending and under consideration.

Response to Arguments
	Applicant’s amendments have overcome the previous § 112(a) rejection of claims 11-12. Therefore, the previous § 112(a) rejection has been withdrawn. 
	Applicant’s amendments have overcome the previous § 103 rejection of claims 1-7, 9, and 13-17. However, upon further consideration, a new ground of rejection based on Raghavan in view of Vig and Suhm has been made for these claims, as set forth below. 
Applicant’s arguments directed to the § 103 rejection are partially moot under the new ground of rejection.
Applicant argues that Raghavan does not teach the limitation “wherein the waypoint comprises metadata of a communication for summarizing, categorizing, labeling, classifying, or annotating sections of the communication” newly recited in the independent claims. The Examiner respectfully disagrees. As stated in the rejection below, in Raghavan, the IVR state labels (e.g., “welcome”, “main menu”, “payment option”) also constitute “metadata…for categorizing, labeling, classifying or annotating.” Since “metadata” is data that describes some other data, a particular IVR state label satisfies the definition of “metadata…for labeling” because it describes the type of IVR state for a particular prompt. The current claim language does not require a more precise definition of the recited “metadata” than what is disclosed in Raghavan.
	Applicant’s arguments with respect to the other limitations newly incorporated in the independent claims are moot under the new ground of rejection because newly cited references Vig and Suhm are relied upon for these limitations. 

Claim Objections
Claims 13 and 17 are objected to because of the following informalities:  
In the last two subparagraphs of claims 13 and 17, “receiving” and “moving” should be “receive” and “move”, respectively.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
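In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.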
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-7, 9, 11-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Raghavan et al. (US 8,761,373 B1) (“Raghavan”) in view of Vig et al. (US 2018/0113854 A1) (“Vig”) and Suhm et al., “Call Browser: A System to Improve the Caller Experience by Analyzing Live Calls End-to-End.” CHI 2009, April 3–9, 2009, Boston, MA, USA (“Suhm”).
As to claim 1, Raghavan teaches a computer-implemented method, comprising:
receiving first communications; [Col. 3, line 37: “recording entire IVR calls in stereo format”; FIG. 3, step 310: “record entire IVR conversation in stereo format”]
determining first segments of the first communications by segmenting the first communications [Col. 3, lines 38-41: “extracting the speech segment from the audio recording of the IVR portion… and converting the extracted speech segments to text. The system next takes the prompts and ‘pre-processes them’”; see also FIG. 3, steps 320-340. Note that the term “prompts” referred to above is used to mean segments of communication, as described in, e.g., col. 3, line 2 (“extracted speech segments are the IVR prompts in the call”) and col. 4, lines 42-43 (“representative segment (i.e., IVR prompt)”), and corresponds to the “first segments” recited in the instant claim.];
determining clusters of the first segments by evaluating similarity among the first segments; [col. 3, lines 53-55: “The pre-processed prompts are then clustered together into groups based on semantic meaning, such that prompts with similar meaning or intent are grouped together (step 350)”; see also col. 2, lines 7-8: “A hierarchical, bisecting K-Means algorithm may be used to perform the clustering”]
receiving waypoint classifications for a subset of the clusters, wherein a waypoint classification identifies that a cluster is a waypoint, [Col. 4, lines 7-12: “After clustering the prompts into groups, the system preliminarily labels each group with an IVR state (step 355). The developer then has the opportunity to view the labeled clusters on a user interface (see e.g., FIG. 5) and approve or edit the clusters and assigned labels 360. Finally, the semantic classifier is built based on the labeled clusters 370.” With respect to the limitation of “waypoint classification,” the Examiner notes that the abstract of the instant application refers to “waypoints” as “e.g., portions of the communications of particular relevance to a user training the classifier.” Therefore, a label indicating a particular IVR state in Raghavan reads on the limitation of “waypoint classification” (identifying that a cluster is a waypoint) of the instant claim, since the IVR states are portions of the communications of particular relevance to a user training a classifier. Col. 3, lines 10-12 provide examples of IVR states, such as “welcome”, “main menu” and “payment option,” which are similar to examples in Table 1 in the instant application, such as “agent greeting” and “screen navigation.” Additional examples of labels are shown in FIG. 5 (column under “prompt label”). Additionally, FIG. 6 (described in col. 5, lines 11-16) shows sequences of interconnected IVR states, thereby teaching that IVR states read on the concept of waypoints.] 
and wherein the waypoint comprises metadata of a communication for summarizing, categorizing, labeling, classifying, or annotating sections of the communication; [The IVR state labels mentioned above (e.g., “welcome”, “main menu”, “payment option”) also constitute “metadata…for categorizing, labeling, classifying or annotating.” Since “metadata” is data that describes some other data, a particular IVR state label satisfies the definition of “metadata…for labeling” because it describes the type of IVR state for a particular prompt (a label would also satisfy the recitation of for categorizing, classifying or annotating, which are also functions of the labels). The instant claim does not require a more precise definition of the recited “metadata” than the labels. Therefore, Raghavan’s teaching of a labeled prompt reads on the claim term “waypoint comprises metadata.” In regards to the claim language of the waypoint “comprising” the metadata, the Examiner notes that the combination of a label together with the concept of the state described by the label corresponds to a “waypoint comprising metadata.”]
generating a machine learning classifier to identify waypoints in new communications by training the machine learning classifier from the classifications; [Col. 4, line 14: “the semantic classifier is built based on the labeled clusters”; FIG. 3, step 370 (“build semantic classifier based on labeled clusters”). The building of the semantic classifier completes the “training” process that began with the clustering. With respect to the limitation of “machine learning,” the semantic classifier in Raghavan is built based on clustering, which is unsupervised learning because the system learns the clusters via a clustering algorithm. Since Raghavan’s classifier is built using an unsupervised machine learning algorithm, it is considered to be a “machine learning” classifier. Furthermore, since the user can edit the labels, Raghavan’s method may also be regarded as semi-supervised learning. With respect to the limitation of “training,” since Raghavan’s classifier is a machine learning classifier, “training” is implied by the description that “the semantic classifier is built based on the labeled clusters.” The semantic classifier in Raghavan is built to perform classification; thus, the classifier can be considered to be “trained” to perform the classification task according to the clustering. The classifier can also be considered to be “trained” in the sense that it is built to perform classification in accordance with some training data. The Examiner notes that the instant claim does not require further details as to the process of “training the machine learning classifier.”] 
receiving a second communication; [Col. 2, lines 52-53: “The system first…obtain an audio recording of the call within the IVR system”; FIG. 2, step 210 (“obtain audio recording of IVR system for call”)]
determining second segments of the second communication; [Col. 2, line 65 to col. 3, line 5: “Next, the speech segments from the audio recordings of the IVR portion are extracted from the audio files by identifying the speech portions… The extracted speech segments are the IVR prompts in the call. The extracted speech segments are then converted into text using a transcription engine that can automatically transcribe the IVR prompts (step 225).”] 
determining one or more waypoints for the second communication by inputting the second segments into the machine learning classifier; [Col. 3, lines 8-10: “Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230).” Note that as described in col. 3, lines 29-31, the method of FIG. 2 (which includes step 230) utilizes a “semantic classifier,” and this semantic classifier is constructed in accordance with the method of FIG. 3, as described in col. 3, lines 33-35. Therefore, Raghavan teaches “inputting the second segments into the machine learning classifier” that was trained under the foregoing steps.]
receiving a selection of a first waypoint of the one or more waypoints for (a) communication; [Raghavan, col. 4, lines 45-48 (referring to FIG. 5): “The Clustered Transcript view also has a Prompt Label section 520, which lists the label assigned to each of the clusters. As described above, the user can manually edit the labels.” See also Raghavan, col. 4, line 60 to col. 5, line 3: “…This allows the user to review the audio file, click on the representative segment from among the Transcript section 515 and edit the representative segment, as necessary….When a particular representative segment, its prompt label, and prompt type have been reviewed and approved, the developer may press the “Done” button 560 and an indication, such as a green check mark, will be displayed corresponding to the highlighted row.” Since the user can edit the labels using the graphical user interface shown in FIG. 5 and the selection of the highlighted row, it is implicitly disclosed that there is a selection of the label to be edited, and receipt of the selection.] and
moving a first cursor in a first display of a text transcript of (the) communication to a portion of the first display of the text transcript of (the) communication that corresponds to the first waypoint. [FIG. 5 of Raghavan illustrates a “text transcript,” as shown near reference marker 515 and described in col. 4, lines 41-44: “The Clustered Transcript view 500 has a Transcript section 515, which lists the representative segment (i.e., IVR prompt) from each cluster.” For example, the highlighted line in FIG. 5 reads “oh i checked your account and as of this morning your payment…”, which is a display of a text transcript of a communication. Furthermore, FIG. 5 teaches a “first cursor” in the form of a “highlighted row” (as described in col. 5, lines 2-3). The portion of the highlighted row in the “transcript” column constitutes a cursor of display of the text transcript the label representation of the second communication. This portion of the highlighted row (first cursor) is positioned to a portion (e.g., the label “play balance” as shown in FIG. 5) of the label representation. The Examiner also notes that the claim does not require the “first cursor” to have a particular form, nor does it require the cursor to have a certain degree of precision in identifying something. Therefore, the row highlight disclosed in Raghavan reads on the limitation of “first cursor.”].
Raghavan does not explicitly teach the following: 
(1)	The first communications and the second communication comprise “a first dialog between persons” and “a second dialog between persons,” respectively.
(2) 	The segmenting for the first communications and the segmenting for the second communication are performed “using at least first temporal features and first lexical features that span one or more dialog turns associated with the first communications” and “using at least second temporal features and second lexical features that span one or more dialog turns associated with the second communication,” respectively.
(3)	The limitation that the “receiving a selection” and “moving a first cursor” operations are performed for the “second communication.” [Note: Raghavan generally teaches the use of the interface of FIG. 5, which is used to review and edit transcripts (col. 4, lines 55-56), for reviewing clusters prior to building of the classifier, but does not explicitly teach the use of the interface for communications after the classifier has been built.]
Vig, in an analogous art, teaches the above limitations (1) and (2). Vig relates to “automatic extraction of structure from spoken conversation using lexical and acoustic features” (title), with the use of a trained “machine learning model” ([0042]). In general, Vig teaches the problem of “extracting conversational structure from customer service phone calls” ([0004]). The extraction of conversational structure includes segmentation, as described in, e.g., [0028]: “the system may partition transcript 102 into such phases 106 based on analyzing extracted acoustic features 104 together with transcript 102.” Note that segments are given labels describing the coarse activities (see FIG. 1, item 106, as described in [0027]-[0028]). Therefore, Vig is in the same field of endeavor as the claimed invention and addresses problems similar to those of the present application. 
In particular, Vig teaches or suggests the first communications and the second communication respectively “comprising a first dialog between persons” and “comprising a second dialog between persons” [[0058]: “each turn in the dialog is spoken by a customer or an agent.” That is, the customer and agent constitute a plurality of persons. See also [0026]: “the speakers' voices” (i.e., two persons); claim 10: “classifying the voice record into at least three sequential utterances spoken by two different speakers”; and FIG. 1, item 102 (dialogue between two speakers) as described in [0026].]. Vig further teaches segmenting “using at least first temporal features and first lexical features that span one or more dialog turns associated with the first communications” and “using at least second temporal features and second lexical features that span one or more dialog turns associated with the second communication” [[0025]: “automatically extracting conversational structure from a voice record by combining extracted lexical and acoustic features…The system may infer a coarse-level conversational structure based on fine-level activities identified from extracted acoustic features. The system can improve significantly over previous systems by extracting conversational structure based on a combination of lexical and acoustic features.” With respect to the limitation of “temporal features,” [0009] teaches “the extracted acoustic feature may include one or more of: … timing or length of an utterance; timing of silence or pauses; … speaking rhythm; speaking rate” (i.e., the “acoustic features” described in [0009] include temporal features). With respect to the limitation of “turn”, [0026] teaches: “A respective uninterrupted segment of the conversation spoken by a single speaker will be referred to as an utterance or turn.” Note that FIG. 1 shows an example of two speakers speaking in turns, as described in [0028] quoted above.]
The above teachings of Vig are applicable to both the “first” (training-stage) and “second” (inference-stage) sets of features of the instant claim limitation, particularly since Vig also has training and inference phases, as taught in [0042]: “the system may first extract features for machine learning (operation 502), as described further below. The system may then train a machine learning model (operation 504), as described further below. The system may then predict conversational structure (operation 506).” 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Raghavan and Vig by modifying the method of Raghavan such that the first communications and the second communication comprise “a first dialog between persons” and “a second dialog between persons,” respectively; and the segmenting for the first communications and the segmenting for the second communication are performed “using at least first temporal features and first lexical features that span one or more dialog turns associated with the first communications” and “using at least second temporal features and second lexical features that span one or more dialog turns associated with the second communication,” respectively. The motivation for doing so would have been to analyze customer service calls between a customer and an agent (see Vig, [0002]: “organization or business engaging in customer service phone calls may also wish to understand or analyze”), and to analyze such calls by using features that result in improved performance in extracting conversational structures (see Vig, [0025]: “improve significantly over previous systems by extracting conversational structure based on a combination of lexical and acoustic features”).
Suhm, in an analogous art, teaches the remaining limitation (3) listed above. Suhm teaches a “Call Browser” that “provides access to hundreds or thousands of live end-to-end calls, and empowers usability practitioners and call-center analysts to systematically and efficiently evaluate the caller experience and identify usability issues” (abstract). Therefore, Suhm is in the same field of endeavor as the claimed invention, namely contact center analysis systems.
In particular, Suhm teaches user review of the “second communication” [page 4, right column, first paragraph: “Second, analytic models implemented as finite-state automata automatically classify calls along various dimensions,…, speech-to-text software transforms the entire call into searchable text. Finally, human annotators can listen to selected call recordings and insert manual annotations to characterize the caller’s interaction with the IVR or analyze agent dialogs in detail.” The “listening” operation is shown in FIG. 2 as described on page 5 (heading “Listening Screen”), which teaches that the “full transcript that was generated by a large vocabulary speech recognizer is accessible” (last paragraph). That is, Suhm teaches user review of communication that was automatically classified, which is analogous to the “second communication” that is also automatically classified.]
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Raghavan and Vig with the teachings of Suhm by performing the “receiving” and “moving” operations of the instant claim with respect to the second communication, so as to arrive at all limitations of the instant claim. The motivation would have been to enable a user to analyze a call that has been classified automatically, as suggested by Suhm (page 4, right column, first paragraph, parts quoted above; and page 5, second paragraph: “allows analysts to listen to an individual call recording and to further analyze the call.”). 

As to claim 2, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, further comprising:
determining that a communication of the first communications includes audio data; [Raghavan, col. 3, lines 38-41: “extracting the speech segment from the audio recording of the IVR portion.”] and
transcribing speech included in the audio data to text. [Raghavan, col. 3, lines 38-41: “…and converting the extracted speech segments to text.” See also Raghavan, col. 3, lines 3-5: “The extracted speech segments are then converted into text using a transcription engine that can automatically transcribe the IVR prompts (step 225).” It is understood that these features, described for the deployment process of FIG. 2, would also apply to the training process shown in FIG. 3 (see, e.g., col. 4, lines 34-54, which describes transcribed text for the training set).]

As to claim 3, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, further comprising:
receiving a third communication [The process shown in FIG. 2 of Raghavan, which was cited for the “second communication” in claim 1, also applies to other communications, e.g., a “third communication.”] that is one of a voicemail, a video, an e-mail, a live chat transcript, or a text message [Raghavan, col. 3, lines 38-41: “extracting the speech segment from the audio recording of the IVR portion and converting the extracted speech segments to text.” See also col. 3, lines 3-5, teaching the use of a transcription engine. The transcribed speech segments constitute a text message to the extent required by the instant claim. Note that the claim does not require an operation of segmenting the text message into constituent portions.]
determining third segments of the third communication; [The transcribed text segments, as noted above, correspond to “third segments” of the third communication.] and
determining one or more second waypoints for the third communication. [Raghavan, col. 3, lines 8-10: “Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230).”]

As to claim 4, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 3, wherein determining the one or more second waypoints includes:
inputting the third segments into a second machine learning classifier trained using third communications of a type of communication associated with the one of the voicemail, the video, the e-mail, the live chat transcript, or the text message. [Raghavan, col. 3, lines 8-10: “Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230).” Note that as described in col. 3, lines 29-31, the method of FIG. 2 (which includes step 230) utilizes a “semantic classifier,” and this semantic classifier is constructed in accordance with the method of FIG. 3, as described in col. 3, lines 33-35. Therefore, Raghavan teaches “inputting the second segments into the machine learning classifier” that was trained under the foregoing steps.]

As to claim 5, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, further comprising:
providing a user interface for classifying or not classifying the clusters on a per cluster basis. [Raghavan, col. 4, lines 9-11: “The developer then has the opportunity to view the labeled clusters on a user interface (see e.g., FIG. 5) and approve or edit the clusters and assigned labels 360.” Raghavan, col. 4, lines 45-48: “The Clustered Transcript view also has a Prompt Label section 520, which lists the label assigned to each of the clusters. As described above, the user can manually edit the labels.” Therefore, Raghavan teaches the limitation of “for classifying on a per cluster basis.” Moreover, Raghavan is also deemed to teach the alternate limitation of “or not classifying,” since the user may choose not to change the labels.]

As to claim 6, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, further comprising:
extracting at least one of semantic features, syntactic features, prosodic features, or user features associated with the first communications, [Raghavan, col. 3, lines 47-52, teaches “user features”: “Furthermore, a set of prompts may be similar except for having unique customer information, such as names or numbers. In this situation, the system would “normalize” or remove the unique text and replace the text with a static value such that the similar set of prompts may be clustered together for increased efficiency.” Alternatively, the acoustic features of Vig cited in the rejection of claim 1, above, also constitute prosodic features. See Vig, [0037] (“speaking pitch, speaking intensity”); [0026] (“prosodic analysis to extract acoustic features 104, such as silence, utterance length, speaking pitch, intonation, and other features, from the conversation.”)]
wherein the first segments are further determined based on at least one of the semantic features, the syntactic features, the prosodic features, or the user features associated with the first communications. [Raghavan, col. 3, lines 40-42, teaches that the above operations are part of a “pre-process” step (step 340 in FIG. 3) that is performed to determine the final segments for training the classifier. Therefore, Raghavan teaches the instant limitation of “further determined based on.” Alternatively, Vig teaches the instant limitation, because the features taught in Vig serve as the basis for segmentation, as described in the rejection of claim 1.] 

As to claim 7, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, wherein determining the clusters includes:
applying a clustering algorithm to the first segments using partitional clustering, hierarchical clustering, density-based clustering, or grid-based clustering. [Raghavan, col. 4, lines 1-3: “In one embodiment, the IVR prompts are grouped into clusters using a hierarchical, bisecting K-Means clustering algorithm.” That is, Raghavan teaches hierarchical clustering and partitional clustering.].

As to claim 9, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1, wherein the machine learning classifier is associated with a machine learning classification algorithm from a group comprising a nearest neighbor algorithm, a boosting algorithm, a statistical algorithm, a neural network, a random forest, and a support vector machine. [Raghavan, col. 3, lines 33-36: “building a semantic statistical model for the semantic classifier to enable the semantic classifier to automatically classify IVR prompts with an IVR state.” That is, the classifier of Raghavan uses a “statistical semantic model,” corresponding to the limitation of “a statistical algorithm” recited in the list of alternatives. Furthermore, Raghavan teaches “building a statistical semantic model based on the labeled clusters” (see Raghavan, claim 1) and the statistical semantic model is a “machine learning classification algorithm,” as discussed in the rejection of claim 1.] 

As to claim 11, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 1.
Suhm further teaches “moving a second cursor of a second display of an audio wave representation of the second communication to a portion of the second display of the audio wave representation of the second communication that corresponds to the first waypoint.” [FIG. 2 (page 3), as described on page 5, left column, second paragraph (“Listening Screen”): “The synchronized waveform and call event list below the waveform make it easy to navigate the audio, helping to minimize the amount of time users spend listening. Color coding indicates IVR, queue and agent segments of a call, and listeners can jump to any point in the audio just by clicking on the waveform or an event. Key data summarizing the call is displayed on the left hand side below the waveform, including when the call began, duration.” Referring to FIG. 2, the “listening cursor” labeled in this figure reads on the limitation of a second cursor for an audio wave representation. This cursor moves in the audio wave (labeled as “waveform” in FIG. 2) and is synchronized with the call event list (labeled as “event sequence”), which also has a cursor in the form of a highlighted row. Moreover, the “event sequence” is analogous to the IVR list taught in Raghavan.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the thus-far combination of Raghavan, Vig, and Suhm with the above further teachings of Suhm by implementing the operation of “moving a second cursor of a second display of an audio wave representation of the second communication to a portion of the second display of the audio wave representation of the second communication that corresponds to the first waypoint.” The motivation would have been to implement an interface that enables “analysts and usability practitioners [to] access and analyze end-to-end call data” (Suhm, first page, last paragraph), particularly in a manner that “make[s] it easy to navigate the audio” together with a call event list (Suhm, page 5, left column, second paragraph).

As to claim 12, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 11, as set forth in the rejection of claim 11, further comprising:
moving a third cursor of a third display of an event list of the second communication to a portion of the third display of the event list of the second communication that corresponds to the first waypoint. [Raghavan, FIG. 5 teaches an event list in the form of prompt labels. The portion of the highlighted row in the “Prompt Label” column constitutes a cursor of display of the event list of the communication. The part of the event list that is highlighted corresponds to the selected prompt label (waypoint). This limitation is alternatively taught by Suhm, which, as described above, teaches that the audio wave is synchronized with the call event list (labeled as “event sequence” in FIG. 2), which also has a cursor in the form of a highlighted entry. See FIG. 2 caption of Suhm, which states “The event sequence, shown in the middle below the waveform, covers the caller’s interaction with the IVR and agent-caller dialogs.”]

As to claim 13, this claim is directed to a computer system for performing operations that are the same or substantially the same as those recited in claim 1. Therefore, the rejection made to claim 1 is applied to claim 13.
Furthermore, Raghavan teaches a computing system, comprising:
one or more processors; [Raghavan, col. 5, lines 17-24: “one or more processors”]
memory including instructions that, upon execution by the one or more processors, cause the computing system to… [Raghavan, col. 5, lines 17-24: “a computer system has a memory or other physical storage medium for storing software instructions and one or more processors for executing the software instructions.”]

As to claim 14, the combination of Raghavan, Vig, and Suhm teaches the computing system of claim 13, wherein the instructions upon execution further cause the computing system to:
provide a user interface for classifying or not classifying the clusters on a per communication basis. [Raghavan, col. 4, lines 9-11: “The developer then has the opportunity to view the labeled clusters on a user interface (see e.g., FIG. 5) and approve or edit the clusters and assigned labels 360.” Raghavan, col. 4, lines 45-48: “The Clustered Transcript view also has a Prompt Label section 520, which lists the label assigned to each of the clusters. As described above, the user can manually edit the labels.” With respect to the limitation of “on a per communication basis,” Raghavan, col. 4, lines 66-67 teaches: “When a particular representative segment, its prompt label, and prompt type have been reviewed and approved.” Therefore, since users can edit a prompt label of a particular representative segment, Raghavan is considered to teach the instant limitation. Moreover, Raghavan is also deemed to teach the alternate limitation of “or not classifying,” since the user may choose not to change the label.]

As to claim 15, the combination of Raghavan, Vig, and Suhm teaches the computing system of claim 13, wherein the instructions upon execution further cause the computing system to:
receive a third communication; [Raghavan, FIG. 2, step 210: “obtain audio recording of IVR system for call,” as described in col. 2, lines 52-53. Note that this process is generic, and applies to further communications.] associated with a second context that differs from a first context associated with the first communications and the second communication [Raghavan, col. 1, lines 34-42 generally teaches that an “IVR state sequence is identified for each of a plurality of calls from recorded audio files of the calls.” Since this part of Raghavan refers to “most common state sequences,” it is understood that different calls may have different state sequences, which corresponds to different contexts.]
determine one or more second waypoints for the third communication by inputting the third communication into a second machine learning classifier trained using communications associated with the second context. [Raghavan generally teaches, in col. 3, lines 8-10: “Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230).” Note that as described in col. 3, lines 29-31, the method of FIG. 2 (which includes step 230) utilizes a “semantic classifier,” and this semantic classifier is constructed in accordance with the method of FIG. 3, as described in col. 3, lines 33-35. With respect to the particular limitation of “a second machine learning classifier trained using communications associated with the second context,” Raghavan, col. 3, lines 30-32 teaches that “the semantic classifier is updated to improve the ability of the classifier to classify future prompts with a predefined IVR state.” Since the semantic classifier may be updated to classify future prompts based on a previous prompt (corresponding to communications associated with a second context), the updated classifier is considered to correspond to a “second machine learning classifier.”]

As to claim 16, the combination of Raghavan, Vig, and Suhm teaches the computing system of claim 15, wherein the second context differs from the first context based on at least one of a type of a communication, a business department associated with a communication, a product associated with a communication, a language associated with a communication, or an a/b testing group associated with a communication. [Since Raghavan, col. 3, lines 8-10, teaches calls with different IVR sequences, as explained in the rejection of claim 15, and different IVR sequences can be considered to be different types of communication, to the extent required by the claim, Raghavan teaches “differs from the first context based on…a type of a communication.”]

As to claim 17, this claim is directed to a non-transitory computer-readable storage medium including instructions for performing operations that are the same or substantially the same as those recited in claim 1. Therefore, the rejection made to claim 1 is applied to claim 17.
Additionally, Raghavan teaches a non-transitory computer-readable storage medium including instructions that, upon execution by one or more processors of a computing system, cause the computing system to… [Col. 5, lines 17-24: “a computer system has a memory or other physical storage medium for storing Software instructions and one or more processors for executing the software instructions.”]

As to claim 20, the combination of Raghavan, Vig, and Suhm teaches the non-transitory computer-readable storage medium of claim 17, as set forth in the rejection of claim 17, above, wherein the instructions upon execution further cause the computing system to:
receive a selection of a first waypoint of the one or more waypoints for the second communication; [The combination of references teaches this limitation for the same reason that it teaches “receiving a selection of a first waypoint of the one or more waypoints for the second communication” as recited in the parent claim. Note that the claim language is substantially the same.]
move a first cursor of a first representation of second communication to a portion of the first representation of the second communication that corresponds to the first waypoint; [The combination of references teaches this limitation for the same reason that it teaches “moving a first cursor in a first display of a text transcript of the second communication to a portion of the first display of the text transcript of the second communication that corresponds to the first waypoint.” Note that the instant claim language is merely a broader version of the claim language in the parent claim, since a “first representation” may be a text transcript.]
move a second cursor of a second representation of the second communication to a portion of the second representation of the second communication that corresponds to the first waypoint; [Raghavan, FIG. 5, illustrating a “highlighted row” (as described in col. 5, lines 2-3). The portion of the highlighted row in the “prompt label” column constitutes a cursor of the label representation of the second communication. This portion of the highlighted row (first cursor) is positioned to a portion (e.g., the label “play balance” as shown in FIG. 5) of the label representation. The limitation of performing these operations with respect to the “second communication” is taught by the combination of references, since Suhm teaches user review of the “second communication,” as set forth in connection with the rejection of the parent claim.]
The thus-far combination of references does not teach the remaining limitation of “moving a third cursor of a third representation of the second communication to a portion of the third representation of the second communication that corresponds to the first waypoint.”
Suhm further teaches “moving a third cursor of a third representation of the second communication to a portion of the third representation of the second communication that corresponds to the first waypoint.” [FIG. 2 (page 3), as described on page 5, left column, second paragraph (“Listening Screen”): “The synchronized waveform and call event list below the waveform make it easy to navigate the audio, helping to minimize the amount of time users spend listening. Color coding indicates IVR, queue and agent segments of a call, and listeners can jump to any point in the audio just by clicking on the waveform or an event. Key data summarizing the call is displayed on the left hand side below the waveform, including when the call began, duration.” Referring to FIG. 2, the “listening cursor” labeled in this figure reads on the limitation of a second cursor for an audio wave representation. This cursor moves in the audio wave (labeled as “waveform” in FIG. 2) and is synchronized with the call event list (labeled as “event sequence”), which also has a cursor in the form of a highlighted row. Moreover, the “event sequence” is analogous to the IVR list taught in Raghavan. Note that a wave form representation as taught in Suhm is considered to be a “third representation.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the thus-far combination of Raghavan, Vig, and Suhm with the above further teachings of Suhm by implementing the operation of “moving a third cursor of a third representation of the second communication to a portion of the third representation of the second communication that corresponds to the first waypoint.” The motivation would have been to implement an interface that enables “analysts and usability practitioners [to] access and analyze end-to-end call data” (Suhm, first page, last paragraph), particularly in a manner that “make[s] it easy to navigate the audio” together with a call event list (Suhm, page 5, left column, second paragraph).
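For illustration only, the synchronization relied upon above (a waveform cursor and an event-list cursor that move together when either is selected) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Suhm or Raghavan: the class name CallView, its methods, and the event start times are hypothetical, and only the labels "welcome", "main menu", and "payment option" come from the IVR state labels discussed in this action.

```python
from bisect import bisect_right


class CallView:
    """Hypothetical model of two synchronized cursors over one call."""

    def __init__(self, events):
        # events: list of (start_seconds, label), sorted by start time.
        self.events = events
        self.starts = [start for start, _ in events]
        self.audio_cursor = 0.0   # position of the waveform cursor, in seconds
        self.event_cursor = 0     # index of the highlighted event-list row

    def select_event(self, index):
        # Selecting an event moves both cursors to that event.
        self.event_cursor = index
        self.audio_cursor = self.starts[index]

    def seek_audio(self, seconds):
        # Clicking on the waveform moves the audio cursor and highlights
        # the event whose start time most recently precedes that position.
        self.audio_cursor = seconds
        self.event_cursor = max(0, bisect_right(self.starts, seconds) - 1)


# Assumed example data: the start times are invented for illustration.
view = CallView([(0.0, "welcome"), (4.5, "main menu"), (12.0, "payment option")])
view.select_event(2)
audio_after_select = view.audio_cursor
view.seek_audio(5.0)
label_after_seek = view.events[view.event_cursor][1]
```

In this sketch, selecting the third event places the audio cursor at that event's assumed start time, and seeking to a point inside the "main menu" segment highlights that row of the event list.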

2. 	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Raghavan in view of Vig and Suhm, and further in view of Basu et al., “Semi-supervised Clustering by Seeding,” Proceedings of the 19th International Conference on Machine Learning (ICML-2002), pp. 19-26, Sydney, Australia, July 2002 (“Basu”).
As to claim 8, the combination of Raghavan, Vig, and Suhm teaches the computer-implemented method of claim 7, but does not teach the method further comprising “seeding the clustering algorithm using one or more predetermined cluster examples.”
Basu, in an analogous art, teaches the above limitations. Basu generally relates to clustering techniques for machine learning (see abstract). Therefore, Basu is in the same field of endeavor as the claimed invention, and would also be reasonably pertinent to the problems being solved by the present application.
In particular, Basu teaches “seeding the clustering algorithm using one or more predetermined cluster examples” [Abstract: “Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper examines the use of labeled data to generate initial seed clusters”; page 1, right column, top paragraph: “labeled data to generate seed clusters that initialize a clustering algorithm.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Raghavan, Vig, and Suhm with the teachings of Basu by performing the additional operation of seeding the clustering algorithm using one or more predetermined cluster examples, in order to bias the clustering toward a good region of the search space and produce clusters similar to user-specified classifications (§ 1, paragraph 1, last sentence: “Proper seeding biases clustering towards a good region of the search space, thereby reducing the chances of it getting stuck in poor local optima, while simultaneously producing a cluster similar to the user-specified labels”).
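For illustration only, the seeding operation described in the quoted portions of Basu (using labeled data to generate seed clusters that initialize a clustering algorithm) can be sketched as follows. This is a simplified sketch under stated assumptions, not a reproduction of Basu's algorithm or of applicant's implementation: the function name seeded_kmeans, the two-dimensional points, and the seed labels are hypothetical, and the clustering algorithm is assumed to be k-means with Euclidean distance.

```python
from math import dist


def seeded_kmeans(points, seeds, iterations=10):
    """Cluster points with k-means, initializing centroids from labeled seeds.

    points: list of coordinate tuples to cluster.
    seeds:  dict mapping a label to a non-empty list of example points.
    """
    def mean(pts):
        return tuple(sum(coords) / len(pts) for coords in zip(*pts))

    # Seeding step: each initial centroid is the mean of the labeled
    # examples for that cluster, rather than a random starting point.
    centroids = {label: mean(examples) for label, examples in seeds.items()}
    clusters = {label: [] for label in centroids}

    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = {label: [] for label in centroids}
        for p in points:
            nearest = min(centroids, key=lambda lab: dist(p, centroids[lab]))
            clusters[nearest].append(p)
        # Update step: recompute centroids; keep the old one if a cluster is empty.
        updated = {
            label: mean(members) if members else centroids[label]
            for label, members in clusters.items()
        }
        if updated == centroids:
            break
        centroids = updated

    return centroids, clusters


# Assumed example data: two well-separated groups and one seed per group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = {"greeting": [(0, 0)], "payment": [(10, 10)]}
centroids, clusters = seeded_kmeans(points, seeds)
```

Because the initial centroids come from the labeled examples, each resulting cluster carries the label of its seed, which corresponds to the stated benefit of producing clusters similar to user-specified classifications.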

3. 	Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Raghavan in view of Vig and Suhm, and further in view of Duta (US 8,515,736 B1).
As to claim 18, the combination of Raghavan, Vig, and Suhm teaches the non-transitory computer-readable storage medium of claim 17, wherein the instructions upon execution further cause the computing system to:
input the clusters into (a classifier) that generates the waypoint classifications for the subset of the clusters. [Raghavan, col. 4, lines 7-8: “After clustering the prompts into groups, the system preliminarily labels each group with an IVR state (step 355).”]
Duta, in an analogous art, teaches “a second machine learning classifier.” Duta generally pertains to natural language understanding and text classification for voice controlled systems (col. 1, lines 8-10). Therefore, Duta is in the same field of endeavor as the claimed invention.
In particular, Duta teaches a “second machine learning classifier” [Col. 16, lines 38-51: “In step 720, training manager 1340 semantically labels the first utterance using a second automatic semantic classifier. The second automatic semantic classifier is a previously trained semantic classifier… the second automatic semantic classifier is an existing semantic classifier, meaning a semantic classifier already trained from textual utterances.” See also col. 17, lines 33-Note that step 720 occurs in the context of training a first machine learning classifier as shown in FIG. 17 (see col. 16, lines 8-10: “In step 710, training manager 1340 receives a first utterance for use in training a first automatic semantic classifier” and claim 1 of Duta). Therefore, the role of the “second automatic semantic classifier” (previously trained semantic classifier) is to label data that is used to train another classifier, similar to the function of the second machine learning classifier in the instant claim.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Raghavan, Vig, and Suhm with the teachings of Duta by modifying the instructions such that upon execution thereof, the system inputs the clusters into a second machine learning classifier that generates the waypoint classifications for the subset of the clusters, in order to utilize an existing classifier that is suitable for labeling text to be used to train another classifier, as suggested by Duta (see col. 16, lines 38-51, particularly the parts quoted above).

As to claim 19, the combination of Raghavan, Vig, Suhm, and Duta teaches the non-transitory computer-readable storage medium of claim 18, wherein the instructions upon execution further cause the computing system to:
provide a user interface for editing the waypoint classifications. [Raghavan, col. 4, lines 9-11: “The developer then has the opportunity to view the labeled clusters on a user interface (see e.g., FIG. 5) and approve or edit the clusters and assigned labels 360.” Raghavan, col. 4, lines 45-48: “The Clustered Transcript view also has a Prompt Label section 520, which lists the label assigned to each of the clusters. As described above, the user can manually edit the labels.”]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following prior art depicts the state of the art.
US20120173229A1 teaches interfaces for presenting call information, with a waveform representation together with an event list representation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/            Supervisory Patent Examiner, Art Unit 2124