DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on November 8, 2020 is/are being considered by the examiner.

Drawings
The drawings are objected to because of the following informalities: 
FIGS. 4, 5A-5F, 6A-6F, 7, 8A-8C, 9A-9C, 10, and 11 contain illegible text and illegible labels.  
FIG. 12A is objected to because it contains an embedded hyperlink and/or other form of browser-executable code. Applicant is required to delete the embedded hyperlink and/or other form of browser-executable code; references to websites should be limited to the top-level domain name without any prefix such as http:// or other browser-executable code. See MPEP § 608.01.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claims 1 and 11 are objected to because of the following informalities:  
The limitation “…and second set of utterances” in line 8 of claim 1 should read “…and the second set of utterances”.  
The limitation “…and second set of utterances” in line 11 of claim 11 should read “…and the second set of utterances”.  
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 5, 8, 11, 15 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps. See MPEP § 2172.01. The omitted steps are: “wherein the first conversation comprises a first conversation participant and a second conversation participant.” Applicant recites a first conversation, a first conversation participant, and a second conversation participant in claims 1 and 11. However, it is unclear whether the set of utterances from the first conversation participant and the set of utterances from the second conversation participant are identified from the first conversation, or if they are identified from a separate conversation.
Claims 1 and 11 are further rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being unclear. Described with reference to claim 1, applicant recites "an utterance" in lines 9-10 of claim 1. However, applicant has already provided the limitations of "a first set of utterances" and "a second set of utterances," which establishes antecedent basis for “an utterance.” As such, it is unclear whether an utterance is part of "a first set of utterances," "a second set of utterances," or entirely separate from said sets of utterances.
Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being unclear. Applicant recites "an utterance" in lines 2-3 of claim 5. However, applicant has previously recited “an utterance” at lines 9-10 of independent claim 1. As such, it is unclear whether "an utterance," as recited in claim 5, is the same element as "an utterance" in claim 1. Further, as "an utterance" is recited twice within claim 5, it is unclear whether applicant intends "an utterance" at line 2 to be the same element as "an utterance" in lines 2-3.
Claim 15, as dependent from claim 11, is substantially the same as claim 5, as dependent from claim 1, and is therefore rejected under the same rationale as above.
Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being unclear. Applicant recites "an utterance" in lines 3-4 of claim 8. However, applicant has previously recited “an utterance” at lines 9-10 of independent claim 1. As such, it is unclear whether "an utterance," as recited in claim 8, is the same element as "an utterance" in claim 1. Further, applicant recites "an intent" in line 3 of claim 8. However, applicant has previously recited “one or more intents” at lines 2-3 of claim 8. As such, it is unclear whether "an intent" is one of the "one or more intents" recited in claim 8.
Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-2, 8, 11-12, and 18 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zimmerman (U.S. Pat. App. Pub. No. 2019/0260875, hereinafter Zimmerman).

Regarding claim 1, Zimmerman discloses A method comprising (the systems and methods described with reference to the "emotion monitor client program 75"; Zimmerman, ¶¶ [0025]): receiving, using one or more processors, a first conversation (the "emotion monitor client program 75" including the "simplified visual representation 72" as implemented on the workstation 60 including one or more CPUs 61 {...using one or more processors...} can "receives one or more inputs from a live voice call 402, which may include natural language content from... two or more people engaged in a [voice] conversation {receiving... a first conversation}"; Zimmerman, ¶¶ [0025], [0034]); identifying, using the one or more processors, a first set of utterances associated with a first conversation participant and a second set of utterances associated with a second conversation participant ("the input(s) from a live voice call 402 may be processed by a speaker recognition detectors 406 which any suitable technique for differentiating between the agent and customer {identifying, using the one or more processors,...a first conversation participant...and...a second conversation participant}" such as by using "a speaker adaptive automatic speech recognition (ASR) approach {a first set of utterances associated with... and a second set of utterances associated with...},"; Zimmerman, ¶¶ [0025], [0034]); and generating, using the one or more processors, a first image representation of the first conversation (the "emotion monitor client program 75" including the "simplified visual representation 72" as implemented on the workstation 60 including one or more CPUs 61 {...using one or more processors...}, described with reference to the example shown in FIG. 7, displays the timeline 700, where the timeline 700 is an image representation of the first conversation.; Zimmerman, ¶¶ [0025], [0047]), the first image representation of the first conversation visually representing the first set of utterances and second set of utterances, ("the timeline 700 shows the estimated or measured emotional states of the agent 701 (in the top trace) and the customer 702 (in the middle trace) {the first image representation of the first conversation visually representing...} which are generated by separately converting the verbal conversation of the agent and customer to text, maintaining the separation of the two speakers," thus each bar attributed to the agent 701 [the first conversation participant] represents the first set of utterances, and each bar attributed to the customer 702 [the second conversation participant] represents the second set of utterances.; Zimmerman, ¶¶ [0047], FIG. 7) wherein an utterance is visually represented by a first parameter associated with timing of the utterance ("FIG. 7 which shows a time line 700 which visually depicts the emotional state of a customer dialog over time (e.g., a 17 minute interval) {wherein an utterance is visually represented by a first parameter...}," where both the time linear spacing of the bar and numerical minute indicators are associated with timing of the utterance.; Zimmerman, ¶¶ [0047], FIG. 7), a second parameter associated with a number of tokens in the utterance ("For each conversation turn, the text is analyzed for instantaneous emotional content which is then converted for display as a single color {a second parameter...}" and "a dimensional representation of emotion can be estimated by averaging valence, arousal and dominance values (the PAD model) of the emotional words {associated with… tokens in the utterance}," where an average is a function of overall value and the number of values presented, thus average implicitly discloses that the dimensional representation of emotion is associated with a number of emotional words {tokens} in the utterance.; Zimmerman, ¶¶ [0047], FIG. 7, [0035]), and a third parameter associated with which conversation participant was a source of the utterance (As depicted in the example of FIG. 7, the row position {and a third parameter…} depicts which of the agent 701 or customer 702 was the source of the utterance {...associated with which conversation participant was a source of the utterance}.; Zimmerman, ¶¶ [0047], FIG. 7).

Regarding claim 2, Zimmerman discloses wherein the first image representation of the first conversation is a bar chart (The time lines {first image representation} depicted in FIGS. 7 and 8.; Zimmerman, ¶¶ [0047]-[0048], FIG. 7 and 8), the bar chart including a set of bars, each bar in the set of bars associated with an utterance from one of the first set of utterances and the second set of utterances (the timeline 700 and 800 show "the estimated or measured emotional states" of the agent 701 and 801 (in the top trace) and the customer 702 and 802 (in the middle trace) {the bar chart including a set of bars...} "which are generated by separately converting the verbal conversation of the agent and customer to text, maintaining the separation of the two speakers." {...each bar in the set of bars associated with an utterance from one of the first set of utterances and the second set of utterances}; Zimmerman, ¶¶ [0047]-[0048], FIG. 7 and 8), a location and first dimension of a first bar along a first axis serving as the first parameter and visually representing a timing of a first utterance represented by the first bar (the timeline 700 and 800 depict the position of each bar for the agent 701 and 801 (in the top trace) and the customer 702 and 802 (in the middle trace) is with respect to timing of the utterance in the 17 minute window. Further, position of the bar within the 17 minute window is a visual representation of the timing of the first utterance which is represented by each respective bar {including the first bar}; Zimmerman, ¶¶ [0047]-[0048], FIG. 7 and 8), a second dimension of the first bar along a second axis serving as the second parameter and visually representing a number of consecutive tokens in the first utterance represented by the first bar (The second dimension of the first bar is emotion where color visually represents the "two emotion score vectors," thus color corresponds to a numerical value and is a measure of depth {a second dimension of the first bar along a second axis serving as the second parameter}. Further, the instantaneous emotion value is based, at least in part, on the number of emotionally charged words in the utterance and therefore represents a number of consecutive tokens in the first utterance.; Zimmerman, ¶¶ [0047]-[0048], FIG. 7 and 8), and whether the first bar extends in a first direction or second direction from the first axis serving as the third parameter and visually representing whether the first utterance was that of the first conversation participant or the second conversation participant (As visually depicted in FIGS. 7 and 8, there is a central line {the first axis} positioned between the agent’s utterance bars (in the upper trace) and the customer’s emotion bars (in the middle trace) and which provides a measure of time along the length of said central line. If the utterance bar extends in a first direction above the central line, the utterance is from the agent. If the utterance bar extends in a second direction below the central line, the utterance is from the customer.; Zimmerman, ¶¶ [0047]-[0048], FIG. 7 and 8).
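
For illustration only, the bar geometry recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Zimmerman or from applicant's specification: the names Utterance and bar_geometry, the speaker labels, and whitespace tokenization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # hypothetical labels: "agent" or "customer"
    start: float   # seconds from the start of the conversation
    end: float     # seconds from the start of the conversation
    text: str


def bar_geometry(u: Utterance, first_speaker: str = "agent") -> dict:
    """Map one utterance to the three claimed parameters of a bar."""
    tokens = u.text.split()  # assumption: tokens are whitespace-separated words
    # Third parameter: the bar extends up for the first participant, down otherwise.
    direction = 1 if u.speaker == first_speaker else -1
    return {
        "x": u.start,                        # first parameter: location on the time axis
        "width": u.end - u.start,            # first parameter: extent on the time axis
        "height": direction * len(tokens),   # second parameter: token count, signed by source
    }


bars = [
    bar_geometry(u)
    for u in [
        Utterance("agent", 0.0, 2.5, "thank you for calling how can I help"),
        Utterance("customer", 3.0, 5.0, "my order never arrived"),
    ]
]
```

In this sketch the sign of the height carries the source of the utterance, so a single signed value encodes both the second and third parameters.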

Regarding claim 8, Zimmerman discloses further comprising: identifying, from the first image representation of the first conversation, one or more intents within the first conversation, ("the agent call state indicator 414 {...from the first image representation of the first conversation} may be used {identifying...} to provide the agent conversational suggestions to help the agent guide the conversation towards a desired outcome," where conversational suggestions represent one or more intents within the first conversation.; Zimmerman, ¶¶ [0037]) wherein an intent is associated with an utterance that satisfies a threshold (The conversational suggestion {the intent} is associated with the emotional state, represented by the agent call state indicator, which is associated with an utterance which satisfies a threshold [the emotional content of an utterance "...as established by a threshold"].; Zimmerman, ¶¶ [0037]), the threshold associated with an average number of tokens per utterance (the emotional state is determined based on the emotional "content… [of the] utterance, as established by a threshold {the threshold associated...}," based on "detect[ing] emotional words {associated with an average number of tokens...} in the text {...per utterance} that have associated emotion indicating metrics"; Zimmerman, ¶¶ [0037], [0047]).
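
For illustration only, one possible reading of the claim 8 threshold limitation can be sketched as follows. This is a minimal sketch under stated assumptions: the function name candidate_intent_utterances is hypothetical, tokens are assumed to be whitespace-separated words, and "satisfies a threshold" is assumed to mean that the utterance's token count exceeds the average number of tokens per utterance. Neither the claims nor Zimmerman specify these details.

```python
def candidate_intent_utterances(utterances: list[str]) -> list[str]:
    """Return utterances whose token count exceeds the conversation average."""
    counts = [len(u.split()) for u in utterances]  # assumption: whitespace tokens
    if not counts:
        return []
    threshold = sum(counts) / len(counts)  # average number of tokens per utterance
    return [u for u, n in zip(utterances, counts) if n > threshold]
```

For example, given the utterances "hi", "I want to cancel my subscription today", and "ok", the token counts are 1, 7, and 1, the average is 3, and only the second utterance is returned.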

Regarding claim 11, Zimmerman discloses A system comprising (the systems and methods described with reference to the "emotion monitor client program 75"; Zimmerman, ¶¶ [0025]): one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to: receive a first conversation (the "emotion monitor client program 75" including the "simplified visual representation 72" as implemented on the workstation 60 including one or more CPUs 61 {...using one or more processors...} can "receives one or more inputs from a live voice call 402, which may include natural language content from... two or more people engaged in a [voice] conversation {receiving... a first conversation}"; Zimmerman, ¶¶ [0025], [0034]); identify a first set of utterances associated with a first conversation participant and a second set of utterances associated with a second conversation participant ("the input(s) from a live voice call 402 may be processed by a speaker recognition detectors 406 which any suitable technique for differentiating between the agent and customer {identifying, using the one or more processors,...a first conversation participant...and...a second conversation participant}" such as by using "a speaker adaptive automatic speech recognition (ASR) approach {a first set of utterances associated with... and a second set of utterances associated with...},"; Zimmerman, ¶¶ [0025], [0034]); and generate a first image representation of the first conversation (the "emotion monitor client program 75" including the "simplified visual representation 72" as implemented on the workstation 60 including one or more CPUs 61 {...using one or more processors...}, described with reference to the example shown in FIG. 7, displays the timeline 700, where the timeline 700 is an image representation of the first conversation.; Zimmerman, ¶¶ [0025], [0047]), the first image representation of the first conversation visually representing the first set of utterances and second set of utterances, ("the timeline 700 shows the estimated or measured emotional states of the agent 701 (in the top trace) and the customer 702 (in the middle trace) {the first image representation of the first conversation visually representing...} which are generated by separately converting the verbal conversation of the agent and customer to text, maintaining the separation of the two speakers," thus each bar attributed to the agent 701 [the first conversation participant] represents the first set of utterances, and each bar attributed to the customer 702 [the second conversation participant] represents the second set of utterances.; Zimmerman, ¶¶ [0047], FIG. 7) wherein an utterance is visually represented by a first parameter associated with timing of the utterance ("FIG. 7 which shows a time line 700 which visually depicts the emotional state of a customer dialog over time (e.g., a 17 minute interval) {wherein an utterance is visually represented by a first parameter...}," where both the time linear spacing of the bar and numerical minute indicators are associated with timing of the utterance.; Zimmerman, ¶¶ [0047], FIG. 7), a second parameter associated with a number of tokens in the utterance ("For each conversation turn, the text is analyzed for instantaneous emotional content which is then converted for display as a single color {a second parameter...}" and "a dimensional representation of emotion can be estimated by averaging valence, arousal and dominance values (the PAD model) of the emotional words {associated with… tokens in the utterance}," where an average is a function of overall value and the number of values presented, thus average implicitly discloses that the dimensional representation of emotion is associated with a number of emotional words {tokens} in the utterance.; Zimmerman, ¶¶ [0047], FIG. 7, [0035]), and a third parameter associated with which conversation participant was a source of the utterance (As depicted in the example of FIG. 7, the row position {and a third parameter…} depicts which of the agent 701 or customer 702 was the source of the utterance {...associated with which conversation participant was a source of the utterance}.; Zimmerman, ¶¶ [0047], FIG. 7).

Regarding claim 12, the rejection of claim 11 is incorporated. Claim 12 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 18, the rejection of claim 11 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Zimmerman in view of Deshmukh (U.S. Pat. App. Pub. No. 2011/0196677, hereinafter Deshmukh) and Dwyer (U.S. Pat. App. Pub. No. 2015/0195406, hereinafter Dwyer).

Regarding claim 3, Zimmerman discloses further comprising: analyzing the first image representation of the first conversation ("the timeline 700 may also display the instantaneous emotional interaction values 703 (in the bottom trace) which may be generated to show the interaction of the agent and customer for the conversation" which is analysis of the first image representation comprising the emotional state of the customer and the agent, as depicted in the first image representation of the first conversation.; Zimmerman, ¶¶ [0047], FIG. 7); identifying, from the first image representation of the first conversation, a… [silence] (the system compares both voiced and non-voiced portions of the conversation to determine the interaction values, thus implicitly disclosing analysis of emotional state at both voiced and non-voiced regions. Further, the system identifies portions of conversation which contain no voice from either party, thus identifying silence.; Zimmerman, ¶¶ [0047], FIG. 7). However, Zimmerman fails to expressly recite identifying... [silence as] a hold; and categorizing the first conversation into a first category based on the identification of the hold.
Deshmukh teaches systems and methods for analysis of change in emotion in an audio interaction. (Deshmukh, ¶ [0002]). Regarding claim 3, Deshmukh teaches identifying... [silence as] a hold ("Call aspects… are located in audio interaction" where "Hold time may be located using speech-silence analysis."; Deshmukh, ¶¶ [0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman to incorporate the teachings of Deshmukh to include identifying... [silence as] a hold. Determining the temporal trajectory of emotion in an audio interaction, as well as the causes of said trajectory, can “improve the negative-to-positive sentiment conversion rate of conversations” leading to an overall positive customer experience when interacting with an agent, as recognized by Deshmukh. (Deshmukh, ¶ [0036]). However, Zimmerman and Deshmukh fail to expressly recite categorizing the first conversation into a first category based on the identification of the hold.
Dwyer teaches “real-time automated monitoring systems for monitoring and improving live communications.” (Dwyer, ¶ [0003]). Regarding claim 3, Dwyer teaches and categorizing the first conversation into a first category ("Output of text conversion/ sentiment/acoustics analysis of customer conversations and associated metadata from any source communication system (call recorders, chat systems, emails, social posts (e.g. twitter, Facebook), SMS, Blogs and Surveys, etc.) and across multiple contact center sites and locations are stored in a database for... categorization {categorizing the first conversation}," where the process of categorization implicitly discloses at least a first category.; Dwyer, ¶¶ [0083]) based on the identification of the hold ("metadata" for the vocal communications {first conversation} can include "call length, silence time or percent (e.g., largest block of silence in seconds), silence between words (e.g. implied distance between communication and a response)... and the like."; Dwyer, ¶¶ [0085]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman, as modified by the emotional change analysis system of Deshmukh to incorporate the teachings of Dwyer to include categorizing the first conversation into a first category based on the identification of the hold. Providing real-time feedback during live communication can help an agent with “’first-call’ resolution” of a customer problem, resulting in an improved overall customer experience while improving agent efficiency, as recognized by Dwyer. (Dwyer, ¶ [0084]).
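
For illustration only, the combination discussed above (locating a hold by speech-silence analysis, then categorizing the conversation based on the hold) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Deshmukh or Dwyer: the function names find_holds and categorize, the category labels, and the 30-second minimum hold length are all hypothetical.

```python
def find_holds(segments, min_hold=30.0):
    """Return (start, end) silence gaps of at least min_hold seconds.

    segments: (start, end) times, in seconds, of voiced speech by either party.
    """
    holds = []
    prev_end = None
    for start, end in sorted(segments):
        # A gap with no speech from either participant is treated as a hold
        # when it lasts at least min_hold seconds (hypothetical cutoff).
        if prev_end is not None and start - prev_end >= min_hold:
            holds.append((prev_end, start))
        prev_end = end if prev_end is None else max(prev_end, end)
    return holds


def categorize(segments, min_hold=30.0):
    """Assign the conversation a category based on whether a hold was found."""
    return "contains_hold" if find_holds(segments, min_hold) else "no_hold"
```

For example, voiced segments at 0-10, 12-20, and 65-70 seconds leave a 45-second gap between 20 and 65 seconds, which this sketch reports as a single hold.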

Regarding claim 13, the rejection of claim 11 is incorporated. Claim 13 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Claims 4-7 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zimmerman in view of Dwyer.

Regarding claim 4, Zimmerman discloses further comprising: analyzing the first image representation of the first conversation ("the timeline 700 may also display the instantaneous emotional interaction values 703 (in the bottom trace) which may be generated to show the interaction of the agent and customer for the conversation" which is analysis of the first image representation comprising the emotional state of the customer and the agent, as depicted in the first image representation of the first conversation.; Zimmerman, ¶¶ [0047], FIG. 7); identifying, from the first image representation of the first conversation, a negative indicator (the system compares both voiced and non-voiced portions of the conversation to determine the interaction values, where "in selected embodiments, each interaction value represents the comparison of the anger of the customer to the response anger of the agent {a negative indicator}"; Zimmerman, ¶¶ [0047], FIG. 7). However, Zimmerman fails to expressly recite categorizing the first conversation into a first category based on the identification of the negative indicator.
The relevance of Dwyer is described above with relation to claim 3. Regarding claim 4, Dwyer teaches and categorizing the first conversation into a first category ("In an embodiment, a common taxonomy of category groups may be used when tagging calls and text communications {categorizing the first conversation into a first category}"; Dwyer, ¶¶ [0099]) based on the identification of the negative indicator ("One category may be behaviors, for example, how agents or customers are behaving... [where] Various language patterns, keywords, phrases, or other characteristics associated with the overall feel of ‘dissatisfaction’ may be included in a list for the category. When the listed item appears in the communication {...identification of the negative indicator...}, the ‘dissatisfaction’ tag may be applied {categorizing...based on the identification of the negative indicator}."; Dwyer, ¶¶ [0099]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman to incorporate the teachings of Dwyer to include categorizing the first conversation into a first category based on the identification of the negative indicator. Providing real-time feedback during live communication can help an agent with “’first-call’ resolution” of a customer problem, resulting in an improved overall customer experience while improving agent efficiency, as recognized by Dwyer. (Dwyer, ¶ [0084]).

Regarding claim 5, the rejection of claim 4 is incorporated. Zimmerman further discloses wherein the negative indicator is based on a ratio between a duration of an utterance and a number of tokens in the utterance, ("an emotional filter may be applied to capture latent emotions and see how the latent emotions change over time {a ratio} (e.g., to see if the latent anger of the customer subsides over the duration of the call {...between the duration of an utterance...} or if the agent exacerbates the customer’s anger)" where anger is measured based on the number of emotional words in the utterance.; Zimmerman, ¶¶ [0048]) wherein an utterance comprises a sequence of consecutive tokens (an utterance necessarily comprises words {a sequence of consecutive tokens} as exemplified in FIG. 8.; Zimmerman, ¶¶ [0048], FIG. 8).
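
For illustration only, the ratio recited in claim 5 (a duration of an utterance relative to a number of tokens in the utterance) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name seconds_per_token is hypothetical, and tokens are assumed to be whitespace-separated words. How such a ratio would be turned into a negative indicator is not specified here.

```python
def seconds_per_token(start: float, end: float, text: str):
    """Ratio of an utterance's duration (seconds) to its number of tokens."""
    tokens = text.split()  # assumption: whitespace-separated tokens
    if not tokens:
        return None  # ratio is undefined for an utterance with no tokens
    return (end - start) / len(tokens)
```

For example, a four-token utterance lasting 6 seconds yields a ratio of 1.5 seconds per token.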

Regarding claim 6, the rejection of claim 4 is incorporated. Zimmerman further discloses wherein the first image representation of the first conversation is generated contemporaneously with the first conversation (the system provides "substantially real-time visual feedback {the first image representation of the first conversation} of the emotional state of the communication {is generated contemporaneously with the first conversation}"; Zimmerman, ¶¶ [0050]), and subsequent to identifying the negative indicator, the first conversation is identified for intervention ("Using the... representation of the emotional state of the input text, the visualization analytics tool 412 generates a call state indicator for the agent 414 and/or for the supervisor 416. " where "As a result of the supervisor call state indicator 416 indicating that the emotional stress of the customer-agent conversation has reached a threshold, the supervisor may be automatically connected with the distressed call to intervene in a conversation that has reached an undesired state."; Zimmerman, ¶¶ [0037]-[0038]).

Regarding claim 7, Zimmerman discloses further comprising: analyzing the first image representation of the first conversation ("the timeline 700 may also display the instantaneous emotional interaction values 703 (in the bottom trace) which may be generated to show the interaction of the agent and customer for the conversation" which is analysis of the first image representation comprising the emotional state of the customer and the agent, as depicted in the first image representation of the first conversation.; Zimmerman, ¶¶ [0047], FIG. 7). However, Zimmerman fails to expressly recite filtering one or more of the first conversation and an utterance within the first conversation, wherein filtering the first conversation includes adding the first conversation to a category based on detecting one or more of a conversational phase and a conversational affect in the first conversation and wherein filtering the utterance within the first conversation includes one or more of identifying one or more of a negative indicator, active listening, pleasantries, information verification, and user intent.
The relevance of Dwyer is described above with relation to claim 3. Regarding claim 7, Dwyer teaches filtering one or more of the first conversation and an utterance within the first conversation ("acoustically analyzed communications may be included in a sortable database {filtering one or more of the first conversation and an utterance within the first conversation}"; Dwyer, ¶ [0086]) wherein filtering the first conversation includes adding the first conversation to a category ("In an embodiment, a common taxonomy of category groups may be used when tagging calls and text communications {categorizing the first conversation into a first category}"; Dwyer, ¶ [0099]) based on detecting one or more of a conversational phase and a conversational affect in the first conversation ("One category may be behaviors, for example, how agents or customers are behaving... [where] Various language patterns, keywords, phrases, or other characteristics associated with the overall feel of ‘dissatisfaction’ may be included in a list for the category. When the listed item appears in the communication {...detecting one or more of a conversational phase and a conversational affect...}, the ‘dissatisfaction’ tag may be applied {based on detecting... in the first conversation}."; Dwyer, ¶ [0099]), and wherein filtering the utterance within the first conversation includes one or more of identifying one or more of a negative indicator, active listening, pleasantries, information verification, and user intent ("Categorization" as applied to the searchable database, can include "sophisticated language patterns... [and] language patterning is capable of measuring language in specific locations of conversation, order of occurrence of language, positive or negative rules {...a negative indicator}, standard Boolean logic, and the like {...identifying one or more of a negative indicator, active listening, pleasantries, information verification, and user intent}." and where "the database may be searchable by category {filtering the utterance within the first conversation...}"; Dwyer, ¶ [0101]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman to incorporate the teachings of Dwyer to include filtering one or more of the first conversation and an utterance within the first conversation, wherein filtering the first conversation includes adding the first conversation to a category based on detecting one or more of a conversational phase and a conversational affect in the first conversation and wherein filtering the utterance within the first conversation includes one or more of identifying one or more of a negative indicator, active listening, pleasantries, information verification, and user intent. Providing real-time feedback during live communication can help an agent with “‘first-call’ resolution” of a customer problem, resulting in an improved overall customer experience while improving agent efficiency, as recognized by Dwyer (Dwyer, ¶ [0084]).

Regarding claim 14, the rejection of claim 11 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 16, the rejection of claim 14 is incorporated. Claim 16 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Regarding claim 17, the rejection of claim 11 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zimmerman in view of Salammagari (U.S. Pat. App. Pub. No. 2020/0082214, hereinafter Salammagari) and Mazza (U.S. Pat. App. Pub. No. 2019/0182382, hereinafter Mazza).

Regarding claim 9, the rejection of claim 8 is incorporated. Zimmerman discloses all of the elements of the current invention as stated above. However, Zimmerman fails to expressly recite further comprising: receiving the one or more intents identified within the first conversation and one or more intents identified in one or more other conversations; clustering the one or more intents identified within the first conversation and the one or more intents identified in one or more other conversations to generate a set of clusters associated with unique intents; generating a conversation map visually representing a first cluster associated with a first unique intent as a first node, a second cluster associated with a second unique intent as a second node, and visually representing a transition between the first unique intent to the second unique intent as edges; and identifying, from the conversation map, a preferred path; and performing self-supervised learning based on the preferred path.
Salammagari teaches a “method and apparatus for facilitating training of agents for interacting with customers.” (Salammagari, ¶ [0002]). Regarding claim 9, Salammagari teaches further comprising: receiving the one or more intents identified within the first conversation and one or more intents identified in one or more other conversations (FIG. 6 references "the plurality of interactions between agents" which are "explained with reference from FIGS. 2 to 5" where "transcripts" of conversations between agents and customers are used "for intent discovery and subsequent clustering of interactions based on discovered (i.e. derived) intents."; Salammagari, ¶¶ [0074], [0059]); clustering the one or more intents identified within the first conversation and the one or more intents identified in one or more other conversations to generate a set of clusters associated with unique intents ("As explained with reference from FIGS. 2 to 5, the plurality of interactions between agents and the customers of the enterprise are classified into a plurality of intent-based interaction clusters."; Salammagari, ¶ [0074]); generating a conversation map visually representing a first cluster associated with a first unique intent as a first node (the system then "generate an interaction flow map for each intent-based interaction cluster {a conversation map visually representing a first cluster…} using the interactions classified into the respective intent-based interaction cluster {...associated with a first unique intent...}" where the visual representation in the map of each intent based interaction cluster is the respective node {...as a first node}; Salammagari, ¶ [0074]), a second cluster associated with a second unique intent as a second node (the system then "generate an interaction flow map for each intent-based interaction cluster" where "each..." implicitly includes a plurality of intent-based interaction clusters {a conversation map visually representing... a second cluster} "using the interactions classified into the respective intent-based interaction cluster {...associated with a second unique intent...}" where the visual representation in the map of each intent based interaction cluster is the respective node {...as a second node}; Salammagari, ¶ [0074]), and visually representing a transition between the first unique intent to the second unique intent as edges ("For each interaction, an interaction path is traced {visually representing a transition between…} from one interaction turn to another {…the first unique intent to the second unique intent}."; Salammagari, ¶ [0078]); and identifying, from the conversation map, a… path ("The flow of interaction from one interaction turn to another is then traced using interaction paths… to configure the interaction flow map,"; Salammagari, ¶ [0080]); and performing self-supervised learning based on the… path ("solution designers may use the interaction flow maps as a reference map to train machine learning algorithms to function as automated conversational agents or chat bots" which may be "automatically generated... to train machine learning models {...self-supervised learning based on the preferred path}"; Salammagari, ¶¶ [0083], [0029]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman to incorporate the teachings of Salammagari to include further comprising: receiving the one or more intents identified within the first conversation and one or more intents identified in one or more other conversations; clustering the one or more intents identified within the first conversation and the one or more intents identified in one or more other conversations to generate a set of clusters associated with unique intents; generating a conversation map visually representing a first cluster associated with a first unique intent as a first node, a second cluster associated with a second unique intent as a second node, and visually representing a transition between the first unique intent to the second unique intent as edges; and identifying, from the conversation map, a preferred path; and performing self-supervised learning based on the preferred path. The provision of different interaction flows, covering a variety of situations, can help agents to “provide effective assistance to the customers and improve a quality of customer interaction experience,” as recognized by Salammagari (Salammagari, ¶ [0028]). However, Zimmerman and Salammagari fail to expressly recite identifying, from the conversation map, a preferred path.
Mazza teaches “systems and methods for training and operating chatbots.” (Mazza, ¶ [0001]). Regarding claim 9, Mazza teaches identifying, from the conversation map, a preferred path ("the pruning is performed automatically by keeping the edge that corresponds to the sequences that occur most frequently in the sample dialogue data," and where pruning to "keep the edge that corresponds…" is identifying a preference for a path; Mazza, ¶ [0147]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman, as modified by the agent training systems of Salammagari, to incorporate the teachings of Mazza to include identifying, from the conversation map, a preferred path. Automatically generating chatbots based on sample dialogue data, including “deducing the space of topics discussed in the contact center… [and] discovering the various dialogue flows in each of the topics” can “improve the topical coverage of a resulting configured chatbot compared to manually configured chatbots,” as recognized by Mazza (Mazza, ¶¶ [0048], [0051]).

Regarding claim 10, the rejection of claim 9 is incorporated. Zimmerman, Salammagari, and Mazza disclose all of the elements of the current invention as stated above. However, Zimmerman and Salammagari fail to expressly recite wherein the preferred path is one of a shortest path and a densest path.
The relevance of Mazza is described above with relation to claim 9. Regarding claim 10, Mazza teaches wherein the preferred path is one of a shortest path and a densest path ("the pruning is performed automatically by keeping the edge that corresponds to the sequences that occur most frequently in the sample dialogue data," and where pruning to "keep the edge that corresponds…" is identifying a preference for a path and “sequences that occur most frequently” is the densest path; Mazza, ¶ [0147]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the emotional content monitoring system of Zimmerman, as modified by the agent training systems of Salammagari, to further incorporate the teachings of Mazza to include wherein the preferred path is one of a shortest path and a densest path. Automatically generating chatbots based on sample dialogue data, including “deducing the space of topics discussed in the contact center… [and] discovering the various dialogue flows in each of the topics” can “improve the topical coverage of a resulting configured chatbot compared to manually configured chatbots,” as recognized by Mazza (Mazza, ¶¶ [0048], [0051]).

Regarding claim 19, the rejection of claim 18 is incorporated. Claim 19 is substantially the same as claim 9 and is therefore rejected under the same rationale as above.

Regarding claim 20, the rejection of claim 19 is incorporated. Claim 20 is substantially the same as claim 10 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Beaumont et al. (U.S. Pat. App. Pub. No. 2020/0004878) discloses systems and methods for generating dialogue graphs for the creation of virtual agents or virtual assistants.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657  

/LAMONT M SPOONER/Primary Examiner, Art Unit 2657
9/30/2022