DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 4, 9 to 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Indyk et al. (U.S. Patent No. 10,192,569) in view of Pratt et al. (U.S. Patent Publication 2014/0152816).
Concerning independent claims 1 and 10, Indyk et al. discloses a computer-implemented method and computer-readable medium for informing a support agent of a paralinguistic emotion signature of a user, comprising:
“receiving a call from a customer at a call center” – assisted support provides users with an option to interact with ‘live’ customer support agents (column 1, lines 57 to 61); a support agent can interact with a user (“a customer”) for different types of support encounters including a voice telephone call (column 7, lines 34 to 37: Figure 1); implicitly, an agent interacting with a customer occurs in a context of “a call center”; 
“recording a sample” of the call – low level descriptors may be extracted from frames of the audio stream (e.g., 10-30 ms frames) (“a sample”); a computing device captures the audio stream from client systems 110, 120 (column 6, lines 25 to 30: Figure 1); client systems 110, 120, or agent device 140 may receive the audio stream of the speech of the user via a microphone (column 8, lines 63 to 67: Figure 1); here, frames of 10-30 ms of an audio stream can be construed to represent “a sample”; broadly, “recording a sample” of the call is equivalent to ‘capturing’ voice/speech of a telephone call of a user at a microphone;
“extracting the Mel-Frequency Cepstral Coefficient (MFCC) from the sample” – embodiments process the audio stream and extract paralinguistic information from the audio stream; paralinguistic information includes low level descriptors of linear prediction cepstral coefficients (LPCC), cepstral coefficients (Mel frequency cepstral coefficients (MFCCs)) (“extracting the Mel-Frequency Cepstral Coefficient (MFCC) from the sample”), and spectrum (Mel frequency bands (MFB)) (column 3, line 61 to column 4, line 22);
“using a machine learning model to predict an emotion of the customer using the MFCC” – once extracted, an online service can process the paralinguistic information of the voice/speech to determine one or more attribute measures of the user, e.g., emotional state; one or more emotional states of the user include angry, afraid, positive, negative, joyful, happy, bored, alert, tranquil, excited, warm, aggressive, friendly, gloomy, etc. (column 4, lines 23 to 35); support component 162 can use one or more predictive models, trained using techniques that include machine learning, to determine actions for the support agent based on the sentiment of the user (column 10, lines 48 to 60: Figure 1); speech component 160 can use paralinguistic information from audio 204 and data analysis tool 208 to predict a user’s emotional state as an emotion signature 210 (column 11, lines 14 to 43: Figure 2); speech component 160 can use neural networks and deep feedforward and recurrent neural networks on the audio stream to extract words spoken by the user (column 11, lines 52 to 47: Figure 1);
“generating a confidence score for the emotion” – low level attributes are matched to one or more emotional states of the user; predicted attributes can be determined with a probability that they are correct; an online service may assign the determined attribute to the user, or if the probability is below a threshold, not assign the determined attribute to the user (column 4, lines 30 to 41); speech component 160 may determine each emotional state as a score with a value ranging from -1 to 1, where 1 is associated with a high positive emotional state and -1 is associated with a negative emotional state; speech component 160 can use any suitable range of scores, e.g., 0 to 10, or 1 to 100 (column 11, lines 43 to 51: Figure 2);
“modifying a script of a call center agent based on the emotion and the confidence score” – see column 4, line 61 to column 5, line 10; embodiments can recommend actions (or activities) for the support agent to employ when interacting with the user based on the user sentiment; if the paralinguistic information indicates that the user is frustrated with a particular workflow (“a script”) included in the online service, the online service can recommend that the support agent help the user with that workflow; the online service can recommend that the support agent interact with the user in a certain manner, e.g., by saying particular words or phrases, using a certain tone, using a certain volume, refraining from saying particular phrases, and asking if the user needs help with other features of the application based on paralinguistic information; the online service may use an analytical model to predict the set of recommended actions that increase the likelihood of achieving a positive output for the support agent (column 8, line 50 to column 9, line 5); broadly, recommending that an agent interact with the user in a certain manner by saying particular words or phrases and to ask if the user needs help with a particular application is “modifying a script”; that is, a ‘script’ represents particular words in a workflow.
Indyk et al. discloses all of the limitations of these independent claims with an arguable exception of a script and “wherein the script is based on a standard operating procedure (SOP) and modifying the script comprises modifying the SOP”.  Generally, Indyk et al.’s user is a customer of a support agent for an interaction that can be a voice call, and a scenario of a customer and an agent implicitly takes place in “a call center”.  Indyk et al. discloses “a confidence score for an emotion”, which can be a probability that an emotion is correctly determined or a score between -1 and 1, 0 to 10, or 1 to 100.  Broadly, Indyk et al. discloses “modifying a script of a call center agent based on the emotion and the confidence score” because recommended actions for a support agent to take are based on paralinguistic information of a user’s emotional state.  (Column 5, Line 50 to Column 6, Line 5)  Moreover, “standard operating procedure” is defined as ‘established prescribed methods to be followed routinely for the performance of designated operations or in designated situations’ or ‘a set of step-by-step instructions compiled by an organization to help workers carry out routine operations’.  See https://www.merriam-webster.com/dictionary/standard%20operating%20procedure and https://en.wikipedia.org/wiki/Standard_operating_procedure.  A ‘standard operating procedure’, then, is just a conventional procedure that is commonly used, and “modifying a script . . . based on a standard operating procedure (SOP)” is just a modification of a conventional procedure given by a script.  Still, Indyk et al. can be construed to inherently provide “a script” with “a standard operating procedure (SOP)” as an interactive workflow through which an application guides a user, e.g., tax preparation services.  
(Column 1, Lines 14 to 21; Column 1, Lines 57 to 61; Column 3)  Indyk et al.’s workflow, then, corresponds to a “script” providing “a standard operating procedure (SOP)”, and that workflow is modified based on paralinguistic information including emotional state of a user.
Concerning independent claims 1 and 10, even if this limitation of “a script . . . based on a standard operating procedure (SOP) and modifying the script comprises modifying the SOP” is omitted by Indyk et al., it is taught by Pratt et al.  Generally, Pratt et al. teaches an operating management system that is configured to execute workflow instructions, and convert the workflow instructions from a text command into an audible command.  (Abstract)  Workflow software prompts operators and guides them through standard operating procedures (SOPs).  (¶[0005])  Embodiments integrate voice and speech recognition software into a workflow interface, e.g., a shop-floor manufacturing process.  (¶[0022])  Workflow software 28 may comprise instructions for automation of a process 50, e.g., movement of product from one operator to another for a certain action, according to a set of rules or SOPs.  Workflow software digitizes a flowchart for automated tasks including checking tank levels, managing a production process, orchestrating data transformations, and installing components of a complex machine.  The manufacturing software 28 electronically guides operators through step-by-step instructions using text-to-speech (TTS) 32 software.  (¶[0031] - ¶[0032]: Figure 1)  A system is configured to verify a user’s identity by prompting an operator to speak a certain lexicon, which is read and matched by a processor to an operator’s stored voice.  Pratt et al. expressly teaches “a standard operating procedure (SOP)” of workflow instructions, and workflow instructions are equivalent to “a script”.  An objective is to provide increased efficiency and effectiveness of manufacturing processes involving a large number of steps and/or a complex manufactured product.  (¶[0014])  It would have been obvious to one having ordinary skill in the art to apply standard operating procedures of workflows in Pratt et al. to a modification of interactive workflows in Indyk et al. for a purpose of increasing efficiency and effectiveness of complex manufacturing processes and products.
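For illustration only, the confidence handling attributed to Indyk et al. above (assigning a predicted attribute only when its probability meets a threshold, and expressing an emotional state score on a range such as -1 to 1, 0 to 10, or 1 to 100) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not taken from any reference of record: the function names, the threshold value of 0.6, and the linear rescaling are all hypothetical.

```python
from typing import Dict, Optional


def assign_emotion(probabilities: Dict[str, float],
                   threshold: float = 0.6) -> Optional[str]:
    """Return the most probable emotion, or None when its probability
    is below the threshold (the threshold value is an assumption)."""
    emotion, probability = max(probabilities.items(), key=lambda kv: kv[1])
    return emotion if probability >= threshold else None


def rescale(score: float, low: float = 0.0, high: float = 10.0) -> float:
    """Linearly map a score in [-1, 1] onto [low, high] (hypothetical helper)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    return low + (score + 1.0) / 2.0 * (high - low)
```

Under these assumptions, `assign_emotion({"angry": 0.8, "happy": 0.2})` yields "angry", while a 50/50 split yields no assignment at all.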

Concerning claims 2 and 11, Indyk et al. discloses that low level descriptors may be extracted from frames of the audio stream (e.g., 10-30 ms frames); embodiments process the audio stream and extract paralinguistic information from the audio stream, including low level descriptors of linear prediction cepstral coefficients (LPCC), Mel frequency cepstral coefficients (MFCCs), spectrum (Mel frequency bands (MFB)), etc. (column 3, line 61 to column 4, line 22).  Here, paralinguistic information of these parameters is what is conventionally known in the art as “features” and a 10-30 ms audio frame is “the sample” (“wherein extracting the MFCC from the sample further comprises extracting an audio feature from the sample”).
Concerning claims 3 and 12, Indyk et al. discloses that support component 162 may analyze the user’s sentiment to different application content and determine one or more actions and content for the support agent to employ based on the sentiment of the user; support component 162 can use one or more predictive models to determine a set of actions and/or information content that, if used by the support agent, increase a likelihood of the support agent achieving a positive output for the call; these predictive models can be trained using techniques that include machine learning (column 10, lines 48 to 60: Figure 1); agent reaction model 306 may be trained over historical data including historical emotion signature data of users, historical support agent actions in response to users contacting assisted support, and outcomes of the support calls; training agent reaction model 306 can be used to identify correlations, or mathematical and statistical relationships, between different actions of a support agent to different types of users expressing different emotional states (column 12, line 57 to column 13, line 11: Figure 3); here, training a model by machine learning with historical emotion signature data of users is “wherein the machine learning model was trained using an 
Concerning claims 4 and 13, Indyk et al. discloses that low level descriptors may be extracted from frames of the audio stream (e.g., 10-30 ms frames); embodiments process the audio stream and extract paralinguistic information from the audio stream that include cepstral coefficients (Mel frequency cepstral coefficients (MFCCs)) (“wherein extracting the MFCC sample comprises framing the sample into short frames”) (column 4, lines 2 to 22).
Concerning claims 9 and 18, Indyk et al. discloses an embodiment where speech component 160 may be configured to track sentiment of a user interacting with an interactive computing service 150 as the user is progressing through an interactive workflow to generate paralinguistic information for each interactive screen.  Speech component 160 can determine an array of emotional states for each interactive screen.  (Column 9, Lines 18 to 31: Figure 1)  Speech component 160 can record change in user sentiment for multiple screens or change in sentiment between two interactive screens.  Speech component 160 may determine that the user is initially in a positive mood from paralinguistic information of the user interacting with a first one or more screens, and then determine that the user’s mood has changed to neutral or negative from paralinguistic information of the user interacting with a second one or more screens.  (Column 9, Lines 46 to 59: Figure 1)  Here, each screen of multiple screens can be understood as representing one time step in “multiple time steps”, where an emotion is .

Claims 5 to 6 and 14 to 15 are rejected under 35 U.S.C. 103 as being unpatentable over Indyk et al. (U.S. Patent No. 10,192,569) in view of Pratt et al. (U.S. Patent Publication 2014/0152816) as applied to claims 1, 4, 10, and 13 above, and further in view of Howard (U.S. Patent Publication 2019/0074028).
Indyk et al. discloses a set of low level descriptors as paralinguistic information for determining a user sentiment, i.e., emotion, extracted from frames of an audio stream, where low level descriptors include MFCCs, spectrum, and MPEG-7 audio spectrum projection.  (Column 3, Line 66 to Column 4, Line 22)  However, Indyk et al. does not expressly disclose that these features include calculating a periodogram estimate of a power spectrum for each frame or generating DCT coefficients and using the DCT coefficients in a machine learning model.  Still, these features could be understood by one skilled in the art as being inherent for Indyk et al.  Generally, a periodogram is basically a graph of a power spectrum of a signal, and a spectrum is disclosed by Indyk et al.  Similarly, Indyk et al. discloses MFCCs and MPEG-7 audio, where MFCCs and MPEG-7 implicitly include calculations of discrete cosine transform (DCT) coefficients.  Even if these limitations are omitted by Indyk et al., they are still taught by Howard.
Generally, Howard teaches real-time vocal feature extraction for automated emotional state assessment.  (Abstract)  Determining which extracted signal features to use may comprise using a decision tree model or a neural network model.  Extracting signal features from an audio signal may comprise using Cepstral Coefficients, and forming a Fourier transform signal, performing power spectrum processing on the Fourier transformed signal, performing Mel-Cepstral filter bank processing on the power spectrum signal to form a logarithm signal, performing discrete cosine transform processing to form a discrete cosine transformed signal, and obtaining a plurality of Mel-Cepstral Coefficients.  (¶[0007])  Figure 20 is illustrative of a periodogram.  (¶[0030]: Figure 20)  Preprocessing and recording of line calls may be performed.  (¶[0053])  Feature extraction may include features of Mel Frequency Cepstral Coefficients to detect paralinguistic content in speech signal processing.  (¶[0094] - ¶[0095])  MFCC computation includes input to discrete Fourier transform (DFT) stage 908, which is input to power spectrum stage 910, filter bank stage 912, and log stage 916, which is input to discrete cosine transform stage 918, and the output is 12 MFCC data samples 924.  (¶[0099]: Figure 9)  TensorFlow performs machine learning for feature extraction and model implementation.  (¶[0142] - ¶[0144])  MFCCs are used to assess characteristics being a proper representation of human auditory perception, where a power spectrum is plotted as a periodogram in Figure 20.  (¶[0159]: Figure 20)  Howard, then, teaches whatever limitations might not be expressly disclosed by Indyk et al., as directed to a periodogram of a power spectrum, using DCT coefficients in calculating MFCCs, and a machine learning model.  It would have been obvious to one having ordinary skill in the art to apply the MFCC computation of Howard to obtain paralinguistic descriptors of Indyk et al. for a purpose of determining an emotional state of a user.
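For illustration only, the sequence of stages described for Howard above (framing, discrete Fourier transform, power spectrum as a periodogram, Mel filter bank, logarithm, discrete cosine transform, and 12 output coefficients) can be sketched as follows.  This is a minimal sketch of a conventional MFCC computation under stated assumptions, and is not the implementation of Howard or of any other reference: the function names, the 26-filter bank, the Hz-to-mel formula, and the floor of 1e-10 applied before the logarithm are assumptions chosen for the example.

```python
import cmath
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into short, possibly overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def periodogram(frame: List[float]) -> List[float]:
    """Periodogram estimate of the power spectrum: |DFT|^2 / N per bin."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(x_k) ** 2 / n)
    return power


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct2(values: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for c in range(num_coeffs)]


def mfcc(frame: List[float], sample_rate: int,
         num_filters: int = 26, num_coeffs: int = 12) -> List[float]:
    power = periodogram(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # avoid log(0)
    return dct2(log_energies, num_coeffs)
```

As a usage example under the same assumptions, a signal sampled at 8 kHz could be split with `frame_signal(signal, 200, 100)` into 25 ms frames, and `mfcc(frame, 8000)` would then return 12 coefficients per frame that could serve as inputs to a machine learning model.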

Claims 7 to 8 and 16 to 17 are rejected under 35 U.S.C. 103 as being unpatentable over Indyk et al. (U.S. Patent No. 10,192,569) in view of Pratt et al. (U.S. Patent Publication 2014/0152816) as applied to claims 1 and 10 above, and further in view of Dezonno et al. (U.S. Patent Publication 2004/0062364).
Generally, Indyk et al. discloses recommending a workflow to a support agent to help a user with that workflow based on an emotional state of a user reflected in paralinguistic information, so that the service can recommend that the support agent interact with the user in a certain manner, e.g., by saying particular words or phrases, refraining from saying particular words or phrases, or asking if a user needs help with features of the application.  (Column 5, Lines 50 to 64)  Broadly, Indyk et al., then, provides “modifying a script of a call center agent”, but does not expressly disclose the limitations that modifying the script includes “deleting a portion of the script” or “substituting a portion of the script with a new script”.  Still, recommending that a support agent interact with the user by saying particular words could be construed as “substituting a portion of the script with a new script” and refraining from saying particular words can be construed as “deleting a portion of the script”.  
Dezonno et al. teaches whatever of these limitations might be omitted by Indyk et al.  Generally, Dezonno et al. teaches a method of selecting actions for an agent by analyzing a conversation content and emotional inflection that provides agents with scripts in a call center.  (Abstract)  A network receives a call from a caller, and provides a transaction input to scripting service 200, where the transaction input is voice signal 202.  Emotion detector 208 receives voice signal 202, and provides measurement of verbal attributes.  Emotion detector 208 outputs at least one tag indicator 210.  Text stream 206 and at least one tag indicator 210 are received by scripting engine 212, scripting engine 212 determines a response or script to a caller, and selects a script file from a plurality of script files 214.  The selected script is then output as script 216, and this script 216 is sent to an agent and guides the agent in replying to the current caller.  If a caller is initially very upset, scripting engine 212 tailors the script file for output script 216 to appease the caller.  If the caller becomes less agitated as indicated by emotion detector 208, scripting engine 212 selects a different script file 214 and outputs it as script 216 to a respective agent.  Emotion detector 208 may output a tag indicator 210 with a value identifying an emotional state and optionally a state value, e.g., Aggravation Level=9.  Script calculation is performed to associate tag indicator 210 values with a selection of scripts.  Script 2 may be chosen as a next script after script 1 if tag indicator 210 values are less than 4, script 3 may be selected for tag indicator 210 values greater than 4 but less than 8, and script 4 may be selected for all other tag indicator values.  (¶[0019] - ¶[0022]: Figure 2)  Here, selecting script 3 instead of script 2 after script 1 when a tag indicator 210 value is greater than 4 but less than 8 is equivalent to “wherein modifying the script of the call center agent includes substituting a portion of the script with a new script”; script 2 is then not presented to the agent, i.e., “wherein modifying the script of the call center agent includes deleting a portion of the script.”  An objective is to provide agents with scripts that are based on an emotional state of the caller to decrease the cost of operating a call center and to increase the agent’s ability to interface with a caller.  (¶[0028])  It would have been obvious to one having ordinary skill in the art to receive a call from a customer at a call center and to substitute or delete a portion of a script of an agent based on the emotion as taught by Dezonno et al. for a workflow of a support agent in Indyk et al. for a purpose of decreasing a cost of operating a call center and increasing an agent’s ability to interface with a caller.
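For illustration only, the script calculation described for Dezonno et al. above, in which a tag indicator value selects the next script after script 1, can be sketched as follows.  This is a minimal sketch that reads the stated ranges literally; the function name and the string labels are hypothetical, and the treatment of a value of exactly 4 (which falls under “all other tag indicator values”) is a consequence of that literal reading rather than something the reference is known to state.

```python
def select_next_script(tag_value: float) -> str:
    """Choose the script that follows script 1 from a tag indicator value."""
    if tag_value < 4:
        return "script 2"
    if 4 < tag_value < 8:
        return "script 3"
    # All other values, including exactly 4 and anything 8 or above.
    return "script 4"
```

Under this reading, a caller whose tag indicator value moves from 3 to 5 would have script 2 replaced by script 3 as the next script presented to the agent.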

Response to Arguments
Applicants’ arguments filed 22 March 2021 have been considered but are moot in view of new grounds of rejection as necessitated by amendment.
Applicants’ amendments overcome the objections to the claims and to the Specification.
Applicants amend independent claims 1 and 10 to set forth a new limitation directed to “wherein the script is based on a standard operating procedure (SOP) and modifying the script comprises modifying the SOP”, and present arguments traversing the prior rejection of these independent claims as being obvious under 35 U.S.C. §103 over Indyk et al. (U.S. Patent No. 10,192,569) and Dezonno et al. (U.S. Patent Publication 2004/0062364).  Generally, Applicants’ argument is simply that Indyk et al. and Dezonno et al. do not disclose or suggest modifying a script where the script is based on “a standard operating procedure”.  Specifically, Applicants emphasize the limitations of “based on a standard operating procedure (SOP)” and “modifying the SOP” in their arguments.
Applicants’ amendments necessitate new grounds of rejection as directed to independent claims 1 and 10 being obvious under 35 U.S.C. §103 over Indyk et al. (U.S. Patent No. 10,192,569) in view of Pratt et al. (U.S. Patent Publication 2014/0152816).  The rejection of the independent claims substitutes Pratt et al. for Dezonno et al.  The rejection of some dependent claims continues to rely upon Dezonno et al. and Howard.  Applicants’ amendments are supported by the Specification.  Generally, Pratt et al. teaches workflow instructions that are expressly described as “standard operating procedures (SOPs)”.  Here, these workflow instructions are equivalent to “a script”, so that “wherein the script is based on a standard operating procedure (SOP)” is taught by Pratt et al.  Similarly, Indyk et al. discloses workflows provided by an application for customer support agents to guide a user.  (Column 1, Lines 18 to 21; Column 1, Lines 57 to 61; Column 5, Lines 54 to 64)  It is maintained that these workflows provided by an application for customer support agents are equivalent to ‘scripts’.  Indyk et al., then, discloses “modifying a script of a call center agent” when an online service recommends that the support agent interact with the user in a certain manner by saying particular words or phrases, refraining from saying particular words or phrases, and asking if a user needs help with other features of the application.  (Column 5, Lines 58 to 64)  Consequently, “modifying a script” as disclosed by Indyk et al. “comprises modifying the SOP” because a script comprises standard operating procedures as taught by Pratt et al.
Applicants’ arguments are moot in light of these new grounds of rejection.  All of these new grounds of rejection are necessitated by amendment.  Accordingly, this rejection is properly FINAL.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Balsavias et al., Purushothaman, and Bricklin et al. disclose related prior art.
Applicants' amendment necessitated the new grounds of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP §706.07(a).  Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657                                                                                                                                                                                                        October 15, 2020