DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's amendments with respect to the Examiner’s Note regarding claims 9-13 have been considered.
Applicant's arguments with respect to the 35 U.S.C. 112(b) rejection of claims 13 and 6, 13 and 19 have been considered and are persuasive in view of the amendments; the rejection has been withdrawn.
Applicant's arguments with respect to the 35 U.S.C. 103 rejection of claims 1-20 have been considered but are moot in view of the new grounds of rejection necessitated by the amendments. See the detailed rejection below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Haukioja et al. (US 2019/0253558) in view of Liu (US 2021/0051401) and further in view of Jain (CN 106503646).

Claim 1,
Haukioja teaches a method for voice emotion identification contained in audio in a call providing customer support between a customer and a service agent, the method comprising ([Fig. 1] [0028] identifying emotions in customer/agent calls): 
implementing an emotion identification application to identify emotions captured in a voice of the customer from audio received by a system ([Fig. 1] [0028-0029] the system (software application) identifying emotions in the customer’s voice using emotional pattern recognition); 
receiving, by the emotion identification application, an audio stream comprising voice samples from consecutive frames from the audio received ([Fig. 1] [0025-0027] the system performs speaker diarization on the call audio signal in order to separate out the multiple voices or speakers heard on a single customer call; the time-varying audio signal sample is divided into a series of frames);
extracting, by the emotion identification application, a set of voice emotion features from each frame in each voice sample of the audio by applying a trained machine learning (ML) model, wherein the trained ML model identifies emotions utilizing a neural network to determine one or more voice emotions; and classifying, by the emotion identification application, each voice emotion determined by the trained ML model based on a set of classifying features to label one or more types of emotions captured in each voice sample ([Fig. 1] [0028-0029] [0032-0033] [0052] [0059] the system may sample the audio signal, isolate the customer's voice, and then overlay the description of the emotional state of the customer for each time slice of the audio sample. The customer's audio sample will preferably be given emotional labels describing the state of mind of the customer throughout the entire call; the agent's voice will initially be isolated from the call audio signal and sampled for emotional feature data; the agent's audio signal will be sampled, pre-processed and undergo frame division splicing; the sample frames will be analyzed for features; the system preferably uses pattern matching, grouping, clustering, and classification to identify and label the time sliced samples of the agent's voice; each sample slice is analyzed by the system to determine the agent's precise state of mind and emotional level; the system provides a truer snapshot of the agent/customer interaction and overall customer satisfaction by automatically generating emotional tags and labels throughout the customer call; the system utilizes a machine learning (ML) classifier Support Vector Machine algorithm and neural networks).
The difference between the prior art and the claimed invention is that Haukioja does not explicitly teach audio received by a media streaming device and wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy.
Liu teaches audio received by a media streaming device ([0024] set top box for receiving speech signal emitted by a user).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Haukioja with the teachings of Liu by modifying the system to automatically monitor service level agreement compliance in call centers as taught by Haukioja to include audio received by a media streaming device as taught by Liu for the benefit of processing speech (Liu [0011]).
The difference between the prior art and the claimed invention is that neither Haukioja nor Liu explicitly teaches wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy.
Jain teaches wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy ([pg. 4] the feature comprises acoustic rhythm, pitch, intensity, tone, spectrum, cepstrum, perceptual linear prediction cepstrum coefficient, root-mean-square intensity, zero-crossing rate, spectrum, spectrum centroid, frequency band width, frequency spectrum, spectral flatness, spectral slope, spectrum roughness sound chroma, spectral attenuation point, spectral slope, single-frequency sound, sound, voice formant, climbing point voice, spectral envelope).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Haukioja with the teachings of Jain by modifying the emotional feature extraction step of the automatic monitoring system as taught by Haukioja to include wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy as taught by Jain for the benefit of accurately identifying the emotion of the target object in a human-computer interaction process (Jain [pg. 2]).
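For illustration only, the six recited features have well-known textbook definitions, and the following Python sketch shows one way they might be computed for each frame of a sampled audio signal. The function names, the dictionary keys, the 85% roll-off threshold, and the use of a naive DFT are assumptions made for this sketch; none of Haukioja, Liu, or Jain is relied upon as disclosing this code, and it is not a characterization of Applicant's implementation.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Split a sequence of samples into consecutive frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short example frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def frame_features(frame, sample_rate, rolloff_pct=0.85):
    """Compute the six named features for one frame (rolloff_pct is an assumption)."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    eps = 1e-12  # avoids log(0) and division by zero on silent frames
    total = sum(mags) + eps

    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    # Bandwidth: magnitude-weighted spread of frequency around the centroid.
    bandwidth = math.sqrt(
        sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total)

    # Roll-off: lowest frequency below which rolloff_pct of the magnitude lies.
    cumulative, rolloff = 0.0, freqs[-1]
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= rolloff_pct * total:
            rolloff = f
            break

    # Flatness: geometric mean divided by arithmetic mean of the power spectrum.
    power = [m * m + eps for m in mags]
    geometric_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geometric_mean / (sum(power) / len(power))

    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (n - 1)
    # Root-mean-square energy of the time-domain samples.
    rms = math.sqrt(sum(x * x for x in frame) / n)

    return {
        "spectral_flatness": flatness,
        "spectral_centroid": centroid,
        "spectral_rolloff": rolloff,
        "spectral_bandwidth": bandwidth,
        "zero_crossing_rate": zcr,
        "rms_energy": rms,
    }
```

Under these assumptions, calling split_frames on a sampled signal and then frame_features on each resulting frame yields one six-element feature record per frame, which is the kind of per-frame input a classifier could consume.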

Claim 2,
Haukioja further teaches the method of claim 1, further comprising: classifying, by the emotion identification application, at least one emotion type based on an intensity level of audio in a visual representation of the voice sample ([0048] real-time compliance data is also observable with the system by providing indicators across the call center agent team showing specific activity with respect to SLA metrics; the system can provide live streaming data displaying the agent activity and scoring levels; the call center performance under the SLA contract can be viewed contemporaneously with live sampling of agent/customer activity data).

Claim 3,
Haukioja further teaches the method of claim 1, wherein the one or more emotions comprise emotion types of angry, disgust, happy, fear, sad, surprise, and neutral emotions ([0051] emotional classification of the sample by labeling with: happy, sad, etc.).

Claim 4,
Haukioja further teaches the method of claim 1, further comprising: predicting, by the emotion identification application, the one or more emotions comprise one or more emotions based on a prior set of classifying features used to label each emotion captured in each voice sample ([0058] the system may determine and predict SLA reporting metrics by applying feature selection and a classification system to a large sampled data set of contextually relevant and call center application specific agent/customer interactions; the system will discover and extract a large number of salient features from the customer call database of recorded interactions and translate this into a large number of classifier parameters that are relevant to predicting SLA metrics; the system will be provided with a training pattern set of agent/customer interactions and limit the feature set in order to design classifiers with proper generalization capabilities and low error rate).

Claim 5,
Haukioja further teaches the method of claim 4, further comprising: flagging, by the emotion identification application, a particular call providing customer support based on the type of emotion ([0029] the system provides a truer snapshot of the agent/customer interaction and overall customer satisfaction by automatically generating emotional tags and labels throughout the customer call; the system generates a live customer satisfaction level during the call, indicating trending towards dis-satisfaction, concern acknowledgment, resolution, exceeding expectations, or happiness, etc.).

Claim 6,
Haukioja further teaches the method of claim 2, further comprising: flagging, by the emotion identification application, the particular call providing customer support based on an intensity level of the audio determined by the visual representation for following up at a later time by the service agent ([0062] the supervisor may search an agent's database of agent/customer calls based on predicted SLA metrics; the supervisor may perform a system database query on a given agent, for all calls for customers with salient feature classification of: difficult, angry, upset, demanding, etc., and determine the agent's average resolution or customer satisfaction score for those calls; the supervisor will be able to query the system database for the amount of agent/customer interactions, or cases, in which an agent was able to calm the initially difficult customer, achieve positive trending customer satisfaction, and score over a certain resolution threshold; the system will be able to provide useful metric and scoring data for determining individual agent performance, as well as overall call center performance).

Claim 7,
Haukioja further teaches the method of claim 1, further comprising: computing, by the emotion identification application, emotion content in each voice sample based on the voice emotion features captured in the voice sample ([0007] Speech emotion recognition (SER) is performed to compute the emotion spectrum of the voices, i.e., happy, sad, angry, neutral, etc.; and the system is also trained to extract and recognize contextually salient features from the audio sample and ASR/SER data).
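For illustration only, one simple way to express the "emotion content" of a voice sample is as the share of its frames assigned to each emotion label. The Python sketch below assumes per-frame labels are already available from some classifier; the function name emotion_content and the fraction-of-frames measure are assumptions made for this sketch and are not attributed to Haukioja or to Applicant. The label set is taken from the emotion types recited in claim 3.

```python
from collections import Counter

# Emotion types as recited in claim 3.
EMOTIONS = ("angry", "disgust", "happy", "fear", "sad", "surprise", "neutral")


def emotion_content(frame_labels):
    """Return the fraction of frames carrying each emotion label.

    frame_labels is a list with one label per frame of a voice sample.
    Labels outside EMOTIONS are counted in the total but not reported.
    """
    total = len(frame_labels)
    if total == 0:
        # An empty sample has no emotion content to report.
        return {emotion: 0.0 for emotion in EMOTIONS}
    counts = Counter(frame_labels)
    return {emotion: counts.get(emotion, 0) / total for emotion in EMOTIONS}
```

A per-sample summary of this kind could, under the same assumptions, be compared across time slices of a call to observe how the labeled emotions shift.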

Claim 8,
Haukioja in view of Liu in view of Jain teaches a computer program product tangibly embodied in a computer-readable storage device and comprising instructions that when executed by a processor, perform a method for emotion identification in audio of a customer's voice, the method comprising: implementing an emotion identification application to identify emotions captured in a voice of the customer from audio received by a media streaming device; receiving, by the emotion identification application, an audio stream of a series of voice samples contained in consecutive frames from audio received; extracting, by the emotion identification application, a set of voice emotion features from each frame in each voice sample of the audio by applying a trained machine learning (ML) model, wherein the trained ML model identifies emotions utilizing a neural network to determine one or more voice emotions, wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy; and classifying, by the emotion identification application, each voice emotion determined by the trained ML model based on at least one of the spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy to label one or more types of emotions captured in each voice sample. (Claim 8 contains subject matter similar to claim 1, and thus is rejected under similar rationale)

Claim 9,
Haukioja further teaches the computer program product of claim 8, wherein the method further comprises: classifying, by the emotion identification application, at least one emotion type based on an intensity level of audio in a visual representation of the voice sample ([0048] real-time compliance data is also observable with the system by providing indicators across the call center agent team showing specific activity with respect to SLA metrics; the system can provide live streaming data displaying the agent activity and scoring levels; the call center performance under the SLA contract can be viewed contemporaneously with live sampling of agent/customer activity data).

Claim 10,
Haukioja further teaches the computer program product method of claim 8, wherein the one or more emotions at least comprise one or more emotion types of angry, disgust, happy, fear, sad, surprise, and neutral emotions ([0051] emotional classification of the sample by labeling with: happy, sad, etc.).

Claim 11,
Haukioja further teaches the computer program product of claim 8, wherein the method further comprises: predicting, by the emotion identification application, one or more emotions based on a prior set of classifying features used to label each emotion captured in each voice sample ([0058] the system may determine and predict SLA reporting metrics by applying feature selection and a classification system to a large sampled data set of contextually relevant and call center application specific agent/customer interactions; the system will discover and extract a large number of salient features from the customer call database of recorded interactions and translate this into a large number of classifier parameters that are relevant to predicting SLA metrics; the system will be provided with a training pattern set of agent/customer interactions and limit the feature set in order to design classifiers with proper generalization capabilities and low error rate).

Claim 12,
Haukioja further teaches the computer program product of claim 11, wherein the method further comprises: flagging, by the emotion identification application, a particular call providing customer support based on the type of emotion ([0029] the system provides a truer snapshot of the agent/customer interaction and overall customer satisfaction by automatically generating emotional tags and labels throughout the customer call; the system generates a live customer satisfaction level during the call, indicating trending towards dis-satisfaction, concern acknowledgment, resolution, exceeding expectations, or happiness, etc.).

Claim 13,
Haukioja further teaches the computer program product of claim 8, wherein the method further comprises: flagging, by the emotion identification application, the particular call providing customer support based on an intensity level of the audio determined by the visual representation for following up at a later time by the service agent ([0062] the supervisor may search an agent's database of agent/customer calls based on predicted SLA metrics; the supervisor may perform a system database query on a given agent, for all calls for customers with salient feature classification of: difficult, angry, upset, demanding, etc., and determine the agent's average resolution or customer satisfaction score for those calls; the supervisor will be able to query the system database for the amount of agent/customer interactions, or cases, in which an agent was able to calm the initially difficult customer, achieve positive trending customer satisfaction, and score over a certain resolution threshold; the system will be able to provide useful metric and scoring data for determining individual agent performance, as well as overall call center performance).

Claim 14,
A system for voice emotion identification contained in audio in a call providing customer support between a customer and a service agent, the system comprising: a server configured to utilize an emotion identification application to identify emotions captured in a voice of the customer from audio received by a media streaming device, and the server configured to: receive an audio stream of a series of voice samples contained in consecutive frames from audio received from the media streaming device; extract a set of voice emotion features from each frame in each voice sample of the audio by applying a trained machine learning (ML) model, wherein the trained ML model uses a neural network to determine one or more voice emotions, wherein the set of voice emotion features comprises a spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy; and classify each emotion determined by the trained ML model based on at least one of the spectral flatness, spectral centroid, spectral roll off, spectral bandwidth, zero-crossing rate, and root-mean-square energy to label one or more types of emotions captured in each voice sample. (Claim 14 contains subject matter similar to claim 1, and thus is rejected under similar rationale)

Claim 15,
The system of claim 14, wherein the server is configured to classify at least one emotion type based on an intensity level of audio in a visual representation of the voice sample. (Claim 15 contains subject matter similar to claim 2, and thus is rejected under similar rationale)

Claim 16,
The system of claim 14, wherein the one or more emotion at least comprises one or more emotion types of angry, disgust, happy, fear, sad, surprise, and neutral emotions. (Claim 16 contains subject matter similar to claim 3, and thus is rejected under similar rationale)

Claim 17,
The system of claim 14, wherein the server is configured to predict one or more emotions from a set of voice emotions based on a prior set of classifying features used to label each voice emotion captured in each voice sample. (Claim 17 contains subject matter similar to claim 4, and thus is rejected under similar rationale)

Claim 18,
The system of claim 15, wherein the server is configured to flag a particular call providing customer support based on the type of emotion which has been determined. (Claim 18 contains subject matter similar to claim 5, and thus is rejected under similar rationale)

Claim 19,
The system of claim 16, wherein the server is configured to flag the particular call providing customer support based on an intensity level of the audio determined by the visual representation for following up at a later time by the service agent. (Claim 19 contains subject matter similar to claim 6, and thus is rejected under similar rationale)

Claim 20,
The system of claim 16, wherein the server is configured to compute emotion content in each voice sample based on the voice emotion features captured in the voice sample. (Claim 20 contains subject matter similar to claim 7, and thus is rejected under similar rationale)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Examiner
Art Unit 2657



/SHREYANS A PATEL/ Examiner, Art Unit 2656