DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendments and Arguments
Regarding the claim interpretation under 35 U.S.C. §112(f), applicant amended the relevant claims to recite sufficient structural elements. The claims are therefore no longer interpreted under §112(f).

Regarding the rejection under 35 U.S.C. §101, applicant amended claims 8 and 9 to recite “A non-transitory computer-readable recording medium,” thereby excluding transitory media. The rejection under §101 has been withdrawn.

The examiner notes that the limitation recited in claims 8 and 9, “for causing a computer to operation as …”, is a statement of intended use. Any program stored on the medium that has the claimed intended use would meet the claimed scope.
 
The examiner suggests amending the claims to recite the limitations in positive language. An example of a suggested amendment follows:

“A non-transitory computer-readable recording medium on which a satisfaction estimation model learning program is recorded, when executed by a computer, the program ”  
 
Regarding the rejections under 35 U.S.C. §102 and §103, applicant amended independent claims 1 and 6 to add the following limitation:

“wherein the speech satisfaction estimation model part is hierarchically connected with the conversation satisfaction estimation model part.”

	Applicant argued (Remarks, pages 7-9) that the previously cited references fail to teach the newly added limitation. After performing an updated search, the examiner identified Ruder et al. (“A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis”, 2016).

	Ruder discloses analyzing customer reviews to determine a user’s opinion (e.g., positive or negative sentiment) about products or services (Section 1, Introduction). Ruder discloses using hierarchically connected LSTM networks at both a sentence level and a review level (Section 3, Fig. 2). An LSTM neural network is a type of recurrent neural network (Section 3.2), and the network structure in Ruder is similar to that of the instant application (Disclosure, Figs. 1-3).

In the following rejection, the examiner combines Ruder with the previously cited references to reject the amended claims. Applicant’s arguments have been considered but are moot because they do not apply to the combination of references being used in the current rejection.
 
Claim Objections
Claims 5, 7 and 9 are objected to because of the following informalities:  

Claim 5 is directed to an apparatus and is drafted in independent claim format. However, claim 5 refers to other apparatus claims 1-3 and therefore is not a proper independent claim. If claim 5 is intended to be a dependent claim, its preamble must be amended accordingly.

Claims 7 and 9 have a similar issue. 

Appropriate correction is required.

	Claim Rejections - 35 USC § 103
Claims 1-3 and 5-9 are rejected under 35 U.S.C. 103 as being unpatentable over Hammer et al. (US Pub No. 2019/0005421, referred to as Hammer) in view of Ruder et al. (“A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis”, 2016, referred to as Ruder).

Regarding claim 1, Hammel et al. teaches a satisfaction estimation model learning apparatus (see [0087, also supported in provisional at p. 23, line 10], AI System Controller) comprising: a learning data storage unit (see [0005, also supported in provisional at p. 23, line 29], Knowledge Base) that stores learning data including a conversation voice containing a conversation including a plurality of speeches (see [0088 or provisional at p. 24, lines 15-16], where audio files are processed by the Unstructured Data Analysis Engine into homogeneous voice segments that are stored in the Knowledge Base. The voice segments may come from different speakers, and the “identification of the pieces may be accomplished with a learning algorithm as homogeneous human speech, simultaneous speech, or noninformative parts.” [0088]. For instance, see [0088], where “customer voice can be identified as Speaker A and agent voice can be identified as Speaker B” and “obtaining an ‘ABABAB’ type analysis of the conversation may be done (e.g., conversation voice with a plurality of speeches), which prepares the data for Speaker A&B Feature Extraction” [0088]),
a correct answer value of a conversation satisfaction for the conversation, and a correct answer value of a speech satisfaction for each of the speeches included in the conversation (see [0008 or provisional at p. 3, line 27-8], where an Initial Feature Set “describes individual feature vectors” and may be stored in the Knowledge Base, and see [0089 or provisional at p. 25, lines 28-31], where the individual feature vectors are extracted from the homogeneous voice segments (e.g., conversation data including speeches). In this context, see [0009 or provisional at p. 4, lines 7-10], where “the system may align feature vectors with Prediction Classes. Prediction Classes comprise mathematical values designating a correct outcome or desired descriptor, also known as labels, targets, desired outputs, supervisory signals, response variables, or explained variables (e.g., correct value of speech/conversation satisfaction)”);
and a model learning unit that learns a satisfaction estimation model using a feature quantity of each speech extracted from the conversation voice (see [0090, or provisional p. 26, lines 14-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice) AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access statistical functions contained in Tools Knowledge Base 63 and dynamically build an Emotional Model. The dynamically built (e.g., model learning unit) Emotional Model is used by Unstructured Data Analysis Engine 32 to analyze and determine emotions, emotional based behaviors, and/or emotional states based on one or more extracted features from homogeneous voice segments”),
the correct answer value of the speech satisfaction, and the correct answer value of the conversation satisfaction (see [0420 or provisional p. 4, lines 17-20], where a data point is “a feature vector (e.g., derived from the conversation data made up of homogeneous voice segments/speech excerpts) specifically to be used for model training”; see [0009 or provisional p. 4, lines 11-13], where the Initial Mapping Ruleset(s) that “align feature vectors with Prediction Classes” also define “the needed context(s) for determining Initial Data Points (the correct values for prediction classes that will be used for the model training)”; and see [0010 or provisional p. 4, lines 17-18], where “the system may identify an Initial Data Point as a single feature vector or an aggregate of feature vectors. (the aggregation of the feature vectors from homogeneous voice segments is interpreted as a conversation)”),
the satisfaction estimation model configured by connecting a speech satisfaction estimation model part that receives a feature quantity of each speech and estimates the speech satisfaction of each speech (see [0424, also supported by provisional p. 26], where the emotional model is defined as a model that takes “a numerical representation of an audio signal containing voice (e.g., speech) and maps it to the most likely emotion (e.g., satisfaction) or behavior being expressed therein,” and see [0090, or provisional p. 26, lines 18-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states (e.g., estimate satisfaction) from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice)”)
with a conversation satisfaction estimation model part that receives at least the speech satisfaction of each speech and estimates the conversation satisfaction (see [0090, or provisional p. 26, lines 24-28], where “Unstructured Data Analysis Engine 32 may dynamically create Emotional Transition State features. Statistical analysis of Emotional Transition State features determines whether consecutive agent and/or customer homogeneous voice segments separately (e.g., multiple sequential homogeneous voice segments analyzed in conjunction with each other are interpreted as a conversation, and the emotional state of each separate voice segment constitutes a speech satisfaction) contain changes in different emotional states. (e.g., conversation satisfaction)”).

Hammel does not disclose wherein the speech satisfaction estimation model part is hierarchically connected with the conversation satisfaction estimation model part.

Ruder discloses analyzing customer reviews to determine a user’s opinion (e.g., positive or negative sentiment) about a product or service (Section 1, Introduction). Ruder discloses using hierarchically connected LSTM networks at both a sentence level and a review level (Section 3, Fig. 2). The examiner notes that an LSTM is a type of recurrent neural network (Section 3.2) and that Ruder’s network structure is similar to that of the instant application (Disclosure, Figs. 1-3), on which the newly added limitation is based.
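For illustration only, the hierarchical arrangement discussed above, in which the outputs of a lower (speech- or sentence-level) model are the inputs of an upper (conversation- or review-level) model, can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not Ruder’s implementation or applicant’s disclosed model: each speech is assumed to be represented by a fixed-length feature vector, a single logistic unit stands in for the lower LSTM, a scalar recurrent unit stands in for the upper LSTM, and the class names and weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class SpeechLevel:
    """Lower part (hypothetical): maps one speech's features to a speech satisfaction.

    A single logistic unit stands in for a lower-level LSTM (simplification).
    """
    weights: Sequence[float]
    bias: float

    def __call__(self, features: Sequence[float]) -> float:
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return sigmoid(z)


@dataclass
class ConversationLevel:
    """Upper part (hypothetical): a scalar recurrent unit fed by the lower part's outputs."""
    w_in: float
    w_rec: float
    bias: float

    def run(self, speech_scores: Sequence[float]) -> List[float]:
        h = 0.0  # recurrent state carried over from earlier speeches
        running = []
        for s in speech_scores:
            h = math.tanh(self.w_in * s + self.w_rec * h + self.bias)
            # Conversation satisfaction estimated from the first speech up to this one.
            running.append(sigmoid(h))
        return running


def hierarchical_estimate(
    conversation: Sequence[Sequence[float]],
    lower: SpeechLevel,
    upper: ConversationLevel,
) -> Tuple[List[float], List[float]]:
    """Hierarchical connection: speech-level outputs become conversation-level inputs."""
    speech_scores = [lower(features) for features in conversation]
    conversation_scores = upper.run(speech_scores)
    return speech_scores, conversation_scores
```

For example, `hierarchical_estimate([[2.0], [-1.0], [2.0]], SpeechLevel([1.0], 0.0), ConversationLevel(1.0, 0.5, 0.0))` returns one speech-level score per speech and one running conversation-level score per speech. The recurrent state in the upper unit is what lets the conversation-level estimate after a given speech depend on the speeches before it.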

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hammel’s teaching with Ruder’s teaching to use hierarchically connected neural network models to determine a user’s opinion. One having ordinary skill in the art would have been motivated to make such a modification to obtain superior performance (Ruder, Section 5). In addition, all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods, and in the combination each element merely would have performed the same function as it did separately. “A combination of familiar elements according to known methods is likely to be obvious when it does no more than yield predictable results.” KSR, 550 U.S. ___, 82 USPQ2d at 1395 (2007). One of ordinary skill in the art would have recognized that the results of the combination were predictable.

Regarding claim 2, Hammel in view of Ruder teaches wherein the speech satisfaction estimation model part constitutes one speech satisfaction estimator for one speech, the speech satisfaction estimator receives the feature quantity of each speech and estimates and outputs the speech satisfaction of the speech (see [0090, or provisional p. 26, lines 14-18], where the “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access statistical functions contained in Tools Knowledge Base 63 and dynamically build an Emotional Model. The dynamically built (e.g., one speech satisfaction estimator for one speech, as the estimator part in the model learning unit dynamically changes with each input) Emotional Model is used by Unstructured Data Analysis Engine 32 to analyze and determine emotions, emotional based behaviors, and/or emotional states based on one or more extracted features from homogeneous voice segments (e.g., using a feature quantity of each speech extracted from the conversation voice)”),
the conversation satisfaction estimation model part constitutes one conversation satisfaction estimator for one speech satisfaction estimator, and the conversation satisfaction estimator receives the speech satisfaction outputted from the speech satisfaction estimator and information contributing to the estimation of the conversation satisfaction accompanied by the speech satisfaction, using information related to a speech before the speech or speeches before and after the speech and estimates and outputs the conversation satisfaction from a first speech included in the conversation to the speech using the information related to the speech before the speech (see [0090, or provisional p. 26, lines 25-31], where “Unstructured Data Analysis Engine 32 may dynamically (e.g., one conversation satisfaction estimator for one speech satisfaction estimator, as the estimator part in the model learning unit dynamically changes with each input) create Emotional Transition State features. Statistical analysis of Emotional Transition State features determines whether consecutive (e.g., using information related to a speech before the speech or speeches before and after the speech) agent and/or customer homogeneous voice segments separately (e.g., multiple sequential homogeneous voice segments analyzed in conjunction with each other are interpreted as a conversation, and the emotional state of each separate voice segment constitutes a received speech satisfaction) contain changes in different emotional states. (e.g., conversation satisfaction)… [Unstructured Data Analysis Engine 32 stores] Emotional State Transition features in Knowledge Base”).

Regarding claim 3, Hammel in view of Ruder further discloses wherein the speech satisfaction estimator and the conversation satisfaction estimator include any one of: an input gate and an output gate; an input gate, an output gate, and an oblivion gate; or a reset gate and an update gate (Ruder, Section 3.2. Examiner’s note: claim 3 recites limitations related to elements of an RNN, and an LSTM is a type of RNN).
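To illustrate the gate types listed in claim 3, the following Python sketch shows one step of a scalar LSTM cell (input, forget, and output gates) and one step of a scalar GRU cell (reset and update gates, in one common formulation). It assumes that the claimed “oblivion gate” corresponds to what the LSTM literature calls the forget gate; the parameter names and the `zero_params` helper are hypothetical, and practical models use weight matrices rather than scalars.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Params) -> Tuple[float, float]:
    """One step of a scalar LSTM cell with input, forget, and output gates."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget ("oblivion") gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate cell value
    c = f * c_prev + i * g   # keep part of the old memory, admit part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


def gru_step(x: float, h_prev: float, p: Params) -> float:
    """One step of a scalar GRU cell with reset and update gates."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    n = math.tanh(p["w_n"] * x + p["u_n"] * (r * h_prev) + p["b_n"])  # candidate state
    return z * h_prev + (1.0 - z) * n  # interpolate between old and candidate state


def zero_params(gates: str) -> Params:
    """Hypothetical helper: all-zero weights and biases for the given gate letters."""
    return {f"{kind}_{g}": 0.0 for g in gates for kind in ("w", "u", "b")}
```

With all-zero parameters every gate outputs 0.5, so `lstm_step(0.0, 0.0, 1.0, zero_params("ifog"))` halves the stored cell value and `gru_step(0.0, 0.8, zero_params("zrn"))` halves the previous state, which makes the role of each gate easy to trace by hand.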

Regarding claims 5 and 8-9, Hammel in view of Ruder teaches a model storage unit that stores the satisfaction estimation model learned by the satisfaction estimation model learning apparatus according to any one of claims 1 to 3 (see [0090, or provisional p. 26] and Figure 1, p. 1, where the parts involved in the Emotional Model for estimating satisfaction, such as the Knowledge Base and the Unstructured Data Analysis Engine, are stored in the Processing Database 60 and Server 20),
and a program causing a computer to function as the satisfaction estimation model learning apparatus according to any one of claims 1 to 3, or a program causing a computer to function as the satisfaction estimating apparatus according to claim 5 (see [0453 or provisional p. 156, lines 28-31 and p. 157, lines 1-15], where the “processing may be implemented in computer programs executed on programmable computers”).
Furthermore, Hammel et al. teaches a satisfaction estimating unit that inputs the feature quantity of each speech extracted from the conversation voice containing the conversation including a plurality of speeches to the satisfaction estimation model (see [0090, or provisional p. 26, lines 14-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice) AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access statistical functions contained in Tools Knowledge Base 63 and dynamically build an Emotional Model. The dynamically built (e.g., model learning unit) Emotional Model is used by Unstructured Data Analysis Engine 32 to analyze and determine emotions, emotional based behaviors, and/or emotional states based on one or more extracted features from homogeneous voice segments”)
and estimates the speech satisfaction for each speech and the conversation satisfaction for the conversation (see [0090, or provisional p. 26, lines 18-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states (e.g., estimate satisfaction) from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice)” and where “Unstructured Data Analysis Engine 32 may dynamically create Emotional Transition State features. Statistical analysis of Emotional Transition State features determines whether consecutive agent and/or customer homogeneous voice segments separately (e.g., multiple sequential homogeneous voice segments analyzed in conjunction with each other are interpreted as a conversation, and the emotional state of each separate voice segment constitutes a received speech satisfaction) contain changes in different emotional states. (e.g., conversation satisfaction)”).

Regarding claim 6, Hammel et al. teaches a satisfaction estimation model learning method, wherein learning data including a conversation voice containing a conversation including a plurality of speeches (see [0088 or provisional at p. 24, lines 15-16], where audio files are processed by the Unstructured Data Analysis Engine into homogeneous voice segments that are stored in the Knowledge Base. The voice segments may come from different speakers, and the “identification of the pieces may be accomplished with a learning algorithm as homogeneous human speech, simultaneous speech, or noninformative parts.” [0088]. For instance, see [0088], where “customer voice can be identified as Speaker A and agent voice can be identified as Speaker B” and “obtaining an ‘ABABAB’ type analysis of the conversation may be done (e.g., conversation voice with a plurality of speeches), which prepares the data for Speaker A&B Feature Extraction” [0088]),
a correct answer value of a conversation satisfaction for the conversation, and a correct answer value of a speech satisfaction for each of the speeches included in the conversation is stored in a learning data storage unit (see [0008 or provisional at p. 3, line 27-8], where an Initial Feature Set “describes individual feature vectors” and may be stored in the Knowledge Base, and see [0089 or provisional at p. 25, lines 28-31], where the individual feature vectors are extracted from the homogeneous voice segments (e.g., conversation data including speeches). In this context, see [0009 or provisional at p. 4, lines 7-10], where “the system may align feature vectors with Prediction Classes. Prediction Classes comprise mathematical values designating a correct outcome or desired descriptor, also known as labels, targets, desired outputs, supervisory signals, response variables, or explained variables (e.g., correct value of speech/conversation satisfaction)”);
learning, by a model learning unit, a satisfaction estimation model using a feature quantity of each speech extracted from the conversation voice (see [0090, or provisional p. 26, lines 14-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice) AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access statistical functions contained in Tools Knowledge Base 63 and dynamically build an Emotional Model. The dynamically built (e.g., model learning unit) Emotional Model is used by Unstructured Data Analysis Engine 32 to analyze and determine emotions, emotional based behaviors, and/or emotional states based on one or more extracted features from homogeneous voice segments”),
the correct answer value of the speech satisfaction, and the correct answer value of the conversation satisfaction (see [0420 or provisional p. 4, lines 17-20], where a data point is “a feature vector (e.g., derived from the conversation data made up of homogeneous voice segments/speech excerpts) specifically to be used for model training”; see [0009 or provisional p. 4, lines 11-13], where the Initial Mapping Ruleset(s) that “align feature vectors with Prediction Classes” also define “the needed context(s) for determining Initial Data Points (the correct values for prediction classes that will be used for the model training)”; and see [0010 or provisional p. 4, lines 17-18], where “the system may identify an Initial Data Point as a single feature vector or an aggregate of feature vectors. (the aggregation of the feature vectors from homogeneous voice segments is interpreted as a conversation)”),
the satisfaction estimation model configured by connecting a speech satisfaction estimation model part that receives a feature quantity of each speech and estimates the speech satisfaction of each speech (see [0424, also supported by provisional p. 26], where the emotional model is defined as a model that takes “a numerical representation of an audio signal containing voice (e.g., speech) and maps it to the most likely emotion (e.g., satisfaction) or behavior being expressed therein,” and see [0090, or provisional p. 26, lines 18-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states (e.g., estimate satisfaction) from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice)”)
with a conversation satisfaction estimation model part that receives at least the speech satisfaction of each speech and estimates the conversation satisfaction (see [0090, or provisional p. 26, lines 24-28], where “Unstructured Data Analysis Engine 32 may dynamically create Emotional Transition State features. Statistical analysis of Emotional Transition State features determines whether consecutive agent and/or customer homogeneous voice segments separately (e.g., multiple sequential homogeneous voice segments analyzed in conjunction with each other are interpreted as a conversation, and the emotional state of each separate voice segment constitutes a speech satisfaction) contain changes in different emotional states. (e.g., conversation satisfaction)”).

Regarding claim 7, Hammel et al. teaches the satisfaction estimation method comprising: inputting, by a satisfaction estimating unit, the feature quantity of each speech extracted from the conversation voice containing the conversation including a plurality of speeches to the satisfaction estimation model (see [0090, or provisional p. 26, lines 14-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice) AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access statistical functions contained in Tools Knowledge Base 63 and dynamically build an Emotional Model. The dynamically built (e.g., model learning unit) Emotional Model is used by Unstructured Data Analysis Engine 32 to analyze and determine emotions, emotional based behaviors, and/or emotional states based on one or more extracted features from homogeneous voice segments”)
and estimating the speech satisfaction for each speech and the conversation satisfaction for the conversation (see [0090, or provisional p. 26, lines 14-24], where “AI System Controller 21 may instruct Unstructured Data Analysis Engine 32 to access Seed Knowledge Base 61 and utilize a pre-existing Emotional Model to analyze and determine emotions, emotional based behaviors, and/or emotional states (e.g., estimate satisfaction) from one or more extracted features from homogeneous voice segments. (e.g., using a feature quantity of each speech extracted from the conversation voice)” and where “Unstructured Data Analysis Engine 32 may dynamically create Emotional Transition State features. Statistical analysis of Emotional Transition State features determines whether consecutive agent and/or customer homogeneous voice segments separately (e.g., multiple sequential homogeneous voice segments analyzed in conjunction with each other are interpreted as a conversation, and the emotional state of each separate voice segment constitutes a received speech satisfaction) contain changes in different emotional states. (e.g., conversation satisfaction)”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Hammer in view of Ruder and further in view of Senior et al. (US Pub No. 2017/0011738, referred to as Senior).

Regarding claim 4, Hammel in view of Ruder discloses the limitations recited in claims 1-3. Ruder discloses hierarchically connected two-level LSTM neural networks (Ruder, Fig. 2). The claimed limitations relate to details of a recurrent neural network, and an LSTM is a particular type of recurrent neural network. Hammel in view of Ruder does not explicitly disclose a loss function that is a weighted sum of the loss functions of two different model parts, such as the speech satisfaction estimation model part and the conversation satisfaction estimation model part. However, Senior et al. teaches a loss function that is a weighted sum of two different loss functions from different neural networks for estimating which words are most likely to be identified in speech (see [0008], where “a second neural network may be trained based on the outputs of the first neural network to generate outputs indicating likelihoods for a second set of phonetic units that is different from the first set used by the first neural network.”, and see also [0017], where training the second neural network involves “using a loss function that is a weighted combination of the two or more loss functions.”).
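As a minimal sketch of the weighted-sum loss discussed above, the following Python function combines a speech-level loss and a conversation-level loss with weights `alpha` and `1 - alpha`. It does not reproduce Senior’s or applicant’s formulation: the choice of binary cross-entropy, the averaging over speeches, and the parameter name `alpha` (with its default of 0.5) are assumptions made for illustration.

```python
import math
from typing import Sequence


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy of one prediction p in (0, 1) against a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def joint_loss(
    speech_preds: Sequence[float],
    speech_labels: Sequence[float],
    conv_pred: float,
    conv_label: float,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of a speech-level loss and a conversation-level loss.

    alpha weights the speech-level term and (1 - alpha) the conversation-level
    term; the default of 0.5 is an arbitrary, hypothetical choice.
    """
    if not speech_preds or len(speech_preds) != len(speech_labels):
        raise ValueError("need one label per speech prediction")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Average per-speech loss, so long conversations do not dominate the sum.
    speech_loss = sum(bce(p, y) for p, y in zip(speech_preds, speech_labels)) / len(speech_preds)
    conv_loss = bce(conv_pred, conv_label)
    return alpha * speech_loss + (1.0 - alpha) * conv_loss
```

Setting `alpha` to 1.0 trains only on the speech-level labels and 0.0 only on the conversation-level label; intermediate values train both model parts jointly through a single objective.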

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hammer in view of Ruder with Senior’s teaching to jointly train both the sentence-level and review-level (i.e., dialog-level) neural network models using a weighted combination of their loss functions. One having ordinary skill in the art would have been motivated to make such a modification to improve performance (Senior, [0004]). In addition, all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods, and in the combination each element merely would have performed the same function as it did separately. “A combination of familiar elements according to known methods is likely to be obvious when it does no more than yield predictable results.” KSR, 550 U.S. ___, 82 USPQ2d at 1395 (2007). One of ordinary skill in the art would have recognized that the results of the combination were predictable.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/Primary Examiner, Art Unit 2659