DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 22, 2019 is being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5 and 10 recite the limitation “the recommendation degrees” in line 6 and line 7, respectively. There is insufficient antecedent basis for this limitation in the claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
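The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
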
Claims 1, 3, 5-6, 8, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature to Kalchbrenner (Kalchbrenner, N., and Blunsom, P., “Recurrent convolutional neural networks for discourse compositionality,” arXiv preprint arXiv:1306.3584 (2013), hereinafter Kalchbrenner) in view of non-patent literature to Yoo (Yoo D., Ko Y., Seo J., “Speech-act classification using a convolutional neural network based on pos tag and dependency-relation bi-gram embedding,” IEICE Trans. Inf. Sys., 100 (12) (2017), pp. 3081-3084 [Manuscript published on August 23, 2017], hereinafter Yoo).

Regarding claim 1, Kalchbrenner discloses A speech act analysis device comprising: (“model for sentential compositionality [and] discourse compositionality”; Kalchbrenner, ¶¶ Pg. 2, col. 1, para. 2 and 3) a conversation vector generator that generates a conversation unit input utterance vector that is vectorized from information with respect to the input utterance in a conversation including the input utterance (discloses “sentence model {a conversation vector generator} is to compute a vector for a sentence s” where “a sentence s is paired to the matrix Ms whose columns are given sequentially by the vectors of the words {thus, including one or more words forming an input utterance} in s”; Kalchbrenner, ¶¶ Pg. 3, col. 1, paras. 2 and 3) by inputting the input utterance similarity vector in a convolution neural network (“The sentence model is taken to be a CNN where the convolution operation is applied one dimensionally across a single feature and in a hierarchical manner.”; Kalchbrenner, ¶¶ pg. 3, col. 2, para. 7); a conversation similarity calculator that receives a speaker vector that is vectorized from speaker information of the input utterance (“RCNN computes probability distributions pi for the label at step i by iterating the following equations” where the equation includes Ixi-1 (the previous label) {thus, receiving the speaker vector} and where Ixi-1 is a vector of “the speaker’s previous utterances as opposed to other speakers’ previous utterances” and “concerning the speaker’s interactions” {vectorized from speaker information of the input utterance}; Kalchbrenner, ¶¶ pg. 5, Col. 2, para. 6 and Col. 1, para. 4), and generates a conversation unit input utterance similarity vector that reflects similarity between the conversation unit input utterance vector and the speaker vector (“At each step, the RCNN takes as input the current sentence vector si generated through the HCNN sentence model {the conversation unit input utterance vector} and the previous label xi−1 {the speaker vector} to predict a probability distribution over the current label P(xi) {generates a conversation unit input utterance similarity vector}.”; Kalchbrenner, ¶¶ pg. 3, Col. 1, para. 1); and a speech act classifier that determines a speech act of the input utterance by inputting the conversation unit input utterance similarity vector in a recurrent neural network (“The discourse model coupled to the sentence model is based on a RNN architecture with inputs from a HCNN and with the recurrent and output weights conditioned on the respective speakers”; Kalchbrenner, ¶¶ pg. 5, Col. 2, para. 5).
However, Kalchbrenner fails to expressly recite a word similarity calculator that receives an input utterance vector that is vectorized from information on at least one or more words forming an input utterance and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance and generates an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector.
Yoo teaches “a deep learning based model for classifying speech-acts using a convolutional neural network (CNN).” (Yoo, Abstract). Regarding claim 1, Yoo teaches a word similarity calculator that receives an input utterance vector that is vectorized from information on at least one or more words forming an input utterance (dependency parser forms distributed representations using the input utterance, which is “converted into vector representations... including morpheme unigram, morpheme bigram, POS tag bigram… represented as vectors by a word embedding technique”; Yoo, ¶¶ Pg. 3082, col. 1, para. 1; col. 2, para. 8; FIG. 1), and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance (“the speech-act of previous utterances is initialized…” as a vector of the same size as the input utterance vector; Yoo, ¶¶ Pg. 3082, col. 1, para. 1; FIG. 1), and generates an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector (“Distributed representations” are formed from “extracted unigram and bigram features” thus, from the “morpheme unigram, morpheme bigram, POS tag bigram {input utterance vector}” and “Dependency-relationship bigrams {previous speech act vector}” through a dependency parser, thus reflecting similarity between the speech-act of the previous utterance and the input utterance; Yoo, ¶¶ p. 3082, col. 2 para 8 - p. 3083, col. 1, para. 1; FIGS. 2-3; Table 1).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the recurrent convolutional neural network of Kalchbrenner to incorporate the teachings of Yoo to include a word similarity calculator that receives an input utterance vector that is vectorized from information on at least one or more words forming an input utterance and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance and generates an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector. One of ordinary skill would have been motivated to make this modification because the speech act classification model of Yoo shows increased accuracy over “competitive models in previous studies,” as recognized by Yoo (Yoo, Abstract).

Regarding claim 3, Kalchbrenner discloses wherein the conversation vector generator generates the conversation unit input utterance vector by normalizing the input utterance similarity vector into a predetermined size (“kernel sizes” for the HCNN {conversation vector generator} increase by one until the resulting convolved vector {generates [...]}; Kalchbrenner, ¶¶ pg. 4, col. 2, para. 4) through the convolution neural network (the generation occurs in the HCNN {through the convolution neural network}; Kalchbrenner, ¶¶ pg. 4, col. 2, para. 4).

Regarding claim 5, Kalchbrenner discloses wherein the speech act classifier determines at least one or more candidate speech acts with respect to the input utterance by inputting the conversation unit input utterance similarity vector in the recurrent neural network (describes a discourse model which labels dialogue acts {one or more candidate speech acts} with respect to a conversation {the input utterance} where “discourse model coupled to the sentence model is based on a RNN architecture with inputs from a HCNN {thus inputting the conversation unit input utterance similarity vector} and with the recurrent and output weights conditioned on the respective speakers”; Kalchbrenner, ¶¶ pg. 5, col. 2, paras. 4-5), and determines a speech act of the input utterance among the candidates speech acts (“The RCNN computes probability distributions pi {determines} for the label {speech act} at step i”; Kalchbrenner, ¶¶ pg. 5, col. 2, paras. 4-5) based on the recommendation degrees of the candidate speech acts (the dialogue act label is determined based on the probability distribution {recommendation degrees}; Kalchbrenner, ¶¶ pg. 5, col. 2, paras. 4-5; Table 1).
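
For illustration only, the mapping of a probability distribution over labels to “recommendation degrees” of candidate speech acts can be sketched in Python as follows. The sketch is not taken from Kalchbrenner and is not represented as the implementation of either the reference or the claimed device. The function names, the label set, the raw scores, and the top_k cutoff are all hypothetical and were chosen solely for this example.

```python
import math


def softmax(scores):
    """Turn raw label scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(labels, scores, top_k=3):
    """Return the top_k (label, probability) pairs, highest probability first.

    The probability plays the role of a per-candidate degree here
    (an assumption made for illustration).
    """
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def select_speech_act(labels, scores, top_k=3):
    """Pick the candidate with the highest degree among the top_k candidates."""
    candidates = rank_candidates(labels, scores, top_k)
    best_label = candidates[0][0]
    return best_label, candidates


if __name__ == "__main__":
    # Hypothetical labels and scores, not data from any cited reference.
    labels = ["statement", "question", "backchannel", "opinion"]
    scores = [2.0, 0.5, 1.0, -1.0]
    best, candidates = select_speech_act(labels, scores)
    print(best, candidates)
```

In this sketch the candidates are first determined with their degrees, and the final label is then selected among them, which parallels the two steps recited in claim 5.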

Regarding claim 6, Kalchbrenner discloses A method for a speech act analysis device to determine a speech act, comprising: (“model for sentential compositionality [and] discourse compositionality”; Kalchbrenner, ¶¶ Pg. 2, col. 1, para. 2 and 3) generating a conversation unit input utterance vector that is vectorized from information with respect to the input utterance in a conversation including the input utterance (discloses “sentence model {a conversation vector generator} is to compute a vector for a sentence s” where “a sentence s is paired to the matrix Ms whose columns are given sequentially by the vectors of the words {thus, including one or more words forming an input utterance} in s”; Kalchbrenner, ¶¶ Pg. 3, col. 1, paras. 2 and 3) by inputting the input utterance similarity vector in a convolution neural network (“The sentence model is taken to be a CNN where the convolution operation is applied one dimensionally across a single feature and in a hierarchical manner.”; Kalchbrenner, ¶¶ pg. 3, col. 2, para. 7); receiving a speaker vector that is vectorized from speaker information of the input utterance (“RCNN computes probability distributions pi for the label at step i by iterating the following equations” where the equation includes Ixi-1 (the previous label) {thus, receiving the speaker vector} and where Ixi-1 is a vector of “the speaker’s previous utterances as opposed to other speakers’ previous utterances” and “concerning the speaker’s interactions” {vectorized from speaker information of the input utterance}; Kalchbrenner, ¶¶ pg. 5, Col. 1, para. 4, and Col. 2, paras. 5 and 6), and generating a conversation unit input utterance similarity vector that reflects similarity between the conversation unit input utterance vector and the speaker vector (“At each step, the RCNN takes as input the current sentence vector si generated through the HCNN sentence model {the conversation unit input utterance vector} and the previous label xi−1 {the speaker vector} to predict a probability distribution over the current label P(xi) {generates a conversation unit input utterance similarity vector}.”; Kalchbrenner, ¶¶ pg. 3, Col. 1, para. 1); and determining a speech act of the input utterance by inputting the conversation unit input utterance similarity vector in a recurrent neural network (“The discourse model coupled to the sentence model is based on a RNN architecture with inputs from a HCNN and with the recurrent and output weights conditioned on the respective speakers”; Kalchbrenner, ¶¶ pg. 5, Col. 2, para. 6).
However, Kalchbrenner fails to expressly recite receiving an input utterance vector that is vectorized from information on at least one or more words forming an input utterance and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance and generating an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector.
Yoo teaches “a deep learning based model for classifying speech-acts using a convolutional neural network (CNN).” (Yoo, Abstract). Regarding claim 6, Yoo teaches receiving an input utterance vector that is vectorized from information on at least one or more words forming an input utterance (dependency parser forms distributed representations using the input utterance, which is “converted into vector representations... including morpheme unigram, morpheme bigram, POS tag bigram… represented as vectors by a word embedding technique”; Yoo, ¶¶ Pg. 3082, col. 1, para. 1; col. 2, para. 8; FIG. 1), and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance (“the speech-act of previous utterances is initialized…” as a vector of the same size as the input utterance vector.; Yoo, ¶¶ Pg. 3082, col. 1, para. 1; FIG. 1), and generating an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector (“Distributed representations” are formed from “extracted unigram and bigram features” thus, from the “morpheme unigram, morpheme bigram, POS tag bigram {input utterance vector}” and “Dependency-relationship bigrams {previous speech act vector}” through a dependency parser, thus reflecting similarity between the speech-act of the previous utterance and the input utterance.; Yoo, ¶¶ p. 3082, col. 2 para 8 - p. 3083, col. 1, para. 1; FIGS. 2-3; Table 1).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the recurrent convolutional neural network of Kalchbrenner to incorporate the teachings of Yoo to include receiving an input utterance vector that is vectorized from information on at least one or more words forming an input utterance and a previous speech act vector that is vectorized from speech act information with respect to a previous utterance of the input utterance and generating an input utterance similarity vector that reflects similarity between the input utterance vector and the previous speech act vector. One of ordinary skill would have been motivated to make this modification because the speech act classification model of Yoo shows increased accuracy over “competitive models in previous studies,” as recognized by Yoo (Yoo, Abstract).

Regarding claim 8, the rejection of claim 6 is incorporated. Claim 8 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 10, the rejection of claim 6 is incorporated. Claim 10 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Claims 2, 4, 7, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Kalchbrenner and Yoo as applied to claims 1 and 6 above, and further in view of Roblek (U.S. Pat. App. Pub. No. 2015/0294670, hereinafter Roblek).

Regarding claim 2, the rejection of claim 1 is incorporated. Kalchbrenner and Yoo disclose all of the elements of the current invention as stated above. However, Kalchbrenner and Yoo fail to expressly recite wherein the word similarity calculator calculates a similarity score between the input utterance vector and the previous speech act vector, and generates the input utterance similarity vector by using the similarity score.
Roblek teaches systems and methods for speaker verification. (Roblek, ¶ [0002]). Regarding claim 2, Roblek teaches wherein the word similarity calculator calculates a similarity score between the input utterance vector and the previous speech act vector (“A comparator 620 compares the evaluation vector 604 {input utterance vector} to the reference vector 404 {previous speech act vector} to verify the identity of the user {similarity}. In some implementations, the comparator 620 may generate a score indicating a likelihood that an utterance corresponds to an identity {similarity score}” where the evaluation vector is derived from a verification utterance {input utterance} and the reference vector is derived from an enrollment utterance {previous speech from the user}; Roblek, ¶¶ [0078], [0045]-[0046], [0067]), and generates the input utterance similarity vector by using the similarity score (“If the identity of the user 102 is accepted, the client device 110 may perform” a desired function. Thus a similarity score and the evaluation vector {input utterance vector} may be used to generate an input utterance similarity vector; Roblek, ¶ [0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the recurrent convolutional neural network of Kalchbrenner as modified by the speech act classification models of Yoo to incorporate the teachings of Roblek to include wherein the word similarity calculator calculates a similarity score between the input utterance vector and the previous speech act vector, and generates the input utterance similarity vector by using the similarity score. One of ordinary skill would have been motivated to make this modification because comparison of current and previous speech can provide verification of user identity by voice, which can be used to prevent unauthorized access to voice responsive systems, as recognized by Roblek (Roblek, ¶¶ [0003]-[0004]).
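
For illustration only, the notion of calculating a similarity score between two vectors and then using that score to generate a further vector can be sketched in Python as follows. Cosine similarity and element-wise weighting by the score are assumptions chosen for the example; neither Roblek nor the claims are represented as requiring these particular operations. The function names and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for a zero vector; an assumption of this sketch
    return dot / (norm_a * norm_b)


def similarity_weighted_vector(vector, score):
    """Scale each element of the vector by the similarity score.

    This is one hypothetical way of generating a vector "by using" a score.
    """
    return [score * x for x in vector]


if __name__ == "__main__":
    # Hypothetical stand-ins for the two compared vectors.
    utterance_vector = [0.2, 0.7, 0.1]
    reference_vector = [0.3, 0.6, 0.1]
    score = cosine_similarity(utterance_vector, reference_vector)
    print(score, similarity_weighted_vector(utterance_vector, score))
```

The same two-step pattern, a score computed from a pair of vectors followed by a vector derived from that score, applies to the limitation addressed for claim 4 with the conversation unit input utterance vector and the speaker vector as the compared pair.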

Regarding claim 4, the rejection of claim 1 is incorporated. Kalchbrenner and Yoo disclose all of the elements of the current invention as stated above. However, Kalchbrenner and Yoo fail to expressly recite wherein the conversation similarity calculator calculates a similarity score between the conversation unit input utterance vector and the speaker vector, and generates the conversation unit input utterance similarity vector by using the conversation unit input utterance vector and the similarity score.
The relevance of Roblek is described above with relation to claim 2. Regarding claim 4, Roblek teaches wherein the conversation similarity calculator calculates a similarity score between the conversation unit input utterance vector and the speaker vector (“A comparator 620 compares the evaluation vector 604 {conversation unit input utterance vector} to the reference vector 404 {speaker vector} to verify the identity of the user {similarity}. In some implementations, the comparator 620 may generate a score indicating a likelihood that an utterance corresponds to an identity {similarity score}” where the evaluation vector is derived from a verification utterance and the reference vector is derived from an enrollment utterance; Roblek, ¶¶ [0078], [0045]-[0046], [0067]), and generates the conversation unit input utterance similarity vector by using the conversation unit input utterance vector and the similarity score (“If the identity of the user 102 is accepted, the client device 110 may perform” a desired function. Thus a similarity score and the evaluation vector {conversation unit input utterance vector} may be used to generate a conversation unit input utterance similarity vector; Roblek, ¶ [0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the recurrent convolutional neural network of Kalchbrenner as modified by the speech act classification models of Yoo to incorporate the teachings of Roblek to include wherein the conversation similarity calculator calculates a similarity score between the conversation unit input utterance vector and the speaker vector, and generates the conversation unit input utterance similarity vector by using the conversation unit input utterance vector and the similarity score. One of ordinary skill would have been motivated to make this modification because comparison of current and previous speech can provide verification of user identity by voice, which can be used to prevent unauthorized access to voice responsive systems, as recognized by Roblek (Roblek, ¶¶ [0003]-[0004]).

Regarding claim 7, the rejection of claim 6 is incorporated. Claim 7 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 9, the rejection of claim 6 is incorporated. Claim 9 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657