DETAILED ACTION

Introduction
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
2.  	Claim 19 is objected to because of the following informalities: typographical errors. Claim 19 ends with a semicolon (;). This semicolon should be changed to a period (.). 

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over 
Rezazadeh Sereshkeh et al. (US 2020/0285353 A1) in view of Agarwal et al. (US 2019/0080225 A1).

	With respect to Claim 1, Rezazadeh Sereshkeh et al. disclose 
 	Apparatus for interactive voice recognition, the apparatus comprising: 
 	a canonical phrase derivation engine configured to derive canonical phrases from voice data (Rezazadeh Sereshkeh et al. [0091] the operation 205 includes inquiring the user to verify whether the user utterance relates to (is the same or similar to) a task of a canonical utterance of the existing cluster); 
 	an input engine configured to parse utterances (Rezazadeh Sereshkeh et al. [0066] The dependency parse representations of a sentence provides a grammatical relationship between each pair of words in the user utterance. For example, in the utterance “Get me the closest Italian restaurants,” the dependency parse representation indicates an adjective modifier dependency from "Italian" to "restaurants." Analogously, in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants."); 
 	a knowledge extraction engine configured to: 
 	 	disambiguate the utterances into words (Rezazadeh Sereshkeh et al. [0066] the parameter prediction module 125 uses multiple linguistic cues such as ... part of speech); 
 	 	form a sequence from the words (Rezazadeh Sereshkeh et al. Fig. 3A elements 305, 310 Encode user utterance, Obtain similarity s of encoded user utterance and existing cluster centroid, Fig. 3B elements 350 user utterances); 
 		pair the sequence with a phrase of the canonical phrases (Rezazadeh Sereshkeh et al. [0066] in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants." This dependency similarity between words in two different utterances is leveraged, and known parameters in a canonical utterance are matched to predict parameters in a new utterance, Fig. 3A elements 330 and 320 User utterance related to canonical utterance, Yes, Assign user utterance to existing cluster); 
 		merge the sequence and the phrase to form a hybrid phrase (Rezazadeh Sereshkeh et al. Fig. 3B, [0094] As shown in Fig. 3B, user utterances 350 may be respectively assigned to cluster 1 355, cluster 2 360); and 
 	 	determine an intent corresponding to the utterances (Rezazadeh Sereshkeh et al. [0091] for the user’s utterance “Find nearest Chinese restaurants,” the user may be asked, “Did you mean a task similar to: ‘Get me the closest Italian restaurants’?” Based on the user utterance being verified to relate to the task of the canonical utterance, the operation 205 continues in the operation 320, [0089] In the operation 320, the operation 205 includes assigning the user utterance to the existing cluster, and determining that the user utterance refers to an existing automation script corresponding to the existing cluster).  
	Rezazadeh Sereshkeh et al. fail to explicitly teach
 	 	extract context from the sequence; 
 	 	vectorize the hybrid phrase into a vector; and 
 	a non-linear classification engine configured to: 
 		embed the vector into a classifier embedding layer; 
 		feed output from the embedding layer into a bidirectional long short-term memory layer; and 
 		feed output from the bidirectional long short-term memory layer into a decision layer. 
 	However, Agarwal et al. teach
 		extract context from the sequence (Agarwal et al. [0035] the vector representation of each word is inputted in at least one of a forward order and a reverse order as a result at every word in the query it retains the context of other words both on left and right hand side); 
 		vectorize the hybrid phrase into a vector (Agarwal et al. [0007] representing in the embedding layer of the common base network, the one or more user queries as a sequence of vector representation of each word learnt using word to vector model, wherein the sequence of words is replaced by corresponding vectors and the corresponding vectors are initiated using the word to vector model); and 
 	a non-linear classification engine configured to: 
 		embed the vector into a classifier embedding layer (Agarwal et al. Fig. 2 element 204 Representing in an embedding layer of a common base network, the one or more user queries as a sequence of vector representation of each word learnt using a word to vector model); 
 		feed output from the embedding layer into a bidirectional long short-term memory layer (Agarwal et al. Fig. 2 element 206 Inputting, to a single BiLSTM layer of the common base network, the sequence to vector representation of each word to generate ‘t’ hidden states at every timestep, wherein the vector representation of each word is inputted in at least one of a forward order and a reverse order); 
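 		feed output from the bidirectional long short-term memory layer into a decision layer (Agarwal et al. Fig. 2 elements 208, 210, and 212 Obtaining, using a maxpool layer of a classification model, dimension-wise maximum value of the sequence of vector to form a final vector, Determining by using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector). 
	Rezazadeh Sereshkeh et al. and Agarwal et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of matching the user’s utterance with a canonical utterance as taught by Rezazadeh Sereshkeh et al., using the teaching of a bidirectional long short-term memory layer and a softmax layer as taught by Agarwal et al., for the benefit of determining at least one target class of the query (Agarwal et al. Fig. 2 elements 208, 210, and 212 Obtaining, using a maxpool layer of a classification model, dimension-wise maximum value of the sequence of vector to form a final vector, Determining by using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector.)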
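
	For illustration only, the maxpool and softmax steps quoted from Agarwal et al. Fig. 2 (elements 208, 210, and 212) can be sketched as follows. This is a minimal sketch and not the implementation of either reference: the function names, the linear scoring step with weights and biases, and the sample numbers are all assumptions introduced here, and the per-timestep hidden states are simply taken as given rather than produced by a BiLSTM layer.

```python
import math

def maxpool(hidden_states):
    """Dimension-wise maximum over a sequence of equal-length vectors."""
    return [max(column) for column in zip(*hidden_states)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(hidden_states, weights, biases):
    """Pool the hidden states into a final vector, score each class, pick the best.

    weights and biases are hypothetical parameters of a linear scoring step
    placed between the pooled vector and the softmax.
    """
    final_vector = maxpool(hidden_states)
    scores = [
        sum(w * x for w, x in zip(row, final_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    target_class = max(range(len(probs)), key=probs.__getitem__)
    return target_class, probs

# Hypothetical hidden states for a three-word query, two dimensions each.
hidden = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]]
label, probs = classify(hidden, weights=[[1.0, 0.0], [0.0, 1.0]], biases=[0.0, 0.0])
```

	In this sketch the pooled final vector keeps, for each dimension, the largest value seen at any timestep, and the target class is the one with the highest softmax probability.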

	With respect to Claim 2, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach 
 	wherein the knowledge extraction engine is configured to generate a language dimension matrix for the words (Rezazadeh Sereshkeh et al. [0045] the parameter may be expressed in a matrix form, [0051] the utterance clustering module 105 may determine whether the textual user utterance is in a same cluster of user utterances as that of one or more prior user utterances. The same cluster includes the user utterances that are variations of the same task or command with the same or similar parameters or words.)

 	With respect to Claim 3, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach 
 	when the language dimension matrix is a first dimension matrix, the knowledge extraction engine is further configured to generate for each of the canonical phrases a second language dimension matrix (Rezazadeh Sereshkeh et al. [0066] This dependency similarity between words in two different utterances is leveraged, and known parameters in canonical utterance are matched to predict parameters in a new utterance.)

With respect to Claim 4, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
wherein the knowledge extraction engine is configured to: 
 	apply a linear model to identify the second language dimension matrix that is most similar to the first language dimension matrix (Rezazadeh Sereshkeh et al. [0066] in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants." This dependency similarity between words in two different utterances is leveraged, and known parameters in a canonical utterance are matched to predict parameters in a new utterance, Fig. 3A elements 330 and 320 User utterance related to canonical utterance, Yes, Assign user utterance to existing cluster); and 
 	select a phrase that corresponds to a most similar second language dimension matrix (Rezazadeh Sereshkeh et al. Fig. 3B elements 350, 355, 360.)
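
	For illustration only, the step quoted from Rezazadeh Sereshkeh et al. Fig. 3A (obtaining a similarity s between the encoded user utterance and an existing cluster centroid, then assigning the utterance to a cluster) can be sketched as follows. The use of cosine similarity, the function names, the cluster labels, and the sample vectors are assumptions made for this sketch; the reference as quoted in this action does not specify the similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_cluster(utterance_vector, centroids):
    """Return the name of the closest centroid and its similarity score."""
    best = max(
        centroids,
        key=lambda name: cosine_similarity(utterance_vector, centroids[name]),
    )
    return best, cosine_similarity(utterance_vector, centroids[best])

# Hypothetical two-dimensional encodings of an utterance and two cluster centroids.
centroids = {"restaurants": [1.0, 0.0], "travel": [0.0, 1.0]}
name, score = most_similar_cluster([0.9, 0.1], centroids)
```

	A threshold on the returned score (not shown, and not taken from the reference) could decide between assigning the utterance to the existing cluster and asking the user to verify.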

With respect to Claim 5, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	wherein the knowledge extraction engine is further configured to map a word of the sequence to an element of the phrase (Rezazadeh Sereshkeh et al. Fig. 3B elements 350, 355, 360.)

With respect to Claim 6, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	wherein the knowledge extraction engine is further configured to select for the hybrid phrase, from a word of the sequence and an element of phrase, either the word or the element (Rezazadeh Sereshkeh et al. Fig. 3B elements 350, 355, 360.)

With respect to Claim 7, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	wherein the knowledge extraction engine is further configured to vectorize the hybrid phrase as input for the non-linear classification engine (Agarwal et al. [0007] representing in the embedding layer of the common base network, the one or more user queries as a sequence of vector representation of each word learnt using word to vector model, wherein the sequence of words is replaced by corresponding vectors and the corresponding vectors are initiated using the word to vector model).

With respect to Claim 8, Rezazadeh Sereshkeh et al. disclose
 	An interactive voice recognition method comprising: 
 	deriving canonical phrases from voice data (Rezazadeh Sereshkeh et al. [0091] the operation 205 includes inquiring the user to verify whether the user utterance relates to (is the same or similar to) a task of a canonical utterance of the existing cluster);
 	digitally parsing utterances (Rezazadeh Sereshkeh et al. [0066] The dependency parse representations of a sentence provides a grammatical relationship between each pair of words in the user utterance. For example, in the utterance “Get me the closest Italian restaurants,” the dependency parse representation indicates an adjective modifier dependency from "Italian" to "restaurants." Analogously, in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants."); 
 	disambiguating the utterances into words (Rezazadeh Sereshkeh et al. [0066] the parameter prediction module 125 uses multiple linguistic cues such as ... part of speech); 
 	forming a sequence from the words (Rezazadeh Sereshkeh et al. Fig. 3A elements 305, 310 Encode user utterance, Obtain similarity s of encoded user utterance and existing cluster centroid, Fig. 3B elements 350 user utterances); 
	pairing the sequence with a phrase of the canonical phrases (Rezazadeh Sereshkeh et al. [0066] in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants." This dependency similarity between words in two different utterances is leveraged, and known parameters in a canonical utterance are matched to predict parameters in a new utterance, Fig. 3A elements 330 and 320 User utterance related to canonical utterance, Yes, Assign user utterance to existing cluster);
 	merging the sequence and the phrase to form a hybrid phrase (Rezazadeh Sereshkeh et al. Fig. 3B, [0094] As shown in Fig. 3B, user utterances 350 may be respectively assigned to cluster 1 355, cluster 2 360. This limitation is interpreted in light of paragraph [039] of the specification. Paragraph [039] of the specification discloses “The method may include selecting for the hybrid phrase, from a word of the sequence and an element of phrase, either the word or the element”); 
 	determining an intent corresponding to the utterances (Rezazadeh Sereshkeh et al. [0091] for the user’s utterance “Find nearest Chinese restaurants,” the user may be asked, “Did you mean a task similar to: ‘Get me the closest Italian restaurants’?” Based on the user utterance being verified to relate to the task of the canonical utterance, the operation 205 continues in the operation 320, [0089] In the operation 320, the operation 205 includes assigning the user utterance to the existing cluster, and determining that the user utterance refers to an existing automation script corresponding to the existing cluster).  
 	Rezazadeh Sereshkeh et al. fail to explicitly teach
 	extracting context from the sequence; 
 	vectorizing the hybrid phrase into a vector; 
 	embedding the vector into a classifier embedding layer; 
 	feeding output from the embedding layer into a bidirectional long short-term memory layer; and 
 	feeding output from the bidirectional long short-term memory layer into a decision layer. 
 	However, Agarwal et al. teach
 	extracting context from the sequence (Agarwal et al. [0035] the vector representation of each word is inputted in at least one of a forward order and a reverse order as a result at every word in the query it retains the context of other words both on left and right hand side); 
 	vectorizing the hybrid phrase into a vector (Agarwal et al. [0007] representing in the embedding layer of the common base network, the one or more user queries as a sequence of vector representation of each word learnt using word to vector model, wherein the sequence of words is replaced by corresponding vectors and the corresponding vectors are initiated using the word to vector model); 
 	embedding the vector into a classifier embedding layer (Agarwal et al. Fig. 2 element 204 Representing in an embedding layer of a common base network, the one or more user queries as a sequence of vector representation of each word learnt using a word to vector model); 
 	feeding output from the embedding layer into a bidirectional long short-term memory layer (Agarwal et al. Fig. 2 element 206 Inputting, to a single BiLSTM layer of the common base network, the sequence to vector representation of each word to generate ‘t’ hidden states at every timestep, wherein the vector representation of each word is inputted in at least one of a forward order and a reverse order); 
 	feeding output from the bidirectional long short-term memory layer into a decision layer (Agarwal et al. Fig. 2 elements 208, 210, and 212 Obtaining, using a maxpool layer of a classification model, dimension-wise maximum value of the sequence of vector to form a final vector, Determining by using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector). 
 	Rezazadeh Sereshkeh et al. and Agarwal et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of matching the user’s utterance with a canonical utterance as taught by Rezazadeh Sereshkeh et al., using the teaching of a bidirectional long short-term memory layer and a softmax layer as taught by Agarwal et al., for the benefit of determining at least one target class of the query (Agarwal et al. Fig. 2 elements 208, 210, and 212 Obtaining, using a maxpool layer of a classification model, dimension-wise maximum value of the sequence of vector to form a final vector, Determining by using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector.)

 	With respect to Claim 9, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach 
 	wherein the disambiguation includes forming a language-dimension matrix corresponding to the utterances (Rezazadeh Sereshkeh et al. [0045] the parameter may be expressed in a matrix form, [0051] the utterance clustering module 105 may determine whether the textual user utterance is in a same cluster of user utterances as that of one or more prior user utterances. The same cluster includes the user utterances that are variations of the same task or command with the same or similar parameters or words.)

   	With respect to Claim 10, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach
 	wherein the matrix includes a part-of-language parameter (Rezazadeh Sereshkeh et al.  [0066] For predicting parameters, the parameter prediction module 125 uses multiple linguistic cues such as word lemmas.) 

 	With respect to Claim 12, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach
 	wherein the matrix includes a coordinating term parameter (Rezazadeh Sereshkeh et al. [0066] the parameter prediction module 125 uses multiple linguistic cues such as ... part of speech. Paragraph [069] of the specification discloses “Coordinates such as 610,612 and 614 may be terms or parts of speech”).  

With respect to Claim 13, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach 
 	wherein the extracting includes a products and services tree (Rezazadeh Sereshkeh et al. Fig. 3B Get me a cab to Central Park, Book tickets from Toronto to NYC.)

 	With respect to Claim 14, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach
 	(Rezazadeh Sereshkeh et al. [0091] for the user’s utterance “Find nearest Chinese restaurants,” the user may be asked, “Did you mean a task similar to: ‘Get me the closest Italian restaurants’?” Based on the user utterance being verified to relate to the task of the canonical utterance, the operation 205 continues in the operation 320.)

With respect to Claim 15, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach    
 	wherein the pairing includes generating a language dimension matrix for the words (Rezazadeh Sereshkeh et al. [0045] the parameter may be expressed in a matrix form, [0051] the utterance clustering module 105 may determine whether the textual user utterance is in a same cluster of user utterances as that of one or more prior user utterances. The same cluster includes the user utterances that are variations of the same task or command with the same or similar parameters or words.) 

With respect to Claim 16, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach    
 	wherein, when the language dimension matrix is a first dimension matrix, the pairing further includes generating for each of the canonical phrases a second language dimension matrix (Rezazadeh Sereshkeh et al. [0066] This dependency similarity between words in two different utterances is leveraged, and known parameters in canonical utterance are matched to predict parameters in a new utterance.)

 	With respect to Claim 17, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	wherein; 
 	the pairing includes using a linear model to identify the second language dimension matrix that is most similar to the first language dimension matrix (Rezazadeh Sereshkeh et al. [0066] in another utterance such as "Find nearest Chinese restaurants", the dependency parse indicates the same relationship from "Chinese" to "restaurants." This dependency similarity between words in two different utterances is leveraged, and known parameters in a canonical utterance are matched to predict parameters in a new utterance, Fig. 3A elements 330 and 320 User utterance related to canonical utterance, Yes, Assign user utterance to existing cluster); and 


 	With respect to Claim 18, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	wherein the merging includes mapping a word of the sequence to an element of the phrase (Rezazadeh Sereshkeh et al. Fig. 3B elements 350, 355, 360.)

 	With respect to Claim 19, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach  
 	further comprising selecting for the hybrid phrase, from a word of the sequence and an element of phrase, either the word or the element (Rezazadeh Sereshkeh et al. Fig. 3B elements 350, 355, 360.);  

With respect to Claim 20, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach   
20. The method of claim 18 further comprising vectorizing the hybrid phrase (Agarwal et al. [0007] representing in the embedding layer of the common base network, the one or more user queries as a sequence of vector representation of each word learnt using word to vector model, wherein the sequence of words is replaced by corresponding vectors and the corresponding vectors are initiated using the word to vector model).

5.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over 
Rezazadeh Sereshkeh et al. (US 2020/0285353 A1) in view of Agarwal et al. (US 2019/0080225 A1) and Kobayashi et al. (US 2021/0004543 A1).

	With respect to Claim 11, Rezazadeh Sereshkeh et al. in view of Agarwal et al. teach all the limitations of Claim 9 upon which Claim 11 depends. Rezazadeh Sereshkeh et al. in view of Agarwal et al. fail to explicitly teach 
 	wherein the matrix includes a tense parameter.  
	However, Kobayashi et al. teach 
 	wherein the matrix includes a tense parameter (Kobayashi et al. [0020] The response sentence generating device according to the present invention can further include a speech-target identifying unit that determines, based on the analysis result of the speech sentence analyzed by the text analyzing unit, a target label indicating about whom the speech sentence is spoken or to whom the speech sentence is spoken, and the response-type determining unit can determine, based on at least any one of tense information.)
 	Rezazadeh Sereshkeh et al., Agarwal et al. and Kobayashi et al. are analogous art because they are from a similar field of endeavor in signal recognition techniques and applications. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of matching the user’s utterance with a canonical utterance as taught by Rezazadeh Sereshkeh et al., using the teaching of a bidirectional long short-term memory layer and a softmax layer as taught by Agarwal et al. for the benefit of determining at least one target class of the query, and using the teaching of extracting the tense information as taught by Kobayashi et al. for the benefit of determining a response type with respect to the determined tense information (Kobayashi et al. [0020] The response sentence generating device according to the present invention can further include a speech-target identifying unit that determines, based on the analysis result of the speech sentence analyzed by the text analyzing unit, a target label indicating about whom the speech sentence is spoken or to whom the speech sentence is spoken, and the response-type determining unit can determine, based on at least any one of tense information.)

Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Goslin et al. (US 2021/0081498 A1). In this reference, Goslin et al. compare the user’s input to the canonical phrase to determine a mathematical distance between the user’s word choice and phrasing. 
b.	Reddi et al. (US 2020/0089758 A1). In this reference, Reddi et al. disclose a method for converting text into dictionary or canonical form. 
c.	Rozen (US 2006/0100855 A1). In this reference, Rozen discloses a method for mapping the words which correspond to their subject and verb to the “to whom” part of the canonical sound sequence.

7. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/
Primary Examiner, Art Unit 2655