Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6, 7, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Holsapfel (US 20030149558 A1) in view of Iwamasa (US 20160275119 A1), Murthy (US 20180247107 A1) and Kamper (H. Kamper, W. Wang and K. Livescu, "Deep convolutional acoustic word embeddings using word-pair side information," 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4950-4954, doi: 10.1109/ICASSP.2016.7472619).
With respect to claims 1, 7 and 15, Holsapfel teaches a method / an electronic device / a computer program comprising a computer-readable recording medium comprising instructions that are executed, for classifying a sentence into a class by using a deep neural network, the method comprising ([0057] The neural networks described above are realized as computer programs which run independently on a computer for converting the linguistic category of a text into prosodic markers thereof. They thus represent a method which can be executed automatically; and [0058] The computer program can also be stored on an electronically readable data carrier and thus be transmitted to a different computer system.):
  training a first feature vector by using a first neural network and using a first sentence comprising one or more words, and a first class to which the first sentence belongs, as input data ([0047] The above-described architecture of a neural network with a plurality of models (in this case: the autoassociators) each trained to a specific class and a superordinate classifier makes it possible to reliably correctly map an input vector with a very large dimension onto an output vector with a small dimension or a scalar. This network architecture can also advantageously be used in other applications in which elements of different classes have to be dealt with. Thus, it may be expedient e.g. to use this network architecture also in speech recognition for the detection of word and/or sentence boundaries. The input data must be correspondingly adapted for this.); 
training a second feature vector by using a second neural network and using a second sentence and a second class to which the second sentence belongs, as input data ([0047] The above-described architecture of a neural network with a plurality of models (in this case: the autoassociators) each trained to a specific class and a superordinate classifier makes it possible to reliably correctly map an input vector with a very large dimension onto an output vector with a small dimension or a scalar. This network architecture can also advantageously be used in other applications in which elements of different classes have to be dealt with. Thus, it may be expedient e.g. to use this network architecture also in speech recognition for the detection of word and/or sentence boundaries. The input data must be correspondingly adapted for this.);
Holsapfel does not explicitly teach, but Iwamasa teaches, [[obtaining a contrastive loss by]] quantifying a representational similarity between the first and second sentences based on the first feature vector, the second feature vector ([0038] The degree of similarity between the sentences D_i and D_j may be calculated as the inner product of feature vectors.), and [[information about whether the first class is the same as the second class]]; and
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel in view of Iwamasa to quantify a representational similarity between the first and second sentences based on the first and second feature vectors, in order to search the database for first records based on an inquiry index including a value of at least one attribute, as evidenced by Iwamasa (see Abstract).
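For illustration only, the inner-product similarity quoted from Iwamasa [0038] can be sketched as follows. The function name and the toy feature vectors are assumptions made for this sketch and do not appear in the reference.

```python
from typing import Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity of two sentence feature vectors as their inner product."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical feature vectors for two sentences D_i and D_j.
d_i = [1.0, 0.0, 2.0]
d_j = [0.5, 1.0, 1.0]
similarity = inner_product(d_i, d_j)
```

A larger value indicates that the two feature vectors point in more similar directions, subject to their magnitudes.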
Neither Holsapfel nor Iwamasa explicitly discloses, but Murthy teaches, obtaining a contrastive loss by quantifying a representational similarity and information about whether the first class is the same as the second class ([0061] The weighted contrastive loss function L_m can be interpreted as a set of soft constraints which impose a significantly higher penalty for miss-classifying a sample to any class [information about whether the classes are the same] belonging to another cluster as compared to the penalty of miss-classifying to a class that belongs to the same cluster. In other words, minimizing the weighted contrastive loss results in a similarity metric of samples belonging to the same cluster to be small and samples across different clusters to be large).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel and Iwamasa in view of Murthy to obtain a contrastive loss by quantifying a representational similarity and information about whether the first class is the same as the second class, in order to provide a method and system that improve classification results, as evidenced by Murthy (see Par. 0020).
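For illustration only, the general idea of a contrastive loss that combines a pairwise distance with a same-class indicator can be sketched as below. This is the common textbook pairwise form, not the weighted loss L_m of Murthy [0061] and not the claimed loss; the Euclidean distance, the margin value, and all names are assumptions of this sketch.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u: Sequence[float], v: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Textbook pairwise contrastive loss (assumed form, not Murthy's L_m).

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when they are closer than the margin.
    """
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Under this assumed form, minimizing the loss pulls same-class pairs together and pushes different-class pairs at least `margin` apart.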
Holsapfel, Iwamasa and Murthy do not explicitly disclose, but Kamper teaches, repeating the trainings using the first and second neural networks, in such a manner that the contrastive loss has a maximum value (p. 4951, Sec. 2.3, para. 3: Figure 1(b) illustrates how we apply this idea to obtain acoustic word embeddings. The two sides of our Siamese network [first and second neural network] take padded inputs Y1 and Y2. For the two sides we use CNNs similar to that of the word classification CNN.; eq. 2 on page 4951 maximizes the loss function; p. 4950, Sec. 1, para. 3: We show that a Siamese CNN trained with a hinge-like contrastive loss function outperforms the best approach of Levin et al. [3], and performs similarly to a word classifier CNN, despite the weaker form of supervision. By reducing the Siamese CNN embedding dimensionality with a post-processing linear discriminant analysis, we also obtain a more compact embedding that maintains best performance).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa and Murthy in view of Kamper to repeat the trainings using the first and second neural networks in such a manner that the contrastive loss has a maximum value, in order to minimize the distance between word pairs of the same type relative to the distance between pairs of different types (p. 4953, ll. 5-6, Conclusion).
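For illustration only, a hinge-like loss over embeddings produced by the two sides of a Siamese network can be sketched as below. The exact form of Kamper's eq. 2 is not reproduced in this action, so the cosine distance, the margin value, and the triplet arrangement (an anchor, a same-type example, and a different-type example) are assumptions of this sketch. The sketch only evaluates the loss and does not show how it is optimized.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance scaled to [0, 1]; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


def hinge_loss(anchor: Sequence[float], same: Sequence[float],
               diff: Sequence[float], margin: float = 0.5) -> float:
    """Hinge-like loss (assumed form): zero once the different-type pair is
    at least `margin` farther from the anchor than the same-type pair."""
    return max(0.0, margin + cosine_distance(anchor, same)
               - cosine_distance(anchor, diff))
```

In this assumed form the loss reaches zero when same-type pairs are sufficiently closer than different-type pairs, which matches the stated goal of reducing same-type distance relative to different-type distance.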

With respect to claims 6 and 14, Holsapfel, Iwamasa and Murthy do not explicitly disclose, but Kamper teaches, wherein the training using the first neural network and the training using the second neural network are simultaneously performed (p. 4950, Sec. 1, para. 3: We show that a Siamese CNN trained [training is simultaneously performed] with a hinge-like contrastive loss function outperforms the best approach of Levin et al. [3], and performs similarly to a word classifier CNN, despite the weaker form of supervision. By reducing the Siamese CNN embedding dimensionality with a post-processing linear discriminant analysis, we also obtain a more compact embedding that maintains best performance).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa and Murthy in view of Kamper so that the training using the first neural network and the training using the second neural network are simultaneously performed, in order to minimize the distance between word pairs of the same type relative to the distance between pairs of different types (p. 4953, ll. 5-6, Conclusion).

Claims 2, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Holsapfel, Iwamasa, Murthy and Kamper as applied to claims 1, 7 and 8 respectively, and further in view of Junqua (US 20040236778 A1) and He (US 20180293499 A1).
With respect to claim 2, Holsapfel, Iwamasa, Murthy and Kamper do not explicitly disclose, but Junqua teaches, receiving an utterance input from a user ([0018] The speech recognizer 50 receives spoken input through a suitable microphone which may be incorporated into the remote control, into a hands free device placed on a nearby coffee table or the like, or into the storage device or television set. Output from the speech recognizer is supplied to a natural language parser 52.);
recognizing the received utterance as a sentence ([0040] Automatic speech recognition process block 217 generates word confidence vector 268 which indicates how well the words in input sentence 218 were recognized.); 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy and Kamper in view of Junqua to recognize the received utterance as a sentence, in order to extract semantically important and meaningful topics from a loosely structured, natural language text ([0030] Junqua).
Holsapfel, Iwamasa, Murthy, Kamper and Junqua do not explicitly disclose, but He teaches,
extracting one or more words comprised in the recognized sentence, and transforming the one or more words into one or more word vectors ([0044] The word embedding matrix E [sentence vector] was initialized with word vectors trained by Word2vec with negative sampling on each dataset, setting the embedding size to 200, window size to 5, and negative sample size to 5.),
wherein the training of the first feature vector comprises: generating a sentence vector by arranging the one or more word vectors in a form of a matrix, and training the first feature vector by inputting the sentence vector to the first neural network as input data ([0044] For each corpus, punctuation symbols, stop words, and words appearing less than 10 times in the corpus were removed. The word embedding matrix E [sentence vector] was initialized with word vectors trained by Word2vec with negative sampling on each dataset, setting the embedding size to 200, window size to 5, and negative sample size to 5. The aspect embedding matrix T was initialized with centroids of clusters resulting from running k-means on word embeddings. Other parameters are initialized randomly. During the training process, the word embedding matrix E was fixed, and other parameters were optimized using stochastic optimization with a learning rate 0.001 for 15 epochs with batch size 50. The number of negative samples per input sample m was set to 20, and the orthogonality penalty weight λ was set to 1; and [0042] Each word in the vocabulary is associated with a respective feature vector).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy, Kamper and Junqua in view of He to generate a sentence vector by arranging the one or more word vectors in a form of a matrix, and to train the first feature vector by inputting the sentence vector to the first neural network as input data, in order to further improve the coherence of aspects during the training process ([0027] He).
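For illustration only, arranging word vectors in the form of a matrix can be sketched as below. The toy embedding table, the whitespace tokenization, and the zero vector used for out-of-vocabulary words are assumptions of this sketch and are not taken from He or from the claims.

```python
from typing import Dict, List


def sentence_to_matrix(sentence: str,
                       embeddings: Dict[str, List[float]],
                       dim: int) -> List[List[float]]:
    """Stack one word vector per row, in word order.

    Words missing from the table map to a zero vector (an assumption).
    """
    words = sentence.lower().split()
    return [embeddings.get(word, [0.0] * dim) for word in words]


# Hypothetical 2-dimensional word vectors.
toy_embeddings = {
    "turn": [0.1, 0.2],
    "on": [0.3, 0.4],
    "light": [0.5, 0.6],
}
matrix = sentence_to_matrix("Turn on the light", toy_embeddings, dim=2)
```

Here `matrix` has one row per word and one column per embedding dimension, which is the shape a downstream network would take as input.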

With respect to claim 8, Holsapfel, Iwamasa, Murthy and Kamper do not explicitly disclose, but Junqua teaches, further comprising an utterance inputter configured to receive an utterance input from a user, wherein the processor is further configured to recognize the received utterance as a sentence ([0040] Automatic speech recognition process block 217 generates word confidence vector 268 which indicates how well the words in input sentence 218 were recognized), and
extract one or more words comprised in the recognized sentence ([0040] Automatic speech recognition process block 217 generates word confidence vector 268 which indicates how well the words in input sentence 218 were recognized), 
Holsapfel, Iwamasa, Murthy, Kamper and Junqua do not explicitly disclose, but He teaches, transform the one or more words into one or more word vectors ([0044] The word embedding matrix E [sentence vector] was initialized with word vectors trained by Word2vec with negative sampling on each dataset, setting the embedding size to 200, window size to 5, and negative sample size to 5.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy, Kamper and Junqua in view of He to generate a sentence vector by arranging the one or more word vectors in a form of a matrix, and to train the first feature vector by inputting the sentence vector to the first neural network as input data, in order to further improve the coherence of aspects during the training process ([0027] He).

With respect to claim 9, Holsapfel, Iwamasa, Murthy, Kamper and Junqua do not explicitly disclose, but He teaches, generate a sentence vector by arranging the one or more word vectors in a form of a matrix, and train the first feature vector by inputting the sentence vector to the first neural network as input data ([0044] For each corpus, punctuation symbols, stop words, and words appearing less than 10 times in the corpus were removed. The word embedding matrix E [sentence vector] was initialized with word vectors trained by Word2vec with negative sampling on each dataset, setting the embedding size to 200, window size to 5, and negative sample size to 5. The aspect embedding matrix T was initialized with centroids of clusters resulting from running k-means on word embeddings. Other parameters are initialized randomly. During the training process, the word embedding matrix E was fixed, and other parameters were optimized using stochastic optimization with a learning rate 0.001 for 15 epochs with batch size 50. The number of negative samples per input sample m was set to 20, and the orthogonality penalty weight λ was set to 1; and [0042] Each word in the vocabulary is associated with a respective feature vector).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy, Kamper and Junqua in view of He to generate a sentence vector by arranging the one or more word vectors in a form of a matrix, and to train the first feature vector by inputting the sentence vector to the first neural network as input data, in order to further improve the coherence of aspects during the training process ([0027] He).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Holsapfel, Iwamasa, Murthy and Kamper as applied to claim 7, and further in view of He and Xiong (US 20180329884 A1).
With respect to claim 13, Holsapfel, Iwamasa, Murthy and Kamper do not explicitly disclose, but He teaches, wherein the processor is further configured to transform the first sentence into a matrix comprising one or more word vectors ([0044] For each corpus, punctuation symbols, stop words, and words appearing less than 10 times in the corpus were removed. The word embedding matrix E [sentence vector] was initialized with word vectors trained by Word2vec with negative sampling on each dataset, setting the embedding size to 200, window size to 5, and negative sample size to 5. The aspect embedding matrix T was initialized with centroids of clusters resulting from running k-means on word embeddings. Other parameters are initialized randomly. During the training process, the word embedding matrix E was fixed, and other parameters were optimized using stochastic optimization with a learning rate 0.001 for 15 epochs with batch size 50. The number of negative samples per input sample m was set to 20, and the orthogonality penalty weight λ was set to 1; and [0042] Each word in the vocabulary is associated with a respective feature vector),
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy and Kamper in view of He to generate a sentence vector by arranging the one or more word vectors in a form of a matrix, and to train the first feature vector by inputting the sentence vector to the first neural network as input data, in order to further improve the coherence of aspects during the training process ([0027] He).
Holsapfel, Iwamasa, Murthy, Kamper and He do not explicitly disclose, but Xiong teaches, input the transformed matrix to a convolutional neural network (CNN) as input data, generate feature maps by applying a plurality of filters, and extract the first feature vector by passing the feature maps through a max pooling layer ([0088] The architecture builds the CNN 202 based on a sentence classifier. As shown in FIG. 3, the architecture provides a dynamic k-max pooling layer and chooses different hyper-parameters that fit the Chinese character-level learning. As illustrated in FIG. 3, the architecture of the CNN may receive a sentence representation, which then applies approaches to generate a fully connected layer, for example, by applying a convolutional layer with multiple filters, K-max pooling, a convolutional layer capturing sequential features, max over time pooling, etc.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy, Kamper and He in view of Xiong to input the transformed matrix to a convolutional neural network (CNN) as input data, generate feature maps by applying a plurality of filters, and extract the first feature vector by passing the feature maps through a max pooling layer, in order to provide an improved architecture wherein computing devices and components are specially configured and interoperate with one another in concert to provide the improved result ([0030] Xiong).
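For illustration only, the sequence of applying a plurality of filters to a sentence matrix and then max pooling each feature map can be sketched as below. This is a generic sentence-CNN sketch, not the architecture of Xiong FIG. 3; the ReLU activation, the absence of bias terms, the toy inputs, and all names are assumptions, and the sentence is assumed to be at least as long as each filter.

```python
from typing import List

Matrix = List[List[float]]


def conv_feature_map(sentence: Matrix, filt: Matrix) -> List[float]:
    """Slide a filter covering len(filt) consecutive word vectors over the
    sentence matrix and return one activation per position."""
    height = len(filt)
    feature_map = []
    for start in range(len(sentence) - height + 1):
        window = sentence[start:start + height]
        total = sum(w * x
                    for f_row, s_row in zip(filt, window)
                    for w, x in zip(f_row, s_row))
        feature_map.append(max(0.0, total))  # ReLU activation (assumed)
    return feature_map


def extract_feature_vector(sentence: Matrix, filters: List[Matrix]) -> List[float]:
    """Max-over-time pooling: keep the largest activation of each feature map."""
    return [max(conv_feature_map(sentence, f)) for f in filters]


# Hypothetical 3-word sentence with 2-dimensional word vectors.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # spans two words
    [[1.0, -1.0]],             # spans one word
]
feature_vector = extract_feature_vector(sentence, filters)
```

The resulting feature vector has one entry per filter, regardless of sentence length.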



Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Holsapfel, Iwamasa, Murthy and Kamper as applied to claims 1 and 7 respectively, and further in view of Wu (US 20180174020 A1).
With respect to claims 3 and 10, Holsapfel, Iwamasa, Murthy and Kamper do not explicitly disclose, but Wu teaches, wherein a plurality of sentences and a plurality of classes to which the plurality of sentences belong are stored in a database (Claim 20. The system of claim 19, wherein the at least one processor is operative to: receive an answer from the user in reply to the result response; analyze the answer to determine user feedback for the result response, wherein determine the one or more context sentences for the query based at least on the query is performed utilizing a context summary system, wherein assign the emotion label to each sentence in the one or more context sentences to form labeled sentences is performed utilizing a sentiment system, and wherein select the result response from the response database based on the labeled sentences [plurality of sentences that are labeled to attach a plurality of classes] is performed utilizing a response prediction system; and train the context summary system, the sentiment system, and the response prediction system based on the user feedback.), and
wherein the second sentence and the second class are randomly extracted from the database ([0004] The collection includes the query and at least one previously received query. The selecting the result response comprises assigning a relevancy score to each response in the response database based on the query and the labeled sentences [sentences that are labeled to attach classes], selecting a predetermined number of responses from the response database based on highest relevancy scores, and randomly selecting the result response from the predetermined number of responses).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Holsapfel, Iwamasa, Murthy and Kamper in view of Wu so that a plurality of sentences and a plurality of classes to which the plurality of sentences belong are stored in a database, in order to provide data to and/or receive data from the client computing device through the network ([0073] Wu).
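For illustration only, randomly extracting a second sentence and its class from a store of labeled sentences can be sketched as below. The in-memory list standing in for a database, the example sentences and class names, and the rule that the second entry differs from the first are assumptions of this sketch, not features taken from Wu or from the claims.

```python
import random
from typing import List, Tuple

LabeledSentence = Tuple[str, str]  # (sentence, class)


def sample_pair(database: List[LabeledSentence], first_index: int,
                rng: random.Random) -> Tuple[LabeledSentence, LabeledSentence, bool]:
    """Pick the first entry by index, draw a second entry at random from the
    remaining entries, and report whether the two classes match."""
    first = database[first_index]
    candidates = [row for i, row in enumerate(database) if i != first_index]
    second = rng.choice(candidates)
    return first, second, first[1] == second[1]


# Hypothetical labeled sentences.
database = [
    ("turn on the light", "lighting"),
    ("switch off the lamp", "lighting"),
    ("play some jazz", "music"),
]
first, second, same_class = sample_pair(database, 0, random.Random(0))
```

The returned flag is the kind of same-class information that a pairwise loss would consume alongside the two feature vectors.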
Allowable Subject Matter
Claims 4, 5, 11 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims.
Claims 4 and 11 recite “wherein the obtaining of the contrastive loss comprises calculating the contrastive loss by using a dot product of the first and second feature vectors and an equation for indicating, as a number, the information about whether the first class is the same as the second class”. The closest teaching comes from Asl (US 20170076152 A1), who teaches “[0027] Experiments have shown that using a single loss function in the output layer of a Siamese network does not reliably capture similarities between long handwritten text strings. The performance of contrastive loss L is dependent on feature extraction of the hidden layers, where it should capture the similarities in a hierarchical way, to enable the output layer to extract features which can clearly represent the similarities of long and complex text strings.” Neither Asl nor the other cited references use a dot product of feature vectors or show an equation that indicates, as a number, whether the classes are the same.
Claim 5 depends on claim 4, and claim 12 depends on claim 11. These claims would be allowable over the prior art of record by virtue of their dependencies.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA, whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657