Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter

Claims 1-20 are allowed over the prior art of record.

The following is an examiner’s statement of reasons for allowance:
As per the independent claims, the prior art of record does not explicitly teach “generating a plurality of word representation matrices for an input sentence using an attention network comprising a plurality of attention heads corresponding to the plurality of syntactic categories, wherein each of the word representation matrices is generated by a different attention head of the plurality of attention heads based on a corresponding query vector of the plurality of query vectors.”
Ravi et al. (20200265196) teaches generating natural language representations (Abstract), associating ground truth context words with input words ([0010]), using classifiers to determine query, sentiment, and context ([0166], [0046-0047], [0068-0069]), and labeling syntax, noun, or verb phrases ([0208], [0211]). Ravi further discloses classifying an input text by using text and/or intermediate features derived from the input text, and outputting a class label for the input text ([0045-0047]). Also, Ravi [0068-0069] discloses using the text and/or intermediate features to classify segments of the input text, outputting multiple class labels corresponding to the segments; and [0208-0211] discloses examples of the outputs. In addition, Ravi Figure 7 discloses a neural projection model comprising a component using an attention mechanism. In Figure 7, a context window is used to predict context words surrounding words in documents. Ravi [0247] further discloses that the window size is randomly sampled from the set 1-N. Accordingly, in Ravi, for one input sentence, multiple attention heads are used, each attention head being applied to a segment of the input sentence, and the output sentence has multiple labeled segments.
Ravi thus teaches generating one word representation matrix (i.e., a representation of the output sentence having multiple labeled segments) for an input sentence, each of the multiple labeled segments generated by a different attention head, and each of the multiple labeled segments (or each label) corresponding to a syntactic category. Ravi does not teach generating a plurality of word representation matrices for an input sentence, each of the word representation matrices being generated by a different attention head corresponding to a syntactic category as claimed in claim 1. First, each attention head in claim 1 is not applied on a segment of the sentence. Rather, each attention head is applied on the entire input sentence to generate one representation of the sentence; and a plurality of attention heads are used to generate a plurality of word representation matrices. This is illustrated in Figure 9 of the present application. In Figure 9, the output of using each of the attention heads is one representation of the entire sentence, not one segment of the sentence. In contrast, Ravi only teaches using multiple attention heads to generate one output sentence having multiple labeled segments. In Ravi, each of the multiple attention heads is used to label a segment of the input sentence, and the size of a segment is limited by the window size in Ravi Figure 7. Therefore, in Ravi, the attention head only covers a limited range of words surrounding the target words, and multiple attention heads are used to generate one representation of an input sentence (emphasis added), where each attention head is used to generate context words (or labels) for a word or a group of words (i.e., one segment). Accordingly, Ravi only teaches generating one word representation, i.e., a representation of the output sentence having multiple labeled segments, not a plurality of word representation matrices for an input sentence.
Second, Ravi does not teach that each of the word representation matrices is generated by a different attention head corresponding to a syntactic category. Ravi only discloses using classification heads ([0038]). Ravi does not disclose that the classification heads correspond to syntactic categories. Instead, Ravi discloses generating an output sentence having multiple labeled segments, each of the multiple labeled segments (or each label) corresponding to a syntactic category. However, the multiple labeled segments (or the labels) corresponding to syntactic categories is different from the attention heads corresponding to syntactic categories (emphasis added). Therefore, Ravi does not teach “generating a plurality of word representation matrices for an input sentence..., wherein each of the word representation matrices is generated by a different attention head of the plurality of attention...” as recited in claim 1. Furthermore, Ravi does not teach “identifying at least one word from the input sentence based on the plurality of word representation matrices” as recited in claim 1. That is, in Ravi, the output, i.e., the output sentence having multiple labeled segments, is not further used to identify a label or a word corresponding to a syntactic category. The labels in Ravi are identified/generated before the output sentences are generated. In contrast, in claim 1, as discussed, first, attention heads (rather than labels) corresponding to syntactic categories are used to generate a plurality of word representation matrices; and after that, the “at least one word from the input sentence corresponding to a syntactic category” (i.e., the label) is identified based on the plurality of word representation matrices. Ravi discloses having the labels or the generated multiple labeled segments corresponding to syntactic categories.
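For illustration only, the distinction drawn above can be sketched in Python. The sketch is not taken from the present application or from Ravi; the toy embeddings, the category names (subject, verb, object), the scoring rule, and all function names are hypothetical assumptions. It shows one query vector per syntactic category, each attention head being applied to the entire input sentence to produce its own word representation matrix, and a word then being identified from those matrices.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def head_representation(query, embeddings):
    # One attention head: score EVERY word of the sentence against one
    # query vector, then scale each word embedding by its attention weight.
    # Returns a (num_words x dim) matrix for the whole sentence.
    weights = softmax([dot(query, e) for e in embeddings])
    return [[w * x for x in e] for w, e in zip(weights, embeddings)]

def word_representation_matrices(query_matrix, embeddings):
    # One matrix per syntactic category, each from a different head.
    return {cat: head_representation(q, embeddings)
            for cat, q in query_matrix.items()}

def identify_words(matrices, words):
    # Assumed selection rule: pick the word whose row has the largest norm.
    picked = {}
    for cat, mat in matrices.items():
        norms = [math.sqrt(dot(row, row)) for row in mat]
        picked[cat] = words[norms.index(max(norms))]
    return picked

# Hypothetical toy data: three words with one-hot embeddings.
WORDS = ["dogs", "chase", "cats"]
EMBEDDINGS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical query matrix: one query vector per syntactic category.
QUERY_MATRIX = {
    "subject": [4.0, 0.0, 0.0],
    "verb": [0.0, 4.0, 0.0],
    "object": [0.0, 0.0, 4.0],
}

MATRICES = word_representation_matrices(QUERY_MATRIX, EMBEDDINGS)
IDENTIFIED = identify_words(MATRICES, WORDS)
print(IDENTIFIED)
```

In this sketch every head sees all three words, so three full-sentence matrices result from one input sentence, in contrast to a windowed arrangement in which each head would cover only a segment.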
Regarding the input text, Ravi discloses using text and/or intermediate features derived from the input text to label the input text ([0045-0047]), as well as to label the multiple segments of the input text ([0068-0069]). However, Ravi does not teach identifying a query matrix comprising a plurality of query vectors, each of the query vectors corresponding to a different syntactic category. That is, Ravi only teaches identifying a label. Ravi thus may teach identifying one query vector corresponding to a syntactic category for an input text. However, Ravi does not teach identifying such a matrix representation for a query because Ravi is silent as to identifying a plurality of query vectors, each of the query vectors corresponding to a different syntactic category, for one input text, i.e., a query.
Bennett (20100005081) teaches noun/verb phrase recognition in speech query processing ([0288]) and generating vectors or matrices representing a user’s speech query ([0321]-[0325]). Bennett further teaches identifying word phrases present in a user query based on natural language routines (abstract). Bennett discloses using only the noun phrase as the metric to generate tokens for a speech query ([0288]); and the word phrase disclosed by Bennett has a basic structure including three parts ([0288]). Accordingly, Bennett discloses converting the user’s text-based speech query into a structured, combined query, and using the noun phrase, which has a pre-determined structure, as the metric to generate tokens. That is, Bennett discloses generating multiple labels, and using the multiple labels to generate a structured, combined query for an input query. Bennett only teaches using the noun phrase, which has a pre-determined structure, as the metric to generate labels. Bennett does not teach that each of the word representation matrices is generated by a different attention head corresponding to a syntactic category.
That is, Bennett only teaches generating labels using a neural network, where the labels correspond to syntactic categories. Bennett does not disclose that the attention heads correspond to syntactic categories. Further, Bennett is silent as to using each of the attention heads to generate one representation matrix for each input speech query. Therefore, Bennett does not teach “generating a plurality of word representation matrices for an input sentence..., wherein each of the word representation matrices is generated by a different attention head of the plurality of attention...” as recited in claim 1.
DeFelice (20200151396) teaches extracting noun and verb phrases (citing [0099]) from a text, driving word context ([0047]), and generating projection/cosine functions using reduction/compression ([0065-0066]). DeFelice is generally directed to generating words or phrases that are “pinned” in the output, thereby reducing the variability of the generated text so as to preserve required information content (DeFelice, Abstract). Regarding how to generate the “pinned” words, DeFelice [0014-0015] and Figure 4a disclose using a parse tree associated with a component to tag the words according to the part of speech and identify particular noun phrases within the input (DeFelice, [0061], and Figures 4a-4d). Also, DeFelice [0044-0045] discloses using classifiers including Gaussian classifiers. DeFelice does not teach generating a plurality of word representation matrices for an input sentence, each of the word representation matrices being generated by a different attention head corresponding to a syntactic category as claimed in claim 1. DeFelice only discloses using a parse tree to generate the “pinned” words that can represent words in the input text.
Such “pinned” words are not word representation matrices; and for each input sentence, DeFelice is silent as to generating multiple combinations of “pinned” words, where each of the multiple combinations represents the input sentence. Further, DeFelice is silent regarding using multiple attention heads corresponding to multiple syntactic categories, with each of the word representation matrices being generated by a different attention head corresponding to a syntactic category. Therefore, DeFelice does not teach “generating a plurality of word representation matrices for an input sentence..., wherein each of the word representation matrices is generated by a different attention head of the plurality of attention...” as recited in claim 1.
Furthermore, it would not have been obvious to one of ordinary skill in the art to modify the teachings of the prior art of record to obtain the recited claim limitations noted above.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
09/06/2022