DETAILED ACTION

Introduction
1.	This office action is in response to Applicant’s submission filed on 07/25/2022. Claims 1-20 are pending in the application. As such, Claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The response filed on 07/25/2022 has been entered and considered in this Office Action.  Claims 1-20 have been examined.  Applicant’s amendments and remarks with respect to Claims 1, 5, 7, 9-11, 13-16, and 18-20 have been fully considered.  In response, Examiner respectfully notes that the previous objections to the specification and the previous rejections under 35 U.S.C. §§ 101, 102 and 103 have been withdrawn.
However, in light of the amendments to the claims, new grounds of rejection for claims 1-20 under 35 U.S.C. 103 are provided in the response below.


Response to Arguments
4.	With respect to the rejections of Claims 1, 9-11, 15-16 and 20 under 35 U.S.C. 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2018/0329883 (Leidner et al.), hereinafter Leidner, already of record, Applicant appears to present the following position in the Arguments, pp. 22-25, dated 07/25/2022:
“These rejections are respectfully traversed since the documents cited in the above rejections do not anticipate nor render obvious each and every feature of the claims. However, in order to expedite prosecution of the subject application, independent claim 1 has been amended, and recites the features of:
generating, via a processor, keyword vectors representing a context for the keywords based on word embeddings for the keywords, wherein the keywords are syntactically unordered and associated with language tags, and wherein the template includes a series of language tags indicating an arrangement for words of the generated natural language content;
generating, via the processor, template vectors based on word embeddings for the series of language tags of the template, wherein the template vectors represent a context for the template; determining, via the processor, contributions from the contexts for the keywords and the template based on a comparison of the word embeddings of the series of language tags of the template with word embeddings of the associated language tags of the keywords; and generating, via a machine learning model of the processor, one or more words for each language tag of the template from a word vocabulary to produce the natural language content based on combined contributions from the contexts for the keywords and the template, wherein the machine learning model includes a recurrent neural network and the word vocabulary is learned from training data during training of the machine learning model. 
Independent claims 11 and 16 have also been amended in order to expedite prosecution of the subject application, and recite similar features. Support for these features may be found throughout the specification (e.g., See Paragraphs 0016, 0059, 0060, 0062, 0081, 0100, and 0103 of the published version of the subject application (U.S. Patent Application Publication No. 2021/0342552)).
The remaining documents cited in the above rejections do not compensate for these deficiencies. Rather, the Brownlee publication discloses cross-entropy for machine learning, and is merely utilized by the Examiner for an alleged teaching of a complement of a predicted probability.
The Ren et al. patent discloses an image captioning system and method for generating a caption for an image, and is merely utilized by the Examiner for an alleged teaching of complete sentences without function words.
The Georges et al. publication discloses an apparatus for detecting intent in voiced audio, and is merely utilized by the Examiner for an alleged teaching of producing the same encoded vector representations for a set of keywords regardless of an order of the keywords.

Since the documents cited in the above rejections do not disclose, teach or suggest, either alone or in combination, the features recited within independent claims 1, 11, and 16 as discussed above, these independent claims are considered to be in condition for allowance.  Dependent claims 2 — 10, 12 — 15, and 17 - 20 depend, either directly or indirectly, from independent claims 1, 11, or 16 and, therefore, include all the limitations of their parent claims. Claims 5, 7, 9, 10, 13 - 15, and 18 - 20 have been amended for further clarification and/or consistency with their amended parent claims. The dependent claims are considered to be in condition for allowance for substantially the same reasons discussed above in relation to their parent claims, and/or for further limitations recited in the dependent claims.”
	
	In response, Examiner respectfully notes that the combination of Leidner et al. (US Patent Application Publication: US-20180329883-A1), hereinafter Leidner, already of record, in view of Agarwal et al. (S. Agarwal, V. Aggarwal, A. R. Akula, G. B. Dasgupta and G. Sridhara, "Automatic problem extraction and analysis from unstructured text in IT tickets," in IBM Journal of Research and Development, vol. 61, no. 1, pp. 4:41-4:52, 1 Jan.-Feb. 2017, doi: 10.1147/JRD.2016.2629318.), hereinafter Agarwal, discloses the amended claim language addressed in the presented arguments with respect to independent claim 1.  In the interest of presenting a complete response, Examiner notes that Leidner discloses keyword vectors representing a context for the keywords based on word embeddings for the keywords. Leidner teaches forming a sequence of vectors from input words and predicting a sequence of words based on context: “…wherein the encoder 402 is adapted to receive a sequence of vectors 416 representing a source sequence of words, and the decoder 404 is adapted to predict a probability of a target sequence of words representing a target output sentence 424 based on a recurrent state in the decoder, a set of previous words and a context vector.” See e.g., Leidner, para [0056].
Examiner also notes that the combination of Leidner in view of Agarwal teaches wherein the keywords are syntactically unordered and associated with language tags. In [sect 1], Agarwal explains its disclosure regarding handling and processing unstructured text: “The technical contributions of this paper include the following. Given a set of tickets containing noisy, unstructured, and incomplete text descriptions, we focus on automatically detecting the problem categories of tickets using context-aware natural language processing techniques.”  In [sect 4.1] of Agarwal, applying language tags to the unstructured input is discussed: “Step 1 of phase 1 involves determining the most relevant POS patterns. The intuition behind this POS tag pattern analysis is derived from the work in [8]. It is typically a onetime process that should be performed on a dataset of reasonable size and quality. The patterns thus obtained can be re-used for subsequent analysis in the same domain.”  Also see Figs. 2 and 4 regarding POS tagging.
Leidner further discloses: generating, via the processor, template vectors based on word embeddings for the series of language tags of the template. Leidner describes generating template vectors based on the tag embedding matrix: “…to the tag embedding matrix 556(and 558-560)/656 to generate tag vectors, ….” See e.g., Leidner, para [0057].  The language tags of the template are discussed below with “POS tags predictions” in para [0068].
Leidner further discloses the following: based on a comparison of the word embeddings of the series of language tags of the template with word embeddings of the associated language tags of the keywords. Leidner describes the comparison of the keywords and the template, which consists of the language tags: “Preferably, the neural model correctly predicts both words and POS tags. To enforce this we defined a custom objective function J.sub.t that jointly takes into account words and POS tags predictions. In this example, J.sub.t can be formulated as follows: ...”  See e.g., Leidner, equations 11-13, para [0068].
Leidner also teaches generating, via a machine learning model of the processor, one or more words for each language tag of the template from a word vocabulary to produce the natural language content based on combined contributions from the contexts for the keywords and the template. Leidner describes using an RNN encoder-decoder model and a word vocabulary (see details below), and also discusses predicting a keyword or phrase using context from the POS tagger: “The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064].
Leidner also discusses: wherein the machine learning model includes a recurrent neural network.
Here, in Leidner, para [0025], an RNN machine learning model is discussed: “In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”
Leidner further discusses: and the word vocabulary is learned from training data during training of the machine learning model.  In Leidner, para [0027], the word vocabulary learned from training is described in detail: “The neural paraphrase generator may be further characterized in one of more of the following: further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector; wherein the attentional vector is passed through a softmax layer to produce a probability distribution over a word vocabulary data set; wherein the word embedding matrix and the tag embedding matrix are populated with pretrained values; wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:…”
For at least the reasons provided above, Examiner respectfully disagrees and finds Applicant’s arguments not persuasive.  Therefore, the rejections of Claims 1-4, 6, 8, 12 and 17 under 35 U.S.C. 103, as necessitated by the amendment, are applied and updated accordingly.
With respect to the art rejections of the remaining dependent claims 2-4, 6, 8, 12 and 17 under 35 U.S.C. 103, to the extent said claims are argued on the same rationale presented in the Remarks filed 07/25/2022, Examiner respectfully notes as follows. For completeness, should these claims be traversed for reasons similar to those presented for independent claims 1, 11 and 16, Examiner respectfully directs Applicant to the reasons provided above with respect to claims 1-4, 6, 8, 12 and 17.  For at least the same reasons, Examiner likewise respectfully disagrees, and as such, Applicant’s arguments are also found not persuasive.

Regarding amended claim 5, Leidner in view of Agarwal, further in view of Yan et al. (CN 108073576 A), hereinafter Yan, and furthermore in view of Zhou (US Patent No: US-10467274-B1), already of record, teaches that the “second machine learning model is trained with a dataset.”  See e.g., “… , by trained convolutional neural network user data of the user input such as complete sentences to semantic understanding, …” Yan, pg. 2, 4th para.

Regarding amended claims 13 and 18, although they are different in scope from claim 5 and from each other, they recite the elements of method claim 5 as a system and computer product, respectively.  Thus, the analysis in rejecting claim 5 is equally applicable to claims 13 and 18.

Regarding amended claim 7, Leidner in view of Agarwal, and further in view of Georges (US Patent Application Publication No: US-20190027133-A1), already of record, discloses all the amended limitations.  The minor syntax improvements, such as adding “the”, removing “each of”, and clarifying “for the keywords” and “corresponding set”, do not overcome the previous rejection.  The incorporation of a “second” machine learning model is taught in the Georges disclosure: “For example, the models may include models of feedforward neural networks (FNNs), recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, as well as non-neural networks, such as support vector machines (SVMs).” See e.g., Georges, para [0058].

Regarding amended claim 9, Leidner in view of Agarwal discloses the “series of” language tags of the template; this amendment also does not appear to be significant enough to overcome the previous rejection rationale. See Leidner, para [0020]: “POS tags are labels assigned to words or tokens in a corpus, e.g., text corpus, that indicate a part of speech known to be related to the word or token. POS tags or labels may refer to grammatical elements or categories of words such as noun, verb, adjective, adverb, tense (present/past/future), number (plural/singular), etc. A POS tagset is a set of POS tags used to label words or tokens in the corpus. Tagsets may be language specific, e.g., see Appendix. POS tagging is a useful technical tool enabling text processing systems to automatically differentiate based on POS.” Also see para [0026], “…comprising a word embedding matrix and a tag embedding matrix, …”

Regarding claims 14 and 19, although they are different in scope from claims 7, 8 and 9 and from each other, they recite the elements of method claims 7, 8 and 9 as a system and computer product, respectively.  Thus, the analysis in rejecting claims 7, 8 and 9 is equally applicable to claims 14 and 19.

Regarding amended claim 10, removing the specific elements “recurrent” and “wherein the word vocabulary is learned from the training data during training of the recurrent machine learning model” does not significantly affect the previously cited rejection, as the previous prior art rejection rationale still applies.

Regarding claims 15 and 20, although they are different in scope from claim 10 and each other, they recite elements of the method claim of 10, as a system and computer product respectively.  Thus, the analysis in rejecting claim 10 is equally applicable to claims 15 and 20.

Regarding amended dependent claims 5, 7-10, 13-15 and 18-20, for at least the same reasons provided above, Examiner likewise respectfully disagrees, and as such, Applicant’s arguments are also found not persuasive.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leidner, already of record, in view of Agarwal et al. (S. Agarwal, V. Aggarwal, A. R. Akula, G. B. Dasgupta and G. Sridhara, "Automatic problem extraction and analysis from unstructured text in IT tickets," in IBM Journal of Research and Development, vol. 61, no. 1, pp. 4:41-4:52, 1 Jan.-Feb. 2017, doi: 10.1147/JRD.2016.2629318.), hereinafter Agarwal.
	Regarding claim 1, Leidner discloses: A method of generating natural language content from a set of keywords in accordance with a template comprising: generating, via a processor (“In a first embodiment the present invention provides a computer-based system comprising a processor in electrical communication with a memory, the memory adapted to store data and instructions for executing by the processor, and a neural paraphrase generator.” See e.g., Leidner, para [0026]),
	keyword vectors representing a context for the keywords based on word embeddings for the keywords (“The neural paraphrase generator comprises: an input adapted to receive a sequence of tuples (t=(t.sub.1, . . . , t.sub.n)) comprising a source sequence of words, each tuple (t.sub.i=(w.sub.i,p.sub.i)) comprising a word data element (w.sub.i) and a structured tag element (p.sub.i), the structured tag element representing a linguistic attribute about the word data element; a recurrent neural network (RNN) comprising an encoder and a decoder, wherein the encoder is adapted to receive a sequence of vectors representing a source sequence of words, and the decoder is adapted to predict a probability of a target sequence of words representing a target output sentence based on a recurrent state in the decoder, a set of previous words and a context vector; an input composition component connected to the input and comprising a word embedding matrix and a tag embedding matrix, the input composition component being adapted to receive and transform the input sequence of tuples into a sequence of vectors by” See e.g., Leidner, para [0026].),
and wherein the template includes a series of language tags indicating an arrangement for words of the generated natural language content (see e.g., “wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:… where D is the parallel training corpus of source tuples (i.e., t), composed of aligned sequences of source words (i.e., w) and source POS tags (i.e., p), and target tuples (i.e., {circumflex over (t)}), composed of aligned sequences of target words (i.e., ŵ) and target POS tags (i.e., {circumflex over (p)}); wherein the RNN includes Long Short Term Memory (LSTM) cells; and further comprising at least one additional linguistic attribute in addition to the structured tag” See e.g., Leidner, para [0027] also see the three equations within the same paragraph);
generating, via the processor, template vectors based on word embeddings for the series of language tags of the template, wherein the template vectors represent a context for the template (see e.g., “…to the tag embedding matrix 556(and 558-560)/656 to generate tag vectors, ...”  “…mapping the structured tag elements (524/526/528 (and 514/516/518); 624/626/628) to the tag embedding matrix 556)…” See e.g., Leidner, para [0057]);
determining, via the processor, contributions from the contexts for the keywords and the template (see e.g., “The input composition component allows the RNN to be fed with both words and POS tags. These two pieces of information flow through the network, and jointly affect both the encoder and the decoder state, as well as the attention context. The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064])
based on a comparison of the word embeddings of the series of language tags of the template with word embeddings of the associated language tags of the keywords (see e.g., “Preferably, the neural model correctly predicts both words and POS tags. To enforce this we defined a custom objective function J.sub.t that jointly takes into account words and POS tags predictions. In this example, J.sub.t can be formulated as follows: ...”  See e.g., Leidner, equations 11-13, para[0068]);
and generating, via a machine learning model of the processor (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025]),
one or more words for each language tag of the template from a word vocabulary to produce the natural language content based on combined contributions from the contexts for the keywords and the template (see e.g., “The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064]),
wherein the machine learning model includes a recurrent neural network (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025])
and the word vocabulary is learned from training data during training of the machine learning model (“The neural paraphrase generator may be further characterized in one of more of the following: further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector; wherein the attentional vector is passed through a softmax layer to produce a probability distribution over a word vocabulary data set; wherein the word embedding matrix and the tag embedding matrix are populated with pretrained values; wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows: …” See e.g., Leidner, para [0027]).
Leidner does not explicitly disclose, but Agarwal discloses: wherein the keywords are syntactically unordered and associated with language tags ([sect 1] “The technical contributions of this paper include the following. Given a set of tickets containing noisy, unstructured, and incomplete text descriptions, we focus on automatically detecting the problem categories of tickets using context-aware natural language processing techniques.”  [sect 4.1] “Step 1 of phase 1 involves determining the most relevant POS patterns. The intuition behind this POS tag pattern analysis is derived from the work in [8]. It is typically a onetime process that should be performed on a dataset of reasonable size and quality. The patterns thus obtained can be re-used for subsequent analysis in the same domain.”  Also see Figs. 2 and 4 regarding POS tagging.).
Leidner and Agarwal are considered analogous art because they are both in the related art of text generation.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner with the teaching of Agarwal to incorporate syntactically unordered keywords and associated language tags, because doing so would allow the machine learning model to process unstructured text input more effectively (Agarwal, sect 1, introduction).

Regarding claim 2, Leidner in view of Agarwal, discloses: The method of claim 1,
Leidner additionally discloses: wherein the language tags of the template and the associated language tags of the keywords include part-of-speech tags (see e.g., “The loss function we use for training the model is the sum of two terms taking into account the predicted words and POS tags, respectively. The proposed model learns to jointly generate a sequence of words and POS tags. In this way, the prediction of POS tags for the generated paraphrases informs the selection of words…” See e.g., Leidner, para[0024]).
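For illustration only, the “sum of two terms” loss quoted above from Leidner, para [0024], can be sketched as follows. This is a minimal sketch and not Leidner’s implementation: the use of a negative log-likelihood for each term, the helper names `nll` and `joint_loss`, and the toy probabilities are all assumptions introduced here; the actual objective is the one given by Leidner’s own equations.

```python
import math

def nll(probs, target_index):
    """Negative log-likelihood of the target under a predicted distribution."""
    return -math.log(probs[target_index])

def joint_loss(word_probs, word_targets, tag_probs, tag_targets):
    """Sum of a word term and a POS-tag term over all output positions.

    Assumption: each term is a negative log-likelihood (hypothetical helper).
    """
    word_term = sum(nll(p, t) for p, t in zip(word_probs, word_targets))
    tag_term = sum(nll(p, t) for p, t in zip(tag_probs, tag_targets))
    return word_term + tag_term

# Toy data (made up): two output positions, 3-word vocabulary, 2-tag tagset.
word_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
tag_probs = [[0.9, 0.1], [0.4, 0.6]]
loss = joint_loss(word_probs, [0, 1], tag_probs, [0, 1])
```

Because both terms are added into a single value, a wrong POS tag prediction raises the loss even when the word prediction is correct, which is one way to read the statement that the tag prediction “informs the selection of words.”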

Regarding claim 6, Leidner in view of Agarwal, discloses: The method of claim 1, 
Leidner additionally discloses: wherein the keywords (see e.g. “…use different words or syntax to express the same or similar mean…”) are in a first natural language (see e.g., “is an important part of a number of Natural Language Processing (NLP….” See e.g., Leidner, para[0006]),
and the generated natural language content is in a second different natural language (see e.g., “Mallinson, Sennrich and Lapata … Paraphrasing Revisited with Neural Machine Translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (EACL '17). 881-893.) applied the bilingual pivoting approach proposed by Bannard and Callison-Burch … with neural machine translation, where the input sequence is mapped to a number of translations in different languages,” See e.g., Leidner, para [0017]).

Regarding claim 10, Leidner in view of Agarwal, discloses the method of claim 1, Leidner further discloses: wherein generating one or more words for each language tag of the template comprises: determining for each language tag of the template (see e.g., “…wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:… where D is the parallel training corpus of source tuples (i.e., t), composed of aligned sequences of source words (i.e., w) and source POS tags (i.e., p), and target tuples (i.e., {circumflex over (t)}), composed of aligned sequences of target words (i.e., ŵ) and target POS tags (i.e., {circumflex over (p)}); wherein the RNN includes Long Short Term Memory (LSTM) cells; and further comprising at least one additional linguistic attribute in addition to the structured tag” See e.g., Leidner, para [0027] also see the three equations within the same paragraph);
a probability distribution over the word vocabulary (see e.g., “The neural paraphrase generator may be further characterized in one of more of the following: further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector; wherein the attentional vector is passed through a softmax layer to produce a probability distribution over a word vocabulary data set;” See e.g., Leidner, para [0027]),  
using the machine learning model (see e.g., “FIG. 4 is a schematic diagram illustrating an exemplary RNN with encoder/decoder for implementing the neural paraphrase generator in accordance with the present invention.”),
and selecting one or more words from the word vocabulary for a corresponding language tag of the template based on the probability distribution (see e.g., “This is usually done by iteratively selecting words from the probability distribution p(ŵ.sub.j|(ŵ.sub.1, . . . , ŵ.sub.j−1),y.sub.j) using an output generation criterion, until an end of sentence (EOS) symbol is predicted.” See e.g., Leidner, para [0052]).
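For illustration only, the softmax layer of para [0027] and the iterative word selection of para [0052] can be sketched as below. This is a hedged sketch, not Leidner’s implementation: greedy (arg-max) selection is only one possible “output generation criterion,” and `step_fn`, `toy_step`, the toy vocabulary, and the `max_len` cap are hypothetical stand-ins for a trained decoder.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(step_fn, vocab, eos="<EOS>", max_len=20):
    """Select the most probable word at each step until EOS is predicted."""
    words = []
    for _ in range(max_len):
        probs = softmax(step_fn(words))
        best = max(range(len(vocab)), key=probs.__getitem__)
        if vocab[best] == eos:
            break
        words.append(vocab[best])
    return words

vocab = ["the", "cat", "sleeps", "<EOS>"]

def toy_step(prev_words):
    # Hypothetical decoder step: favors the next vocabulary entry in order.
    scores = [0.0] * len(vocab)
    scores[min(len(prev_words), len(vocab) - 1)] = 5.0
    return scores

sentence = greedy_decode(toy_step, vocab)
```

With this toy step function the loop selects “the”, “cat”, and “sleeps” in turn and stops when the end-of-sentence symbol becomes the most probable entry.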

Regarding claim 11, Leidner discloses: A system for generating natural language content from a set of keywords in accordance with a template comprising: a processor configured to: (“In a first embodiment the present invention provides a computer-based system comprising a processor in electrical communication with a memory, the memory adapted to store data and instructions for executing by the processor, and a neural paraphrase generator.” See e.g., Leidner, para [0026]),
	generate keyword vectors representing a context for the keywords based on word embeddings for the keywords (“The neural paraphrase generator comprises: an input adapted to receive a sequence of tuples (t=(t.sub.1, . . . , t.sub.n)) comprising a source sequence of words, each tuple (t.sub.i=(w.sub.i,p.sub.i)) comprising a word data element (w.sub.i) and a structured tag element (p.sub.i), the structured tag element representing a linguistic attribute about the word data element; a recurrent neural network (RNN) comprising an encoder and a decoder, wherein the encoder is adapted to receive a sequence of vectors representing a source sequence of words, and the decoder is adapted to predict a probability of a target sequence of words representing a target output sentence based on a recurrent state in the decoder, a set of previous words and a context vector; an input composition component connected to the input and comprising a word embedding matrix and a tag embedding matrix, the input composition component being adapted to receive and transform the input sequence of tuples into a sequence of vectors by” See e.g., Leidner, para [0026].),
and wherein the template includes a series of language tags indicating an arrangement for words of the generated natural language content (see e.g., “wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:… where D is the parallel training corpus of source tuples (i.e., t), composed of aligned sequences of source words (i.e., w) and source POS tags (i.e., p), and target tuples (i.e., {circumflex over (t)}), composed of aligned sequences of target words (i.e., ŵ) and target POS tags (i.e., {circumflex over (p)}); wherein the RNN includes Long Short Term Memory (LSTM) cells; and further comprising at least one additional linguistic attribute in addition to the structured tag” See e.g., Leidner, para [0027] also see the three equations within the same paragraph);
generate template vectors based on word embeddings for the series of language tags of the template, wherein the template vectors represent a context for the template (see e.g., “…to the tag embedding matrix 556(and 558-560)/656 to generate tag vectors, ...”  “…mapping the structured tag elements (524/526/528 (and 514/516/518); 624/626/628) to the tag embedding matrix 556)…” See e.g., Leidner, para [0057]);
determine contributions from the contexts for the keywords and the template (see e.g., “The input composition component allows the RNN to be fed with both words and POS tags. These two pieces of information flow through the network, and jointly affect both the encoder and the decoder state, as well as the attention context. The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064])
based on a comparison of the word embeddings of the series of language tags of the template with word embeddings of the associated language tags of the keywords (see e.g., “Preferably, the neural model correctly predicts both words and POS tags. To enforce this we defined a custom objective function J.sub.t that jointly takes into account words and POS tags predictions. In this example, J.sub.t can be formulated as follows: ...”  See e.g., Leidner, equations 11-13, para[0068]);
and generate, via a machine learning model (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025]),
one or more words for each language tag of the template from a word vocabulary to produce the natural language content based on combined contributions from the contexts for the keywords and the template (see e.g., “The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064]),
wherein the machine learning model includes a recurrent neural network (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025])
and the word vocabulary is learned from training data during training of the machine learning model (“The neural paraphrase generator may be further characterized in one of more of the following: further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector; wherein the attentional vector is passed through a softmax layer to produce a probability distribution over a word vocabulary data set; wherein the word embedding matrix and the tag embedding matrix are populated with pretrained values; wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows: …” See e.g., Leidner, para [0027]).
Leidner does not explicitly disclose, but Agarwal discloses: wherein the keywords are syntactically unordered and associated with language tags ([sect 1] The technical contributions of this paper include the following. Given a set of tickets containing noisy, unstructured, and incomplete text descriptions, we focus on automatically detecting the problem categories of tickets using context-aware natural language processing techniques.  [sect 4.1] Step 1 of phase 1 involves determining the most relevant POS patterns. The intuition behind this POS tag pattern analysis is derived from the work in [8]. It is typically a onetime process that should be performed on a dataset of reasonable size and quality. The patterns thus obtained can be re-used for subsequent analysis in the same domain.  Also see Fig 2 and 4 regarding POS tagging.).
Leidner and Agarwal are considered analogous art because they are both in the related art of text generation.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner by combining it with the teaching of Agarwal to incorporate syntactically unordered keywords and associated language tags, because doing so would allow the machine learning model to process unstructured text input more effectively (Agarwal, sect 1, introduction).

Regarding claim 15, the claim recites the elements of method claim 10 as a system.  Thus, the analysis in rejecting claim 10 is equally applicable to claim 15.

Regarding claim 16, Leidner discloses: A computer program product for generating natural language content from a set of keywords in accordance with a template, the computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media (“In a first embodiment the present invention provides a computer-based system comprising a processor in electrical communication with a memory, the memory adapted to store data and instructions for executing by the processor, and a neural paraphrase generator.” See e.g., Leidner, para [0026]),
	the program instructions executable by a processor to cause the processor to: generate keyword vectors representing a context for the keywords based on word embeddings for the keywords (“The neural paraphrase generator comprises: an input adapted to receive a sequence of tuples (t=(t.sub.1, . . . , t.sub.n)) comprising a source sequence of words, each tuple (t.sub.i=(w.sub.i,p.sub.i)) comprising a word data element (w.sub.i) and a structured tag element (p.sub.i), the structured tag element representing a linguistic attribute about the word data element; a recurrent neural network (RNN) comprising an encoder and a decoder, wherein the encoder is adapted to receive a sequence of vectors representing a source sequence of words, and the decoder is adapted to predict a probability of a target sequence of words representing a target output sentence based on a recurrent state in the decoder, a set of previous words and a context vector; an input composition component connected to the input and comprising a word embedding matrix and a tag embedding matrix, the input composition component being adapted to receive and transform the input sequence of tuples into a sequence of vectors by” See e.g., Leidner, para [0026].),
and wherein the template includes a series of language tags indicating an arrangement for words of the generated natural language content (see e.g., “wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:… where D is the parallel training corpus of source tuples (i.e., t), composed of aligned sequences of source words (i.e., w) and source POS tags (i.e., p), and target tuples (i.e., {circumflex over (t)}), composed of aligned sequences of target words (i.e., ŵ) and target POS tags (i.e., {circumflex over (p)}); wherein the RNN includes Long Short Term Memory (LSTM) cells; and further comprising at least one additional linguistic attribute in addition to the structured tag” See e.g., Leidner, para [0027] also see the three equations within the same paragraph);
generate template vectors based on word embeddings for the series of language tags of the template, wherein the template vectors represent a context for the template (see e.g., “…to the tag embedding matrix 556(and 558-560)/656 to generate tag vectors, ...”  “…mapping the structured tag elements (524/526/528 (and 514/516/518); 624/626/628) to the tag embedding matrix 556)…” See e.g., Leidner, para [0057]);
determine contributions from the contexts for the keywords and the template (see e.g., “The input composition component allows the RNN to be fed with both words and POS tags. These two pieces of information flow through the network, and jointly affect both the encoder and the decoder state, as well as the attention context. The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064])
based on a comparison of the word embeddings of the series of language tags of the template with word embeddings of the associated language tags of the keywords (see e.g., “Preferably, the neural model correctly predicts both words and POS tags. To enforce this we defined a custom objective function J.sub.t that jointly takes into account words and POS tags predictions. In this example, J.sub.t can be formulated as follows: ...”  See e.g., Leidner, equations 11-13, para[0068]);
and generating, via a machine learning model (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025]),
one or more words for each language tag of the template from a word vocabulary to produce the natural language content based on combined contributions from the contexts for the keywords and the template (see e.g., “The main goal of the output decomposition component is to predict both words and POS tags from the output of the neural network.” See e.g., Leidner, paragraph [0064]),
wherein the machine learning model includes a recurrent neural network (“In one exemplary manner, PPDB 2.0 (a large scale automatically mined data set of paraphrases) and COCO (an image captioning data set) were used to confirm use of the invention to extend RNN encoder-decoder models with POS tag information.”  See e.g., Leidner, para [0025])
and the word vocabulary is learned from training data during training of the machine learning model (“The neural paraphrase generator may be further characterized in one of more of the following: further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector; wherein the attentional vector is passed through a softmax layer to produce a probability distribution over a word vocabulary data set; wherein the word embedding matrix and the tag embedding matrix are populated with pretrained values; wherein the structured tag element is a part-of-speech tag; further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows: …” See e.g., Leidner, para [0027]).
Leidner does not explicitly disclose, but Agarwal discloses: wherein the keywords are syntactically unordered and associated with language tags ([sect 1] The technical contributions of this paper include the following. Given a set of tickets containing noisy, unstructured, and incomplete text descriptions, we focus on automatically detecting the problem categories of tickets using context-aware natural language processing techniques.  [sect 4.1] Step 1 of phase 1 involves determining the most relevant POS patterns. The intuition behind this POS tag pattern analysis is derived from the work in [8]. It is typically a onetime process that should be performed on a dataset of reasonable size and quality. The patterns thus obtained can be re-used for subsequent analysis in the same domain.  Also see Fig 2 and 4 regarding POS tagging.).
Leidner and Agarwal are considered analogous art because they are both in the related art of text generation.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner by combining it with the teaching of Agarwal to incorporate syntactically unordered keywords and associated language tags, because doing so would allow the machine learning model to process unstructured text input more effectively (Agarwal, sect 1, introduction).

Regarding claim 20, the claim recites the elements of method claim 10 as a computer program product.  Thus, the analysis in rejecting claim 10 is equally applicable to claim 20.

Claims 3, 4, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Leidner, in view of Agarwal, and further in view of Brownlee (2019, October 21, A Gentle Introduction to Cross-Entropy for Machine Learning. Machinelearningmastery.Com. Retrieved April 25, 2022, from https://machinelearningmastery.com/cross-entropy-for-machine-learning/), hereinafter Brownlee, already of record.

Regarding claim 3, Leidner in view of Agarwal discloses the method of claim 1. Leidner further discloses: wherein determining contributions comprises: determining a probability for each language tag of the template indicating a likelihood of that language tag of the template matching one of the associated language tags of the keywords (see e.g., “further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:…” See e.g., Leidner, para [0027] and the three associated equations within). Here, the “loss function” corresponds to the likelihood.
wherein the probability for a corresponding language tag of the template indicates the contribution for the context of the keywords for generating a word for the corresponding language tag of the template (see e.g., the three equations in Leidner, para [0027]).
Leidner in view of Agarwal does not explicitly disclose, but Brownlee discloses:
and wherein a complement of the probability indicates the contribution for the context of the template for generating the word for the corresponding language tag of the template (see e.g., “If there are just two class labels, the probability is modeled as the Bernoulli distribution for the positive class label. This means that the probability for class 1 is predicted by the model directly, and the probability for class 0 is given as one minus the predicted probability, for example:” See e.g., Brownlee, pg. 11). Here, “probability for class 1” corresponds to the contribution for the context of the keywords as mentioned above, and “class 0” corresponds to the contribution for the context of the template.  Hence, one minus the predicted probability corresponds to the complement of the predicted probability.
Leidner, Agarwal and Brownlee are considered analogous art because they are all in the related art of machine learning models.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal by combining it with the teaching of Brownlee to incorporate the complementary probability, because cross-entropy can be used as a loss function when optimizing classification models like logistic regression and artificial neural networks (Brownlee, page 22, summary).
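For illustration only, the two-class relationship quoted from Brownlee can be sketched in Python. The function names `class_probabilities` and `binary_cross_entropy` are hypothetical and do not appear in Brownlee, Leidner, or the claims. The sketch assumes a single predicted probability for class 1 and shows only that the probability for class 0 is one minus that value, and how both values enter a two-class cross-entropy.

```python
import math


def class_probabilities(p_class1):
    """Return (P(class 1), P(class 0)) for a two-class Bernoulli model.

    P(class 0) is the complement of the predicted P(class 1).
    """
    if not 0.0 <= p_class1 <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_class1, 1.0 - p_class1


def binary_cross_entropy(target, p_class1, eps=1e-12):
    """Cross-entropy between a 0/1 target and the predicted P(class 1).

    eps is an assumed guard value that keeps log() away from zero.
    """
    p1, p0 = class_probabilities(p_class1)
    p1 = max(p1, eps)
    p0 = max(p0, eps)
    return -(target * math.log(p1) + (1 - target) * math.log(p0))
```

Under these assumptions, a predicted probability of 0.8 for class 1 leaves 0.2 for class 0, and the loss for a true class-1 example reduces to the negative logarithm of 0.8.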

Regarding claim 4, Leidner in view of Agarwal, further in view of Brownlee discloses: The method of claim 3, 
Leidner further discloses: further comprising: applying the probability for the corresponding language tag of the template to a keyword vector associated with the corresponding language tag of the template to produce the contribution of the context for the keywords (see e.g., “The decoder predicts the probability of a target token and POS tag based on the previous target tuples of words and POS tags, the state of the decoder's RNN and the output of an attention mechanism which calculates the weighted average of the encoder's states. The loss function we use for training the model is the sum of two terms taking into account the predicted words and POS tags, respectively.” See e.g., Leidner, para [0024]);
and combining the contributions of the contexts for the keywords and the template to produce the combined contributions (see e.g., “further comprising a loss function adapted to learn to predict tuples of words and structured tags and comprising a custom objective function that jointly considers word and structured tag predictions; wherein the custom objective function Jt is formulated as follows:…” See e.g., Leidner, para [0027] and the three associated equations within).
Leidner in view of Agarwal does not explicitly disclose, but Brownlee discloses:
and wherein a complement of the probability indicates the contribution for the context of the template for generating the word for the corresponding language tag of the template (see e.g., “If there are just two class labels, the probability is modeled as the Bernoulli distribution for the positive class label. This means that the probability for class 1 is predicted by the model directly, and the probability for class 0 is given as one minus the predicted probability, for example:” See e.g., Brownlee, pg. 11).  Here, “probability for class 1” corresponds to the contribution for the context of the keywords as mentioned above, and “class 0” corresponds to the contribution for the context of the template.  Hence, one minus the predicted probability corresponds to the complement of the predicted probability.
Leidner, Agarwal and Brownlee are considered analogous art because they are all in the related art of machine learning models.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal by combining it with the teaching of Brownlee to incorporate the complementary probability, because cross-entropy can be used as a loss function when optimizing classification models like logistic regression and artificial neural networks (Brownlee, page 22, summary).
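For illustration only, the limitation addressed above (a probability applied to a keyword vector, its complement applied to the template context, and the two contributions combined) can be sketched as a simple weighted sum. The names `combine_contributions`, `keyword_vec`, and `template_vec` are hypothetical, and the sketch is an assumption about one way such a combination could be computed; it is not taken from Leidner, Brownlee, or the application.

```python
def combine_contributions(p, keyword_vec, template_vec):
    """Blend two context vectors using a probability and its complement.

    p weights the keyword context; (1 - p) weights the template context.
    Both vectors are plain lists of floats of equal length (an assumption).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if len(keyword_vec) != len(template_vec):
        raise ValueError("vectors must have the same length")
    return [p * k + (1.0 - p) * t for k, t in zip(keyword_vec, template_vec)]
```

With p = 1 the result is the keyword vector alone, and with p = 0 it is the template vector alone; intermediate values mix the two.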

Regarding claim 12, the claim recites the elements of method claim 3 as a system.  Thus, the analysis in rejecting claim 3 is equally applicable to claim 12.

Regarding claim 17, the claim recites the elements of method claim 3 as a computer program product.  Thus, the analysis in rejecting claim 3 is equally applicable to claim 17.

Claims 5, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Leidner, in view of Agarwal, and further in view of Yan et al. (CN 108073576 A) with reference to the English machine translation provided, hereinafter Yan, and further in view of Zhou et al. (US Patent Number 10,467,274 B1), hereinafter Zhou, already of record.

Regarding Claim 5, Leidner in view of Agarwal discloses: The method of claim 1, further comprising: 
Leidner in view of Agarwal does not explicitly disclose, but Yan discloses: determining the associated language tags for the keywords via a second machine learning model wherein the second machine learning model is trained with a dataset (see e.g., “The invention is based on the following concept, to solve the problem that the user input search data expression is not clear, an incomplete and cannot find the accurate answer to the problem, the invention uses the natural language understanding technology as the basis, by trained convolutional neural network user data of the user input such as complete sentences to semantic understanding, so as to accurately understand the fuzzy search information input by the user, giving accurate search results.” See e.g., Yan, pg 2, 4th para).
Leidner, Agarwal, and Yan are considered analogous art because they are all in the related art of data-to-text generation methods, programming, and systems.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal by combining it with the teaching of Yan to incorporate a second machine learning model for syntax determination. The combination of disclosures would provide accurate search results relating to the search intent, and therefore improve user experience (Yan, background technique).
	Leidner in view of Agarwal, further in view of Yan, does not explicitly disclose, but Zhou discloses: the complete sentences without function words (see e.g., “The image captioning system 314 can predict the reward in an uncompleted state (e.g., when the sentence generation is not complete),” See e.g., Zhou, col 3, lines 59-61).
Leidner, Agarwal, Yan and Zhou are considered analogous art because they are all in the related art of data-to-text generation methods, programming, and systems.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner, in view of Agarwal and further in view of Yan, by combining it with the teaching of Zhou to determine the usage of function words, because the neural model can look one step beyond the current state to see how the current prediction will affect later generation processes. By doing this, some potential errors caused by the neural learning model can be recovered at an early stage (Zhou, col 3, lines 61-65).

Regarding claim 13, the claim recites the elements of method claim 5 as a system.  Thus, the analysis in rejecting claim 5 is equally applicable to claim 13.

Regarding claim 18, the claim recites the elements of method claim 5 as a computer program product.  Thus, the analysis in rejecting claim 5 is equally applicable to claim 18.

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Leidner in view of Agarwal, further in view of George et al. (US Patent Application Publication No: 2019/0027133 A1), hereinafter George, already of record.

Regarding claim 7, Leidner in view of Agarwal discloses the method of claim 1. Leidner further discloses: wherein generating the keyword vectors comprises: generating the word embeddings for the keywords (see e.g., “…an input composition component connected to the input and comprising a word embedding matrix and” See e.g., Leidner, para [0026]);
encoding the word embeddings for the keywords using a [second] machine learning model to produce encoded vector representations of the keywords (see e.g., “…a recurrent neural network (RNN) comprising an encoder and a decoder, wherein the encoder is adapted to receive a sequence of vectors representing a source sequence of words,” See e.g., Leidner, para [0026]), [the second machine learning model is disclosed by George; see below]
and generating the keyword vectors based on the encoded vector representations (see e.g., “…used a bi-directional RNN to model the encoder, and introduce an attention mechanism, which generates one vector representation for each word in the input sequence, …” Leidner, para[0016]).
	Leidner in view of Agarwal does not explicitly disclose, but George discloses:
second machine learning model (“For example, the models may include models of feedforward neural networks (FNNs), recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, as well as non-neural networks, such as support vector machines (SVMs).” See e.g., George, [0058])
wherein the second machine learning model is trained to produce the same encoded vector representations for a corresponding set of keywords regardless of an order of keywords in the corresponding set (see e.g., “In some examples, the bag of features 410 may be a vector. As used herein, a bag refers to a multiset of words or features in which order does not matter, but multiple instances of a word or feature may be allowed. In some examples, the vector may be a concatenation of sub vectors. For example, the vector may be a concatenation of two sub vectors. One of the sub vectors may be a bag of words feature vector derived from the common vocabulary 406. A bag of words feature vector may be a vector including the most distinguishing words. In some examples, most distinguishing words can be determined using statistical methods based on weighted word counts in each intent.” See e.g., George, paragraph [0033]);
Leidner, Agarwal and George are considered analogous art because they are all in the related art of language understanding.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal by combining it with the teaching of George to provide the second machine learning model and the same encoded vector representation for a set of keywords regardless of the order of the keywords, as doing so may result in increased accuracy, decreased memory footprint, and computation (George, paragraph [0033]).
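For illustration only, the order-independence property of a bag of words, as described in the George passage quoted above (“a multiset of words or features in which order does not matter, but multiple instances of a word or feature may be allowed”), can be sketched in Python. The function name `bag_of_words_vector` and the example vocabulary are hypothetical; the sketch uses simple counts and does not reproduce George's statistical selection of the most distinguishing words.

```python
from collections import Counter


def bag_of_words_vector(keywords, vocabulary):
    """Count how often each vocabulary term occurs in the keywords.

    The result depends only on which words occur and how often,
    not on the order in which the keywords are supplied.
    """
    counts = Counter(keywords)
    return [counts[term] for term in vocabulary]
```

Under these assumptions, any reordering of the same keyword set yields an identical vector, while repeated keywords raise the corresponding count.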

Regarding claim 8, Leidner in view of Agarwal, further in view of George, discloses the method of claim 7, 
	Leidner further discloses: wherein generating the keyword vectors based on the encoded vector representations further comprises: applying attention weights to the encoded vector representations of the keywords to produce a keyword vector for a corresponding language tag of the template as a weighted combination of the encoded vector representations (see e.g., “…used a bi-directional RNN to model the encoder, and introduce an attention mechanism, which generates one vector representation for each word in the input sequence, …” “The decoder predicts the probability of a target token and POS tag based on the previous target tuples of words and POS tags, the state of the decoder's RNN and the output of an attention mechanism which calculates the weighted average of the encoder's states.”  See e.g., Leidner, paragraph [0016] & [0024]),
wherein the attention weights indicate importance of individual keywords and are based on the corresponding language tag of the template (see e.g., “The decoder predicts the probability of a target token and POS tag based on the previous target tuples of words and POS tags, the state of the decoder's RNN and the output of an attention mechanism which calculates the weighted average of the encoder's states. The loss function we use for training the model is the sum of two terms taking into account the predicted words and POS tags, respectively. The proposed model learns to jointly generate a sequence of words and POS tags. In this way, the prediction of POS tags for the generated paraphrases informs the selection of words, leading to improved performance, …” See e.g., Leidner, paragraph [0024]).
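For illustration only, an attention mechanism that “calculates the weighted average of the encoder's states,” as quoted from Leidner above, can be sketched in Python. The function names `softmax` and `attention_context` are hypothetical, and the raw scores are taken as given inputs; how Leidner's attention function actually computes its scores is not reproduced here and should not be inferred from this sketch.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(scores, encoder_states):
    """Weighted average of encoder states using softmax-normalized scores.

    scores: one raw score per encoder state (assumed to be precomputed).
    encoder_states: list of equal-length lists of floats.
    """
    if len(scores) != len(encoder_states):
        raise ValueError("need exactly one score per encoder state")
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
```

When all scores are equal, the weights are uniform and the context vector is the plain average of the encoder states; a larger score for one state shifts the result toward that state.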

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Leidner in view of Agarwal, and further in view of Diab (Diab, M. (2009, April). Second generation AMIRA tools for Arabic processing: Fast and robust tokenization, POS tagging, and base phrase chunking. In 2nd International Conference on Arabic Language Resources and Tools (Vol. 110, p. 198).), hereinafter Diab.
Regarding claim 9, Leidner in view of Agarwal discloses the method of claim 1, 
Leidner further discloses: wherein generating the template vectors comprises: generating the word embeddings for the series of language tags of the template (see e.g., “…comprising a word embedding matrix and a tag embedding matrix,…” See e.g., Leidner, para[0020]);
encoding the word embeddings for the series of language tags of the template using a bidirectional recurrent machine learning model (see e.g., “The encoder of the model, which is a bidirectional Recurrent Neural Network (RNN)…” See e.g., Leidner, para [0024]);
and producing the template vectors based on the encoded word embeddings for the series of language tags of the template (see e.g., “…transform the input sequence of tuples into a sequence of vectors by 1) mapping the word data elements to the word embedding matrix to generate word vectors, 2) mapping the structured tag elements to the tag embedding matrix to generate tag vectors, …”  See e.g., Leidner, para[0026]),
	Leidner in view of Agarwal does not explicitly disclose, but Diab discloses: wherein each template vector is produced based on adjacent language tags within the template (“Base phrase chunking is the process by which a sequence of adjacent words are grouped together to form syntactic phrases such as NPs and VPs. An English example of base phrases would be [I]NP [would eat]VP [red luscious apples]NP [on Sundays]PP. BPC is the first step towards shallow syntactic parsing. Many high end applications such as information extraction and semantic role labeling in English have been proven to benefit tremendously from BPC at a relatively low loss in performance when compared to the use of deep syntactic parsing.” See e.g., Diab, sect 4).
Leidner, Agarwal, and Diab are considered analogous art because they are all in the related art of text generation.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal by combining it with the teaching of Diab so that each template vector is produced based on adjacent language tags within the template, because doing so would allow for faster shallow syntactic parsing that could be applied directly to given text without necessarily needing to explicitly invoke tokenization (Diab, sect 1, introduction).

Claims 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Leidner, in view of Agarwal, further in view of George, and further in view of Diab.
Regarding claim 14, Leidner in view of Agarwal discloses: The system of claim 11, 
Leidner further discloses: wherein generating keyword vectors comprises: generating the word embeddings for the keywords (see e.g., “…an input composition component connected to the input and comprising a word embedding matrix and” See e.g., Leidner, para[0026]);
encoding the word embeddings for the keywords using a [second] machine learning model to produce encoded vector representations of the keywords (see e.g., “…a recurrent neural network (RNN) comprising an encoder and a decoder, wherein the encoder is adapted to receive a sequence of vectors representing a source sequence of words,” See e.g., Leidner, para [0026]), [the second machine learning model is disclosed by George; see below]
and generating the keyword vectors based on the encoded vector representations (see e.g., “…used a bi-directional RNN to model the encoder, and introduce an attention mechanism, which generates one vector representation for each word in the input sequence, …” Leidner, para[0016]).
generating the word embeddings for the series of language tags of the template (see e.g., “…comprising a word embedding matrix and a tag embedding matrix, …” See e.g., Leidner, para [0020]);
encoding the word embeddings for the series of language tags of the template using a bidirectional recurrent machine learning model (see e.g., “The encoder of the model, which is a bidirectional Recurrent Neural Network (RNN)…” See e.g., Leidner, para[0024]);
and producing the template vectors based on the encoded word embeddings for the language tags of the template (see e.g., “…transform the input sequence of tuples into a sequence of vectors by 1) mapping the word data elements to the word embedding matrix to generate word vectors, 2) mapping the structured tag elements to the tag embedding matrix to generate tag vectors, …”  See e.g., Leidner, para[0026]),
wherein each template vector is produced based on adjacent language tags within the template (see e.g., “…further comprising an attention module adapted to generate a custom context vector for each prediction based at least in part on an attention function; wherein the attention module is further adapted to generate an attentional vector by concatenating the decoder state and the context vector;… wherein the word embedding matrix and the tag embedding matrix are populated with pretrained values; wherein the structured tag element is a part-of-speech tag;” See e.g., Leidner, para[0027]).
Leidner in view of Agarwal does not explicitly disclose, but George discloses:
second machine learning model (“For example, the models may include models of feedforward neural networks (FNNs), recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks, as well as non-neural networks, such as support vector machines (SVMs).” See e.g., George, [0058])
wherein the second machine learning model is trained to produce the same encoded vector representations for a corresponding set of keywords regardless of an order of  keywords in the corresponding set (see e.g., “In some examples, the bag of features 410 may be a vector. As used herein, a bag refers to a multiset of words or features in which order does not matter, but multiple instances of a word or feature may be allowed. In some examples, the vector may be a concatenation of sub vectors. For example, the vector may be a concatenation of two sub vectors. One of the sub vectors may be a bag of words feature vector derived from the common vocabulary 406. A bag of words feature vector may be a vector including the most distinguishing words. In some examples, most distinguishing words can be determined using statistical methods based on weighted word counts in each intent.” See e.g., George, paragraph [0033]);
Leidner, Agarwal and George are considered analogous art because they are all in the related art of language understanding.  Therefore, it would have been obvious to one of ordinary skilled in the art before the effective filing date of the claimed invention to modify teaching of Leidner, in view of Agarwal to combine the teaching of George to produce the second machine learning model and same encoded vector representation for a set of keyword regardless of an order of the keywords, as doing may result in increased accuracy, decreased memory footprint, and computation. (George, paragraph [0033]).
	Leidner in view of Agarwal, further in view of George, discloses: wherein generating the keyword vectors based on the encoded vector representations further comprises: applying attention weights to the encoded vector representations of the keywords to produce a keyword vector for a corresponding language tag of the template as a weighted combination of the encoded vector representations (see e.g., “…used a bi-directional RNN to model the encoder, and introduce an attention mechanism, which generates one vector representation for each word in the input sequence, …” “The decoder predicts the probability of a target token and POS tag based on the previous target tuples of words and POS tags, the state of the decoder's RNN and the output of an attention mechanism which calculates the weighted average of the encoder's states.”  See e.g., Leidner, paragraphs [0016] & [0024]),
wherein the attention weights indicate importance of individual keywords and are based on the corresponding language tag of the template (see e.g., “The decoder predicts the probability of a target token and POS tag based on the previous target tuples of words and POS tags, the state of the decoder's RNN and the output of an attention mechanism which calculates the weighted average of the encoder's states. The loss function we use for training the model is the sum of two terms taking into account the predicted words and POS tags, respectively. The proposed model learns to jointly generate a sequence of words and POS tags. In this way, the prediction of POS tags for the generated paraphrases informs the selection of words, leading to improved performance, …” See e.g., Leidner, paragraph [0024]).
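For illustration only, the "weighted average of the encoder's states" referenced in the cited passages can be sketched as follows. This is a minimal sketch under stated assumptions and not Leidner's implementation: dot-product scoring is assumed here as the attention function, and the names `softmax`, `attend`, `query`, and `encoded` are hypothetical. The query stands in for a representation of the language tag currently being generated, and the encoded vectors stand in for the encoded keywords.

```python
import math


def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoded):
    """Return (weights, context) for one query over encoded vectors.

    Scoring by dot product is an assumption made for this sketch.
    The context is the weighted combination of the encoded vectors.
    """
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in encoded]
    weights = softmax(scores)
    dim = len(encoded[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, encoded)) for i in range(dim)
    ]
    return weights, context


# Hypothetical two-keyword example: the query aligns with the first keyword.
encoded = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, encoded)
```

In this example the first keyword receives the larger weight because it aligns with the query, so the resulting context vector is dominated by that keyword's encoding.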
Leidner in view of Agarwal, further in view of George, does not explicitly disclose, but Diab discloses: wherein each template vector is produced based on adjacent language tags within the template (“Base phrase chunking is the process by which a sequence of adjacent words are grouped together to form syntactic phrases such as NPs and VPs. An English example of base phrases would be [I]NP [would eat]VP [red luscious apples]NP [on Sundays]PP. BPC is the first step towards shallow syntactic parsing. Many high end applications such as information extraction and semantic role labeling in English have been proven to benefit tremendously from BPC at a relatively low loss in performance when compared to the use of deep syntactic parsing.” See e.g., Diab, sect 4).
Leidner, Agarwal, George and Diab are considered analogous art because they are all in the related art of text generating.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Leidner in view of Agarwal, further in view of George, with the teaching of Diab to incorporate wherein each template vector is produced based on adjacent language tags within the template, because it would allow for faster shallow syntactic parsing that could be applied directly on some given text, without necessarily needing to explicitly invoke tokenization (Diab, sect 1, introduction).

Regarding claim 19, it recites the elements of system claim 14 as a computer program product.  Thus, the analysis in rejecting claim 14 is equally applicable to claim 19.



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Agnihotram et al. (US patent application publication 2019/0370396 A1), hereinafter Agnihotram, discloses a system or apparatus for extracting keywords from a document, which is the reverse of the process of the claimed invention stated above. “A method of identifying relevant keywords from a document is disclosed. The method includes splitting text of the document into a plurality of keyword samples, such that each of the plurality of keyword samples comprises a predefined number of keywords extracted in a sequence. Further, each pair of adjacent keyword samples in the plurality of samples includes a plurality of common words. The method further includes determining a relevancy score for each of the plurality of keyword samples based on at least one of a trained Convolution Neural Network (CNN) model and a keyword repository. The method further includes classifying keywords from each of the plurality of keyword samples as relevant keywords or non-relevant keywords based on the relevancy score determined for each of the plurality of keyword samples.” (See e.g., Agnihotram, Abstract)

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip H Lam whose telephone number is (571)272-1721. The examiner can normally be reached 10 AM-6 PM Eastern time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/PHILIP H LAM/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656