DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
A Substitute Specification including the claims is required pursuant to 37 CFR 1.125(a) because of numerous errors including (1) omitted periods at the ends of sentences, (2) missing dashes at word division line breaks, and (3) errors of a grammatical nature due to non-idiomatic translations and improper sentence structures.  Applicants should carefully review the entire written description to provide corrections for the Substitute Specification.
A substitute specification must not contain new matter.  The substitute specification must be submitted with markings showing all the changes relative to the immediate prior version of the specification of record.  The text of any added subject matter must be shown by underlining the added text.  The text of any deleted matter must be shown by strike-through except that double brackets placed before and after the deleted characters may be used to show deletion of five or fewer consecutive characters.  The text of any deleted subject matter must be shown by being placed within double brackets if strike-through cannot be easily perceived.  An accompanying clean version (without markings) and a statement that the substitute specification contains no new matter must also be supplied.  Numbering the paragraphs of the specification of record is not considered a change that must be shown.
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Natural Language Processing of Encoded Question Tokens and Encoded Table Schema Based on Similarity.
The abstract of the disclosure is objected to because it exceeds one hundred and fifty words in length.  A new abstract of no more than one hundred and fifty words should be submitted on a separate sheet.  Correction is required.  See MPEP §608.01(b).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  
The claim does not fall within at least one of the four categories of patent eligible subject matter because it is directed to a computer program product comprising a computer readable storage medium that can be construed as a ‘signal claim’.  Patent case law holds that ‘signal claims’ represent non-patent eligible subject matter under 35 U.S.C. §101.  See In re Nuijten, 500 F.3d 1346, 1357, 84 USPQ2d 1495, 1503 (Fed. Cir. 2007) and MPEP §2106 II and §2106.03.  The USPTO takes the position that claims directed to computer-readable media should be given their broadest reasonable interpretation when read in light of the Specification for purposes of determining if they meet the requirements of 35 U.S.C. §101.  Here, Applicants’ Specification, Page 23, Lines 7 to 20, only provides non-limiting examples of computer readable media and storage devices, but does not exclude transitory embodiments, e.g., wires and optical fibers, that may be consistent with an interpretation as a ‘signal claim’.  Applicants can overcome this rejection by amending the preamble of this claim so that it sets forth “a computer program product comprising a non-transitory computer readable storage medium . . . .” 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 4, 6, 9 to 10, and 13 to 14 are rejected under 35 U.S.C. 103 as being unpatentable over Jawagal et al. (U.S. Patent Publication 2021/0240776) in view of Filoti et al. (U.S. Patent Publication 2020/0167427).
  Concerning independent claims 1 and 13 to 14, Jawagal et al. discloses a method, system, and computer program product for responding to user queries, comprising:
“receiving a natural language text input” – question answering (QA) is a computer-related field that spans information retrieval and natural language processing to build systems that can automatically answer questions posed by different people in a natural language (¶[0001]); a query is received from a user (¶[0021]); implicitly, a query received from a user is ‘text’ in natural language (“a natural language text input”);
“performing natural language processing on the natural language text input to generate a plurality of encoded question tokens” – when a query is received from a user, it is processed and parts of speech (POS) data associated with tokens generated from a user query are extracted; the POS data can include a subject of the user’s query (¶[0021]); when a user query 192 is received, it is analyzed by query processor 108 which can tokenize and parse user query 192 (¶[0031]: Figure 1); user query 192 is processed, e.g., tokenized and tagged with POS data (¶[0043]: Figure 4: Steps 412 to 414); broadly, parsing a query to determine parts of speech (POS) is “natural language processing” and tokenizing and parsing a user query generates “a plurality of encoded question tokens”;
“performing natural language processing on a plurality of table schema stored in a database, to generate a plurality of encoded table schema tokens for each table schema of the plurality of table schema” – an automatic question answering system accesses a plurality of documents pertaining to a domain; the plurality of documents are parsed, including a title, a list of sections, etc. (¶[0017]); text extracted from the plurality of documents is tokenized into word tokens wherein each word forms a token and word-specific statistical features are considered including positioning and frequency of word-tokens (¶[0018]); textual content is extracted from each of the plurality of documents 110 and a keyword search may be employed to identify keywords/phrases that indicate a beginning of captions in a document (¶[0027]); candidate answers 172 and questions 174 are automatically generated from the plurality of documents (¶[0030]); question generator 144 automatically generates questions using a Seq2Seq model 200; question generator 144 includes a deep bi-directional context encoder 202, an answer encoder 204, and a unidirectional decoder 206 (¶[0033]: Figure 2); word embeddings for Seq2Seq question generation can be initialized using pre-trained GloVe (¶[0037]: Figure 2); a plurality of documents 110 are processed to extract unstructured text and various text organizational structures; the text thus extracted can be used to generate one or more candidate answers 172 that include key phrases so that for each of the identified sections and sub-sections, the relevant key phrases or key entities in the subsections of the documents can be extracted (¶[0042]: Figure 4: Steps 402 to 408); broadly, questions 174 and answers 172 are “a plurality of table schema”; that is, a question is a first column of a table and an answer is a second column of a table, and a plurality of questions and answers comprise a plurality of rows of the table; “natural language processing” is performed to extract word tokens from documents (“to generate a plurality of encoded table schema tokens”), where questions and answers are encoded by word embeddings;
“determining a similarity between the plurality of encoded question tokens and the plurality of encoded table schema tokens for at least two table schemas of the plurality of table schemas” – a user query 192 pertaining to a specific domain corresponding to the plurality of documents is processed, TF-IDF matching is employed to identify a set of contexts specific to the domain, and user query 192 is matched to identify an answer span (¶[0043]: Figure 4: Steps 416 to 418); an answer span 166 is identified for user query 192; query processor 108 processes user query 192 to generate word tokens, and tf-idf features of the documents or contexts from the documents are accessed; each time a new document is accessed, tf-idf features based on the unigrams and bigrams are created, and similarities between user query 192 and documents are obtained; cosine similarities can be estimated as a measure of similarity between user query 192 and the plurality of documents (¶[0051]: Figure 8: Step 806);
“determining an output table schema from the plurality of table schema based on the similarity” – using term vector model scoring, a context from a set of contexts associated with the plurality of documents is determined as relevant to a user query (¶[0021]); the top N contexts, e.g., the top three contexts, in terms of similarities are selected; there can be different answer spans from different contexts for the user query; a top scoring context 164 is provided which identifies answer span 166 for the highest scoring context (¶[0051]: Figure 8: Steps 808 to 810); here, “an output table schema” is a top scoring context 164, and “the plurality of table schema” are at least the top N contexts;
“outputting a natural language string based on the output table schema” – a machine comprehension (MC) model determines an answer span that includes information requested in the query, and a response to the query is generated in complete sentence frames in accordance with a grammar and includes the information from the identified answer span; a generated response is provided to the user via a user interface (¶[0021]); where an answer span 166 forms a complete sentence, answer span 166 can be provided directly to the user as response 194 without further processing by answer composer 184; if it is determined by answer composer 184 that answer span 166 does not include a complete sentence, then answer span 166 within the context is provided to answer composer 184 so that a complete answer can be generated and provided as response 194 to the user posing user query 192 (¶[0025]: Figure 1); natural language generation (NLG) is incorporated along with answer span 166 (¶[0052] - ¶[0053]: Figure 8: Steps 818 to 820).
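For illustration only, and not as a basis of this rejection, the TF-IDF matching over unigram and bigram features, cosine similarity scoring, and top N context selection described above for Jawagal et al. (¶[0051]) can be sketched as follows.  The whitespace tokenizer, the smoothed IDF formula, the function names, and the sample contexts are assumptions of this sketch and are not taken from Jawagal et al.

```python
import math
from collections import Counter


def ngrams(text):
    """Unigram and bigram features of a lowercased, whitespace-tokenized string (assumed tokenizer)."""
    tokens = text.lower().split()
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_tfidf(contexts):
    """Return one TF-IDF vector (feature -> weight) per context, plus the IDF table."""
    counts = [Counter(ngrams(c)) for c in contexts]
    n = len(contexts)
    df = Counter(f for c in counts for f in c)  # number of contexts containing each feature
    idf = {f: math.log((1 + n) / (1 + df[f])) + 1.0 for f in df}  # smoothed IDF (assumption)
    return [{f: tf * idf[f] for f, tf in c.items()} for c in counts], idf


def vectorize(text, idf):
    """TF-IDF vector for a query; features never seen in the contexts are dropped."""
    return {f: tf * idf[f] for f, tf in Counter(ngrams(text)).items() if f in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_n_contexts(query, contexts, n=3):
    """Score every context against the query and keep the n most similar (score, index) pairs."""
    vectors, idf = fit_tfidf(contexts)
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), key=lambda p: (-p[0], p[1]))
    return scored[:n]


# Hypothetical contexts; the first entry of the result is the top scoring context.
contexts = [
    "reset the router password",
    "billing cycle and invoices",
    "router firmware update steps",
    "office holiday schedule",
]
ranked = top_n_contexts("how to reset router password", contexts, n=3)
best_score, best_index = ranked[0]
```

In this sketch, the top scoring context plays the role mapped above to “an output table schema”, and the n retained contexts play the role of “the plurality of table schema”.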
Concerning independent claims 1 and 13 to 14, Jawagal et al. arguably discloses all of the limitations of these independent claims.  Here, Jawagal et al. does not expressly disclose “a plurality of table schema”, but pairs of questions and answers extracted from documents appear equivalent to ‘table schema’ because a question and answer pair can be considered as a row of a table, where the question and the answer represent two columns of that row, and a plurality of question and answer pairs form a plurality of rows.  Moreover, Jawagal et al. does not expressly disclose “natural language processing” to generate ‘encoded’ tokens for questions from a user and ‘encoded’ tokens for table schemas.  Still, Jawagal et al. discloses a plurality of processes for parsing and tokenizing words and phrases from queries and documents in a manner equivalent to “natural language processing”.  Applicants’ terms “encoded” question tokens and “encoded” table schema tokens are not expressly defined by the Specification, and can be broadly construed.  Conceivably, ‘encoded’ tokens might require vectorizing words by word embedding characteristic of machine learning according to a conventional algorithm such as GloVe, but this is not expressly set forth by the claim language.  In any event, Jawagal et al. appears to perform word embedding of question tokens from a user and word embedding of question and answer pairs to perform a similarity comparison to generate a score.

Concerning independent claims 1 and 13 to 14, even if these limitations are not disclosed by Jawagal et al., they are taught by Filoti et al.  Generally, Filoti et al. teaches training by artificial intelligence to generate an answer to a query based on answer table patterns, where an information server provides access to unstructured information sources.  (Abstract)  A Q/A system is configured to input N queries, receive N answers to the N queries, and output N (Query, Answer) pairs in order to create a knowledge base as a table.  (¶[0034]: Figure 2)  A table builder is defined as a natural language processing (NLP) engine that builds table annotations, e.g., data entries in cells of a table for each N (Query, Answer) pair.  (¶[0038]: Figure 2)  A table builder uses a natural language processing (NLP) engine to build table annotations for each (Query, Answer) pair based on a provided table annotation schema, and then stores the (Query, Answer) pairs in a knowledge base, so that each set of N (Query, Answer) pairs has a corresponding Table Annotations Schema.  (¶[0044])  If a user enters a query, Q/A system 202 sends the query to information server 252 which uses natural language processing (NLP) to identify the meaning and context of the query.  (¶[0050]: Figure 2)  Filoti et al., then, expressly teaches performing “natural language processing” on unstructured information sources to “generate a plurality of table schema”.  An objective is to train an artificial intelligence system to respond to complex queries.  (Abstract)  It would have been obvious to one having ordinary skill in the art to perform natural language processing to generate a plurality of table schema as taught by Filoti et al. to respond to user questions in Jawagal et al. for a purpose of providing an artificial intelligence system that can respond to complex queries.

Concerning claims 2 to 4, Jawagal et al. discloses using term vector model scoring to determine a context from a set of contexts associated with the plurality of documents as relevant to a user query (¶[0021]); each time a new document is accessed, similarities between user query 192 and documents are obtained; cosine similarities can be estimated as a measure of similarity between user query 192 and the plurality of documents (¶[0051]: Figure 8: Step 806); the top N contexts, e.g., the top three contexts, in terms of similarities are selected; there can be different answer spans from different contexts for the user query; a top scoring context 164 is provided which identifies answer span 166 for the highest scoring context (¶[0051]: Figure 8: Steps 808 to 810).  Here, every token of a user query is compared to every token of question/answer pairs to determine a similarity (“determining the similarity is based on a first similarity calculation of each encoded question token with all encoded table schema tokens of at least two table schemas”), and there are at least two question/answer pairs corresponding to the top N scoring contexts (“determining the similarity is based on a second similarity calculation of each encoded table scheme token of at least two table schemas with all encoded question tokens”).  There are, then, “a first similarity calculation” and “a second similarity calculation” corresponding to at least a first table schema and a second table schema of “at least two table schemas”.
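For illustration only, the two directional calculations recited in claims 2 to 4 (each encoded question token against all encoded table schema tokens, and each encoded table schema token against all encoded question tokens) can be sketched on toy embedding vectors as follows.  Taking the best match per token, averaging over tokens, and combining the two directions by a simple mean are assumptions of this sketch; neither the claims as quoted nor Jawagal et al. specify these aggregation choices.

```python
import math
from typing import Sequence

Embedding = Sequence[float]


def cosine(u: Embedding, v: Embedding) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def directed_similarity(src: Sequence[Embedding], dst: Sequence[Embedding]) -> float:
    """Compare each token in src with all tokens in dst, keep its best match, and average."""
    return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)


def schema_similarity(question: Sequence[Embedding], schema: Sequence[Embedding]):
    first = directed_similarity(question, schema)   # each question token vs. all schema tokens
    second = directed_similarity(schema, question)  # each schema token vs. all question tokens
    return first, second, (first + second) / 2      # averaging the two is an assumption


# Toy 2-dimensional "encoded" tokens for one question and two candidate schemas (hypothetical).
question = [(1.0, 0.0), (0.0, 1.0)]
schemas = {"schema_a": [(1.0, 0.0), (0.0, 1.0)], "schema_b": [(1.0, 0.0)]}
scores = {name: schema_similarity(question, tokens) for name, tokens in schemas.items()}
best = max(scores, key=lambda name: scores[name][2])
```

Here schema_b covers every one of its own tokens (second similarity 1.0) but only half of the question tokens (first similarity 0.5), which is why computing both directions can separate candidates that a single direction would not.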
Concerning claim 6, Jawagal et al. discloses using term vector model scoring to determine a context from a set of contexts associated with the plurality of documents as relevant to a user query (¶[0021]); the top N contexts, e.g., the top three contexts, in terms of similarities are selected; there can be different answer spans from different contexts for the user query; a top scoring context 164 is provided which identifies answer span 166 for the highest scoring context (¶[0051]: Figure 8: Steps 808 to 810).  Here, selecting the highest scoring context based on a similarity corresponds to “determining the output table scheme is performed by selecting a table schema with the highest similarity from the plurality of table schema.”  That is, a table schema corresponds to a context, and selecting the highest scoring context of a plurality of top scoring contexts is selecting the table schema with the highest similarity.
Concerning claim 9, Jawagal et al. discloses that a Seq2Seq model 200 is employed in a question generator 144 that includes a context encoder 202 and an answer encoder 204 (“generating the plurality of encoded question tokens is performed by a first encoding layer”); pre-processed sections of each document in context tokens 222 form the input to context encoder 202, and potential extracted answers or answer tokens 242 serve as key input to answer encoder 204 (“generating the plurality of encoded table scheme tokens is performed by a second encoding layer”).  (¶[0033]: Figure 2)
Concerning claim 10, Jawagal et al. discloses that questions and answers are extracted from unstructured text in Steps 402 to 410 prior to receiving a user query and identifying a set of contexts to identify an answer matching the user query in Steps 412 to 420 (“generating the plurality of encoded table schema tokens of each table schema by the second encoding layer is performed before receiving the natural language text input”) (¶[0041] – ¶[0043]: Figure 4).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Jawagal et al. (U.S. Patent Publication 2021/0240776) in view of Filoti et al. (U.S. Patent Publication 2020/0167427) as applied to claim 1 above, and further in view of Liu (U.S. Patent Publication 2018/0373782).
Jawagal et al. discloses determining a similarity between a user query and a context representing questions and answers, but does not expressly disclose “rejecting the output table schema if a similarity is below a threshold.”  Still, it is fairly well known in natural language processing to provide similarity thresholds to determine if a match is appropriate according to a degree of similarity.  
Generally, Liu teaches recommending answers to a question based on artificial intelligence by matching a query to questions having answers in a question and answer repository, calculating a similarity between the query and each of the questions having answers in the question and answer repository, and determining if a question with a semantic similarity to the query greater than a preset threshold exists in the question and answer repository.  (Abstract)  When the query ‘weather condition in Beijing today’ is input by a user, the semantic similarity between the query and each of the candidate similar questions in the question and answer repository is {weather condition in Beijing today, What’s the weather today in Beijing}: 0.9, {weather condition in Beijing today, weather in Beijing}: 0.6, and {weather condition in Beijing today, today’s weather}: 0.6.  Only the semantic similarity between the query ‘weather condition in Beijing today’ and the question ‘What’s the weather in Beijing today?’ is greater than 0.8.  (¶[0096] - ¶[0100]: Figure 3)  Liu, then, teaches “rejecting” a schema “if a similarity is below a threshold” for the questions {weather condition in Beijing today, weather in Beijing}: 0.6 and {weather condition in Beijing today, today’s weather}: 0.6, where the threshold is 0.8 and a similarity of 0.6 is below the threshold, so that only ‘What’s the weather in Beijing today?’, with a similarity greater than 0.8, is recommended to the user.  That is, questions that do not have a similarity greater than the threshold are ‘rejected’ and only a question with a similarity greater than the threshold is ‘accepted’.  An objective is to recommend an answer according to a word matching degree that does not have a misunderstanding problem due to colloquial expressions that would result in a bad user experience.  (¶[0004])  It would have been obvious to one having ordinary skill in the art to reject questions in a schema of a question and answer repository having a similarity below a threshold as taught by Liu to respond to user queries in Jawagal et al. for a purpose of providing a better user experience that does not have a misunderstanding problem due to colloquial expressions.
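For illustration only, the threshold comparison in Liu’s example can be reproduced with the similarity values quoted above (0.9, 0.6, and 0.6, against a threshold of 0.8).  The function name is hypothetical, and treating a similarity exactly equal to the threshold as rejected is an assumption of this sketch, chosen to mirror Liu’s “greater than a preset threshold” test.

```python
def apply_threshold(scored, threshold):
    """Split (question, similarity) pairs into accepted (> threshold) and rejected (<= threshold)."""
    accepted = [(q, s) for q, s in scored if s > threshold]
    rejected = [(q, s) for q, s in scored if s <= threshold]
    return accepted, rejected


# Similarity values as quoted from Liu, ¶[0096] - ¶[0100]; threshold 0.8.
scored = [
    ("What's the weather today in Beijing", 0.9),
    ("weather in Beijing", 0.6),
    ("today's weather", 0.6),
]
accepted, rejected = apply_threshold(scored, 0.8)
```

Only the 0.9 candidate survives; both 0.6 candidates fall below 0.8 and are ‘rejected’, consistent with the mapping to the claimed threshold limitation.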

Claims 7 to 8 are rejected under 35 U.S.C. 103 as being unpatentable over Jawagal et al. (U.S. Patent Publication 2021/0240776) in view of Filoti et al. (U.S. Patent Publication 2020/0167427) as applied to claim 1 above, and further in view of Allen et al. (U.S. Patent Publication 2015/0339574).
Concerning claim 7, Jawagal et al. can be construed to disclose the steps of an iterative process of “rejecting the output table schema”, “selecting a further outputting table schema”, and “repeating” because a similarity is determined between a user query and every context, and only the top N contexts in terms of similarities are selected.  (¶[0051]: Figure 8: Steps 806 to 810)  That is, only contexts with the top similarities are selected, and the remaining contexts are iteratively ‘rejected’.  The only elements not expressly disclosed by Jawagal et al. are “executing a validation query for the output table schema to determine whether the output table schema has valid information for outputting the natural language string”, “executing the validation query for further output table schema”, and repeating if it is determined “the further output table schema has no valid information.”  Applicants’ Specification, Page 3, Lines 14 to 24, and Page 12, Lines 1 to 22, literally provides support for these limitations, but does not explain in significant detail the precise objective of the validation query, which appears to be provided only as a way of performing an additional check on the output.
Concerning claim 7, Allen et al. teaches an extensible validation framework for question and answer systems, where a validator is selected to apply to a candidate answer based on a characteristic of a correct answer for the input question.  The validator is applied to the candidate answer to evaluate whether or not criteria of the validator are met by the candidate answer.  (Abstract)  An objective is to provide an extensible validation framework for improved accuracy and performance in a question and answer system.  (¶[0001])  It would have been obvious to one having ordinary skill in the art to execute a validation query to evaluate if a candidate answer meets certain criteria as taught by Allen et al. to respond to user queries in Jawagal et al. for a purpose of obtaining improved accuracy and performance in a question and answer system.
Concerning claim 8, Jawagal et al. discloses that the top N contexts, e.g., the top three contexts, in terms of similarities are selected; there can be different answer spans from different contexts for the user query; a top scoring context 164 is provided which identifies answer span 166 for the highest scoring context (¶[0051]: Figure 8: Steps 808 to 810).  Here, determining the top N contexts according to similarities is “statistically ranking the plurality of” contexts, and identifying the highest scoring context is “selecting” a context “based on the statistically ranking.”  That is, scoring by similarity to identify top contexts is “statistical ranking”.
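For illustration only, the iterative procedure recited in claims 7 and 8 (rank candidate table schemas by similarity, execute a validation query on the highest ranked candidate, reject it if it has no valid information, and repeat with the next candidate) can be sketched with Python’s built-in sqlite3 module.  The table names, similarity scores, and the choice of “returns at least one row” as the validation criterion are hypothetical; neither Allen et al. nor the claims as quoted define the validation query in this way.

```python
import sqlite3


def has_valid_information(conn, table):
    """Hypothetical validation query: a table is 'valid' if it returns at least one row."""
    # Table names come from the fixed schema catalog below, never from user input.
    return conn.execute(f'SELECT 1 FROM "{table}" LIMIT 1').fetchone() is not None


def select_output_schema(conn, similarities):
    """Rank schemas by similarity, then reject and move on until one passes validation."""
    for table in sorted(similarities, key=similarities.get, reverse=True):
        if has_valid_information(conn, table):
            return table
    return None  # no candidate passed validation


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, condition TEXT)")
conn.execute("CREATE TABLE forecast (city TEXT, condition TEXT)")
conn.execute("INSERT INTO forecast VALUES ('Beijing', 'sunny')")
similarities = {"weather": 0.9, "forecast": 0.7}  # hypothetical similarity scores
selected = select_output_schema(conn, similarities)
```

In this sketch, ‘weather’ ranks first but is empty, so it is rejected and the next ranked schema, ‘forecast’, is validated and selected.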

Claims 11 to 12 are rejected under 35 U.S.C. 103 as being unpatentable over Jawagal et al. (U.S. Patent Publication 2021/0240776) in view of Filoti et al. (U.S. Patent Publication 2020/0167427) as applied to claim 1 above, and further in view of Mishra et al. (U.S. Patent No. 10,776,579).
Filoti et al. teaches “table schemas” but omits “enriching at least one encoded table schema token by adding an embedded table schema synonym token and/or an embedded table scheme content token to an embedded table schema token for generating the at least one enriched encoded table schema token” and “enriching the natural language text input by text normalization and/or semantic enrichment before performing natural language processing.”  Still, it is known in the prior art to automatically expand searches by using synonyms.  Mishra et al. teaches generating variable natural language descriptions from structured data, where an entity relation tuple is generated based on sub-tables extracted from a data table.  (Column 7, Lines 43 to 62: Figure 4)  Specifically, a lexical database may organize nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms, where cognitive synonyms may be interlinked by means of conceptual-semantic and lexical relations.  A resulting network of meaningfully related words and concepts may then be queried with a term to identify a list of alternative terms that are available for use in the same context.  (Column 8, Lines 49 to 58)  A tuple <PERSON, game, SPORT> may be analyzed by tuple generator 124 and may be modified to <PERSON, play, SPORT> based on output of word to vector embeddings and lexical database.  (Column 9, Lines 5 to 10)  The extracted tuples may be enriched by generating variations of the tuples using, e.g., lemmatization and synonym replacement of the relation terms including verbs.  The enriched tuples and corresponding sentences may be tagged by replacing entities with a corresponding named-entity tag.  (Column 10, Lines 36 to 45)  Here, lemmatization is equivalent to “text normalization” because it represents inflected forms of a word by a single base form, e.g., ‘walk’ represents ‘walks’, ‘walked’, ‘walking’, etc.
Mishra et al., then, teaches ‘enriching by adding a synonym token’ in a table schema of tuples to generate ‘at least one enriched table schema token’, and ‘enriching the natural language text input by text normalization and/or semantic enrichment’.  An objective is to generate natural language descriptions without rule-based systems that require large amounts of manual effort and are not scalable.  (Column 1, Lines 13 to 21)  It would have been obvious to one having ordinary skill in the art to provide semantic enrichment with synonym tokens and text normalization, as taught by Mishra et al., for the table schema and text input of Filoti et al. for a purpose of generating natural language descriptions that do not require rule-based systems.
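For illustration only, the two enrichment operations discussed for Mishra et al. (lemmatization as text normalization, and synonym replacement of the relation term of an entity relation tuple) can be sketched as follows.  The small lookup tables stand in for a lexical database and a lemmatizer and are hypothetical; only the <PERSON, game, SPORT> to <PERSON, play, SPORT> pairing is taken from Mishra et al. as quoted above.

```python
# Hypothetical lookup tables standing in for a lexical database and a lemmatizer.
SYNONYMS = {"game": ["play"], "play": ["compete"]}
LEMMAS = {"walks": "walk", "walked": "walk", "walking": "walk", "plays": "play", "played": "play"}


def normalize(text):
    """Text normalization: lowercase each token and map inflected forms to a base form."""
    return [LEMMAS.get(tok, tok) for tok in text.lower().split()]


def enrich_tuple(entity1, relation, entity2):
    """Return the (lemmatized) tuple plus variants whose relation term is replaced by a synonym."""
    base = LEMMAS.get(relation, relation)
    return [(entity1, r, entity2) for r in [base] + SYNONYMS.get(base, [])]


print(normalize("She Walked home"))             # ['she', 'walk', 'home']
print(enrich_tuple("PERSON", "game", "SPORT"))  # [('PERSON', 'game', 'SPORT'), ('PERSON', 'play', 'SPORT')]
```

The enriched output keeps the original tuple and adds a synonym variant, which corresponds to ‘adding a synonym token’ rather than replacing the original token.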

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Troyanova et al., Chen et al., Kim et al. (‘994), and Kim et al. (‘800) disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657
May 18, 2022