Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to communications: Application filed on 02/25/202. Claims 1, 14 and 20 are independent claims. Claims 1-20 have been examined and rejected in the current patent application. 
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 02/25/2020. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 12-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rankin et al. (US 2014/0223284 A1, hereinafter Rankin) in view of Pestian (US 2015/0081280 A1, hereinafter Pestian).   
Regarding independent claim 1, Rankin discloses a computer implemented method comprising: accessing, by a processor, a database stored in a memory, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value, the database being structured with a pre-defined data model or format (Rankin discloses that these instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.). The memory 729 may contain a collection of program and/or database components and/or data and a stored program component that is executed by a CPU. The database component includes several tables: an Industry table that includes fields and a User table that includes fields, for example, user_id, user_name, user_employer, user_contact_address, industry_id, listing_id, industry id, industry name, data_field_id and data_field_value. The person (or trainer) may edit 170 and 171 each attribute and extraction field (plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value). The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc.). The MLDA may identify in free-text a set of pre-defined entities of interest (i.e. listing attributes-address, broker information, etc.) and annotation of text with a list of pre-defined categories. The MLDA may use the training data with machine learning and generate a machine learning model (see Rankin: Para. 0025-0040, 0058, 0111-0112, 0115-0120, 0133-0141 and 0145-0150). This reads on the claim concepts of a computer implemented method comprising: accessing, by a processor, a database stored in a memory, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value, the database being structured with a pre-defined data model or format);
accessing, by the processor, a first dataset stored in the memory and comprising text, wherein a portion of the text contains data derived from the database (Rankin discloses that a portion of the automatically annotated documents may be set aside for human validation (based on ML confidence score, i.e. a threshold probability that the extracted information is correct). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data. The database may be a conventional, fault tolerant, relational, scalable, secure database such as Oracle or Sybase. The MLDA database may be implemented using various standard data-structures, such as an array, hash, (linked) list, struct, structured text file (e.g., XML), table, and access to the MLDA database (see Rankin: Para. 0048 and 0135-0148). This reads on the claim concepts of accessing, by the processor, a first dataset stored in the memory and comprising text, wherein a portion of the text contains data derived from the database);
segmenting, by the processor, the text of the first dataset into tokens, the tokens comprising one or more characters (Rankin discloses that the ML algorithm may classify individual words (tokens) into one of several categories (e.g. lease size, broker email, listing street address, etc.). Rankin further refers to the kind of the preceding and following tokens (e.g. number, word (text), punctuation). The bag-of-words approach may consider only individual tokens (unigrams), while a sequence of words provides more contextual information. The MLDA may classify individual tokens (from a previously identified paragraph) as referring to the value of an abstraction field. A paragraph referring to Rent Type is then classified into one of the Rent Type categories (one or more characters). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data (see Rankin: Para. 0039-0045 and 0107-0111). This reads on the claim concepts of segmenting, by the processor, the text of the first dataset into tokens, the tokens comprising one or more characters);
storing, by the processor, the annotations and associated database properties and database values in the memory as an annotated dataset (Rankin discloses that the MLDA server may store 115 the initial data set and the initial rules to the MLDA database 109. The MLDA controller may serve to aggregate, process, store, search, serve, identify, instruct, generate, match, and/or facilitate interactions with a computer through behavior assessment technologies, and/or other related data. The annotations may be used as training data for machine learning. The MLDA may train a machine learning algorithm to identify paragraphs relevant to an abstraction field. The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc.). The data is processed so that it is represented with structures and annotations. For example, a person may use the interface to manually validate the data annotated by the machine learning algorithm. The person (or trainer) may edit 170 and 171 each attribute and extraction field. The newly validated data may be fed into the machine learning program to update and improve the model. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.) (see Rankin: Para. 0035-0060, 0114 and 0162). This reads on the claim concepts of storing, by the processor, the annotations and associated database properties and database values in the memory as an annotated dataset).
However, Rankin does not appear to specifically disclose identifying, by the processor, tokens in the first dataset that match property values in the database for predetermined database properties; determining, by the processor, whether the identified tokens in the first dataset represent values associated with a property in the database; annotating, by the processor, the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag.
 In the same field of endeavor, Pestian discloses identifying, by the processor, tokens in the first dataset that match property values in the database for predetermined database properties (Pestian discloses processing natural language may include the steps of providing a text, the text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, the database further including quantitative values. The identified known words; querying the database to obtain a first set of concepts associated with each of the identified known words; and annotating the list of identified known words with the first set of concepts associated with each identified known word.  For example, in the original text, the last token 210 (before the".") listed was changed from "die" to "discharge". Each token 210 has also been tagged. This supervised training set was then used to train the spreading activation software for identifying concepts in similar radiology transcriptions. Add words to a phrase that match mentioned POS tags until there is a phrase that is not in the UMLS and in additional concepts that matches the search concept. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated, (see Pestian: Para. 0022-0050, 0060-0072, 0108-0124, 0131-0150 and 0190). This reads on the claim concepts of identifying, by the processor, tokens in the first dataset that match property values in the database for predetermined database properties); 
determining, by the processor, whether the identified tokens in the first dataset represent values associated with a property in the database (Pestian discloses that is, at iteration zero, concepts in the UMLS are identified in the clinical free-text if their weights meet the selected threshold. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, (see Pestian: para. 0051, 0129-0138 and 0140-0150). This reads on the claim concepts of determining, by the processor, whether the identified tokens in the first dataset represent values associated with a property in the database);                          
annotating, by the processor, the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag (Pestian discloses annotating the list of identified words with the second set of concepts by considering the quantitative value representative of the strength of the relationship between each concept in the second set of concepts and its associated concept in the first set of concepts. Both the Manchester tagger and the TreeTagger can be tuned to new text-types by training against corpora. Even very small-annotated corpora of the right text type can make a big difference to performance. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. 
The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts. The part-of-speech tags and including UMLS concepts. The CUTC contains individual tokens 210 and a hand annotated POS 212. A tag-specific measure of agreement may be obtained to examine the extent to which the two processes tend to lead to consistent conclusions with respect to the particular tag, (see Pestian: para. 0051, 0097-0112, 0124-0138 and 0140-0150). This reads on the claim concepts of annotating, by the processor, the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag); and 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation of Rankin to incorporate the identified tokens disclosed by Pestian, which is directed to natural language processing of free text using domain-specific spreading activation. Embodiments of Pestian's invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of Pestian's invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and process other collections of text, for example. Further, embodiments of Pestian's invention may be used to more effectively search large databases, such as a database containing a large number of medical publications. The text may include clinical free text, for example, and the clinical free text may include pediatric clinical free text. The text may include a plurality of documents and the method may further include the step of identifying a subset of the plurality of documents by identifying at least two documents having associations with at least one identical concept. In machine learning, data annotation is the process of labeling data to show the outcome the machine learning model is intended to predict, that is, labeling, tagging, transcribing, or processing a dataset with the features the machine learning system is to learn to recognize. Annotation is one of the most important processes in the generation of chatbot training datasets and other NLP training data. It is the act of locating, extracting, and tagging entities in text. Annotated data reveals features that will train the algorithms to identify the same features in data that has not been annotated.
Data annotation is used in supervised learning and in hybrid, or semi-supervised, machine learning models that involve supervised learning. As tokens are the building blocks of natural language, the most common way of processing raw text is at the token level. Tokenization is the foremost step in modeling text data and is performed on the corpus to obtain tokens. Corpus annotation is the practice of adding interpretative linguistic information to a corpus. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which words in a text belong. This is so-called part-of-speech tagging (or POS tagging), and can be useful, for example, in distinguishing words which have the same spelling but different meanings or pronunciation. If a word in a text is spelt "present", it may be a noun (= 'gift'), a verb (= 'give someone a present') or an adjective (= 'not absent'). The meanings of these same-looking words are very different, and there is also a difference of pronunciation, since the verb "present" has stress on the final syllable. Incorporating the teachings of Pestian into Rankin would produce a process for handling clinical text for assignment of billing codes, analyzing suicide notes or legal discovery materials, and processing other collections of text. Further, embodiments of Pestian's invention may be used to more effectively search large databases, such as a database containing a large number of medical publications, as disclosed by Pestian (see Abstract).
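For illustration only, the claim 1 steps mapped above (segmenting text into tokens, identifying tokens that match database property values, determining whether a match represents a property value, and annotating with a tag and annotation attributes) can be sketched in Python. This is a minimal sketch under stated assumptions and is not taken from Rankin, Pestian, or the application: the sample database, the property names, the Annotation record, and the rule that a match is accepted only when exactly one property holds the value are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical structured database: data items with property -> value pairs.
DATABASE = [
    {"first_name": "Alice", "last_name": "Smith", "city": "Boston"},
    {"first_name": "Bob", "last_name": "Jones", "city": "Denver"},
]

# Hypothetical predetermined database properties to search for in the text.
PROPERTIES = ("first_name", "last_name", "city")


@dataclass
class Annotation:
    token: str
    start: int
    end: int
    tag: str
    attributes: dict = field(default_factory=dict)


def tokenize(text):
    """Segment text into (token, start, end) tuples of one or more characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def build_index(database, properties):
    """Map each lower-cased property value to the set of properties holding it."""
    index = {}
    for item in database:
        for prop in properties:
            value = item.get(prop)
            if value is not None:
                index.setdefault(str(value).lower(), set()).add(prop)
    return index


def annotate(text, database, properties):
    """Tag tokens whose text matches a value of exactly one database property."""
    index = build_index(database, properties)
    annotations = []
    for token, start, end in tokenize(text):
        props = index.get(token.lower())
        # Assumed determination rule: accept only unambiguous matches.
        if props is not None and len(props) == 1:
            prop = next(iter(props))
            annotations.append(
                Annotation(token, start, end, tag=prop.upper(),
                           attributes={"property": prop, "value": token})
            )
    return annotations


if __name__ == "__main__":
    for ann in annotate("Alice Smith was seen in Boston by Dr. Lee.",
                        DATABASE, PROPERTIES):
        print(ann)
```

The returned list of Annotation records, each carrying the token, its character offsets, the tag, and the matched property and value, stands in for the stored annotated dataset of the final claim step.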
Regarding dependent claim 2, the combination of Rankin and Pestian discloses the computer implemented method as in claim 1. However, Rankin does not appear to specifically disclose wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients.
  In the same field of endeavor, Pestian discloses wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients (Pestian discloses the method may include the step of providing an output including at least one of the concepts in the first set of concepts.  The text may include a plurality of documents and the method may further include the step of identifying a subset of the plurality of documents by identifying at least two documents having associations with at least one identical concept. The clinical documents from the following subspecialties: radiology, nephrology, pulmonary, behavioral medicine, psychiatry, rheumatology, pathology, cardiology, allergy and immunology, critical care, hematology/oncology, and human genetics. Suicide notes and recorded discussions with suicidal patients are artifacts of the patient's inimical thoughts (analyze suicide notes). For example, the two sentences provide two different POS tags for the token patient, (see Pestian: Para. 0025-0045, 0055-0071, 0100-0135, 0185-0190 and FIG. 8-9). This reads on the claim concepts of wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients). 
Regarding dependent claim 4, the combination of Rankin and Pestian discloses the computer implemented method as in claim 1. However, Rankin does not appear to specifically disclose wherein the text of the first dataset is unstructured without a pre-defined data model or format.
  In the same field of endeavor, Pestian discloses wherein the text of the first dataset is unstructured without a pre-defined data model or format (Pestian discloses its content, or the content of any medical records, can be classified into two general types of data: structured and unstructured. These data have relevant cells of output from a particular test, and can be relied on to contain information in an expected way. By contrast, unstructured data, including such clinical free-text as transcribed discharge summaries, contain data whose interpretation may be substantially more challenging (Unstructured is typically textual), (see Pestian: Para. 0025-0030). This reads on the claim concepts of wherein the text of the first dataset is unstructured without a pre-defined data model or format). 
Regarding dependent claim 5, the combination of Rankin and Pestian discloses the computer implemented method as in claim 1. However, Rankin does not appear to specifically disclose wherein the data derived from the database contains protected health information.
In the same field of endeavor, Pestian discloses wherein the data derived from the database contains protected health information (Pestian discloses that a Health Information Management (HIM) professional may be queried by the system to curate the data and manually assign a billing code. The system may learn from the HIM professional and may remember the correct coding result, thereby expanding its knowledge base. The entire cycle may be repeated for each new patient visit. Pestian further discloses a database containing a large number of medical publications. Personalized Medicine is the delivery of health care that is based upon an individual's specific genotype, current clinical state and environmental conditions (see Pestian: Para. 0022-0040 and 0182-0190). This reads on the claim concepts of wherein the data derived from the database contains protected health information).
Regarding dependent claim 12, the combination of Rankin and Pestian discloses the computer implemented method as in claim 1. Rankin further discloses training a machine learning model using the annotated dataset, wherein the result is a machine learned model (Rankin discloses that a set of training data may be provided to the MLDA (machine learning data annotation). Training data contains annotated and/or extracted data entered manually by trainers. The MLDA may use the training data with machine learning and generate a machine learning model. The machine learning model may be further used to annotate and extract new data (see Rankin: Para. 0027-0040). This reads on the claim concepts of comprising training a machine learning model using the annotated dataset, wherein the result is a machine learned model).
Regarding dependent claim 13, the combination of Rankin and Pestian discloses the computer implemented method as in claim 12. Rankin further discloses identifying text in another dataset using the machine learned model (Rankin discloses that, to output the results from Machine Learning, the text in the document can be similarly highlighted and the actual extracted listing information appears in an editable web form next to the initial data set file 290. The user interface may provide linguistic annotation of text with a list of pre-defined categories. Users may be assigned a set of documents and their task is to select text that refers to a set of categories, e.g. listing address, broker name/phone/email, company, sf, etc. These selections are then used as input to a Machine Learning algorithm. The purpose of this type of annotation may be to identify text (see Rankin: Para. 0027-0040, 0054-0070 and 0111). This reads on the claim concepts of comprising identifying text in another dataset using the machine learned model).
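For illustration only, the relationship between claims 12 and 13 (training a model on the annotated dataset, then using the resulting model to identify text in another dataset) can be sketched as follows. This is a hypothetical stand-in and not Rankin's machine learning algorithm: the "model" is a simple most-frequent-tag lookup, and the function names train and identify, the whitespace tokenization, and the sample (token, tag) pairs are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def train(annotated_dataset):
    """Build a lookup model from (token, tag) pairs.

    Assumption: each token is assigned the tag it carried most often
    in the annotated dataset.
    """
    counts = defaultdict(Counter)
    for token, tag in annotated_dataset:
        counts[token.lower()][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def identify(model, text):
    """Apply the learned lookup to another dataset (whitespace-split text)."""
    return [(tok, model[tok.lower()])
            for tok in text.split() if tok.lower() in model]


if __name__ == "__main__":
    # Hypothetical annotated dataset of (token, tag) pairs.
    annotated = [
        ("Boston", "CITY"), ("Boston", "CITY"), ("Boston", "LAST_NAME"),
        ("Alice", "FIRST_NAME"),
    ]
    model = train(annotated)
    print(identify(model, "Alice moved to Boston"))
```

A statistical or neural tagger would replace the lookup in practice; the sketch only shows the two-stage flow in which the annotated dataset is the training input and a different text is the inference input.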
Regarding independent claim 14, Rankin discloses an automatic annotating system comprising: a data preparer configured to access, from an authorized system, a database and a first dataset stored in a memory, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value (Rankin discloses that these instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.). The memory 729 may contain a collection of program and/or database components and/or data and a stored program component that is executed by a CPU. The database component includes several tables: an Industry table that includes fields and a User table that includes fields, for example, user_id, user_name, user_employer, user_contact_address, industry_id, listing_id, industry id, industry name, data_field_id and data_field_value. The person (or trainer) may edit 170 and 171 each attribute and extraction field (plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value). The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc.). The MLDA may identify in free-text a set of pre-defined entities of interest (i.e. listing attributes-address, broker information, etc.) and annotation of text with a list of pre-defined categories. The MLDA may use the training data with machine learning and generate a machine learning model (see Rankin: Para. 0025-0040, 0058, 0111-0112, 0115-0120, 0133-0141 and 0145-0150). This reads on the claim concepts of a data preparer configured to access, from an authorized system, a database and a first dataset stored in a memory, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value),
the first dataset comprising text, wherein a portion of the text contains data derived from the database (Rankin discloses that a portion of the automatically annotated documents may be set aside for human validation (based on ML confidence score, i.e. a threshold probability that the extracted information is correct). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data. The database may be a conventional, fault tolerant, relational, scalable, secure database such as Oracle or Sybase. The MLDA database may be implemented using various standard data-structures, such as an array, hash, (linked) list, struct, structured text file (e.g., XML), table, and access to the MLDA database (see Rankin: Para. 0048 and 0135-0148). This reads on the claim concepts of the first dataset comprising text, wherein a portion of the text contains data derived from the database);
a tokenizer coupled with the data preparer and configured to segment the text of the first dataset into tokens, the tokens comprising one or more characters (Rankin discloses that the ML algorithm may classify individual words (tokens) into one of several categories (e.g. lease size, broker email, listing street address, etc.). Rankin further refers to the kind of the preceding and following tokens (e.g. number, word (text), punctuation). The bag-of-words approach may consider only individual tokens (unigrams), while a sequence of words provides more contextual information. The MLDA may classify individual tokens (from a previously identified paragraph) as referring to the value of an abstraction field. A paragraph referring to Rent Type is then classified into one of the Rent Type categories (one or more characters). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data (see Rankin: Para. 0039-0045 and 0107-0111). This reads on the claim concepts of a tokenizer coupled with the data preparer and configured to segment the text of the first dataset into tokens, the tokens comprising one or more characters);
the assigned annotation attributes for the respective tags are stored in the memory as an annotated dataset (Rankin discloses that the MLDA server may store 115 the initial data set and the initial rules to the MLDA database 109. The MLDA controller may serve to aggregate, process, store, search, serve, identify, instruct, generate, match, and/or facilitate interactions with a computer through behavior assessment technologies, and/or other related data. The annotations may be used as training data for machine learning. The MLDA may train a machine learning algorithm to identify paragraphs relevant to an abstraction field. The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc.). The data is processed so that it is represented with structures and annotations. For example, a person may use the interface to manually validate the data annotated by the machine learning algorithm. The person (or trainer) may edit 170 and 171 each attribute and extraction field. The newly validated data may be fed into the machine learning program to update and improve the model. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.) (see Rankin: Para. 0035-0060, 0114 and 0162). This reads on the claim concepts of the assigned annotation attributes for the respective tags are stored in the memory as an annotated dataset).
However, Rankin does not appear to specifically disclose a data analyzer coupled with the tokenizer and configured to identify tokens in the first dataset that match property values in the database for predetermined database properties and determine whether the identified tokens in the first dataset represent values associated with a property in the database; and an annotator coupled with the data analyzer and configured to annotate the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein the annotator, to annotate the identified tokens, is further configured to associate a tag with each identified token and assign annotation attributes for each tag, wherein the respective tags, the identified tokens associated with the respective tags.
In the same field of endeavor, Pestian discloses a data analyzer coupled with the tokenizer and configured to identify tokens in the first dataset that match property values in the database for predetermined database properties (Pestian discloses processing natural language may include the steps of providing a text, the text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, the database further including quantitative values. The identified known words; querying the database to obtain a first set of concepts associated with each of the identified known words; and annotating the list of identified known words with the first set of concepts associated with each identified known word.  For example, in the original text, the last token 210 (before the".") listed was changed from "die" to "discharge". Each token 210 has also been tagged. This supervised training set was then used to train the spreading activation software for identifying concepts in similar radiology transcriptions. Add words to a phrase that match mentioned POS tags until there is a phrase that is not in the UMLS and in additional concepts that matches the search concept. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated, (see Pestian: Para. 0022-0050, 0060-0072, 0108-0124, 0131-0150 and 0190). This reads on the claim concepts of a data analyzer coupled with the tokenizer and configured to identify tokens in the first dataset that match property values in the database for predetermined database properties) and 
determine whether the identified tokens in the first dataset represent values associated with a property in the database (Pestian discloses that is, at iteration zero, concepts in the UMLS are identified in the clinical free-text if their weights meet the selected threshold. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, (see Pestian: para. 0051, 0129-0138 and 0140-0150). This reads on the claim concepts of determine whether the identified tokens in the first dataset represent values associated with a property in the database); and 
an annotator coupled with the data analyzer and configured to annotate the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein the annotator, to annotate the identified tokens, is further configured to associate a tag with each identified token and assign annotation attributes for each tag, wherein the respective tags, the identified tokens associated with the respective tags (Pestian discloses annotating the list of identified words with the second set of concepts by considering the quantitative value representative of the strength of the relationship between each concept in the second set of concepts and its associated concept in the first set of concepts. Both the Manchester tagger and the TreeTagger can be tuned to new text-types by training against corpora. Even very small-annotated corpora of the right text type can make a big difference to performance. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. 
The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts. The part-of-speech tags and including UMLS concepts. The CUTC contains individual tokens 210 and a hand annotated POS 212. A tag-specific measure of agreement may be obtained to examine the extent to which the two processes tend to lead to consistent conclusions with respect to the particular tag, (see Pestian: para. 0051, 0097-0112, 0124-0138 and 0140-0150). This reads on the claim concepts of an annotator coupled with the data analyzer and configured to annotate the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein the annotator, to annotate the identified tokens, is further configured to associate a tag with each identified token and assign annotation attributes for each tag, wherein the respective tags, the identified tokens associated with the respective tags), and
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation of Rankin in order to have incorporated identified tokens, as disclosed by Pestian, which is directed to natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and for processing other collections of text, for example. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications. The text may include clinical free text, for example; and the clinical free text may include pediatric clinical free text. The text may include a plurality of documents and the method may further include the step of identifying a subset of the plurality of documents by identifying at least two documents having associations with at least one identical concept. In machine learning, data annotation is the process of labeling data to show the outcome you want your machine learning model to predict, such as labeling, tagging, transcribing, or processing a dataset with the features you want your machine learning system to learn to recognize. Annotation is one of the most important processes in the generation of chatbot training datasets and other NLP training data. It is the act of locating, extracting, and tagging entities in text. Annotated data reveals features that will train your algorithms to identify the same features in data that has not been annotated.
Data annotation is used in supervised learning and hybrid, or semi-supervised, machine learning models that involve supervised learning. As tokens are the building blocks of Natural Language, the most common way of processing the raw text happens at the token level. Tokenization is the foremost step while modeling text data. Tokenization is performed on the corpus to obtain tokens. Corpus annotation is the practice of adding interpretative linguistic information to a corpus. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which words in a text belong. This is so-called part-of-speech tagging (or POS tagging), and can be useful, for example, in distinguishing words which have the same spelling, but different meanings or pronunciation. If a word in a text is spelt present, it may be a noun (= 'gift'), a verb (= 'give someone a present') or an adjective (= 'not absent'). The meanings of these same-looking words are very different, and also there is a difference of pronunciation, since the verb present has stress on the final syllable. Incorporating the teachings of Pestian into Rankin would produce a system that processes clinical text for assignment of billing codes, analyzes suicide notes or legal discovery materials, and processes other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications, as disclosed by Pestian, (see Abstract).
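For illustration only, and not as a characterization of the implementation in Rankin or Pestian, the sequence discussed above (segmenting text into tokens, identifying tokens that match property values in a database, and associating a tag and annotation attributes with each identified token) may be sketched as follows. The names DATABASE, Annotation, tokenize and annotate, and the sample property values, are hypothetical and do not appear in either reference. The sketch treats every exact match as representing the property value, which is a simplifying assumption.

```python
import re
from dataclasses import dataclass, field

# Hypothetical database: property name -> set of known values (illustrative only).
DATABASE = {
    "first_name": {"john", "mary"},
    "city": {"boston", "chicago"},
}


@dataclass
class Annotation:
    token: str          # the identified token as it appears in the text
    start: int          # character offset where the token begins
    end: int            # character offset just past the token
    tag: str            # tag associated with the token (here, the property name)
    attributes: dict = field(default_factory=dict)  # annotation attributes for the tag


def tokenize(text):
    """Segment text into (token, start, end) triples of one or more word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def annotate(text, database):
    """Tag each token whose lowercased form equals a known property value."""
    annotations = []
    for token, start, end in tokenize(text):
        for prop, values in database.items():
            if token.lower() in values:
                annotations.append(
                    Annotation(token, start, end, tag=prop,
                               attributes={"matched_value": token.lower()})
                )
    return annotations
```

Under these assumptions, annotate("John moved to Boston.", DATABASE) yields two annotations, one tagged first_name and one tagged city, each carrying the character span of the matched token.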
Regarding dependent claim(s) 15, the combination of Rankin and Pestian discloses the automatic annotating system as in claim 14. However, Rankin does not appear to specifically disclose wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients and wherein the data derived from the database contains protected health information.
In the same field of endeavor, Pestian discloses wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients and wherein the data derived from the database contains protected health information (Pestian discloses the method may include the step of providing an output including at least one of the concepts in the first set of concepts.  The text may include a plurality of documents and the method may further include the step of identifying a subset of the plurality of documents by identifying at least two documents having associations with at least one identical concept. The clinical documents from the following subspecialties: radiology, nephrology, pulmonary, behavioral medicine, psychiatry, rheumatology, pathology, cardiology, allergy and immunology, critical care, hematology/oncology, and human genetics. Suicide notes and recorded discussions with suicidal patients are artifacts of the patient's inimical thoughts (analyze suicide notes). For example, the two sentences provide two different POS tags for the token patient, (see Pestian: Para. 0025-0045, 0055-0071, 0100-0135, 0185-0190 and FIG. 8-9). This reads on the claim concepts of wherein the first dataset comprises a plurality of electronic documents relating to a plurality of patients. A Health Information Management professional may be queried by the system to curate the data and manually assign a billing code. The system may learn from the HIM professional and it may remember the correct coding result thereby expanding its knowledge base. Finally, the entire cycle is repeated for each new patient visit. The system learning from the HIM professional and remembers the correct coding result, thereby expanding its knowledge base. The entire cycle may be repeated for each new patient visit. A database containing a large number of medical publications. 
Personalized Medicine is the delivery of health care that is based upon an individual's specific genotype, current clinical state and environmental conditions, (see Pestian: Para. 0022-0040 and 0182-0190). This reads on the claim concepts of wherein the data derived from the database contains protected health information). 
Regarding independent claim(s) 20, Rankin discloses an automatic annotating system comprising: a means for accessing a database, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value (Rankin discloses these instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.). The memory 729 may contain a collection of program and/or database components and/or data and a stored program component that is executed by a CPU. The database component includes several tables: Industry table includes fields and a User table includes fields. For example, user_id, user_name, user_employer, user_contact_address, industry_id, listing_id, industry id, industry name, data_field_id and data_field_value. The person (or trainer) may edit 170 and 171 each attribute and extraction field. (plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value). The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc). The MLDA may identify in free-text a set of pre-defined entities of interest (i.e. listing attributes-address, broker information, etc.) and annotation of text with a list of pre-defined categories. The MLDA may use the training data with machine learning and generate a machine learning model, (see Rankin: Para. 0025-0040, 0058, 0111-0112, 0115-0120, 0133-0141 and 0145-0150).
This reads on the claim concepts of an automatic annotating system comprising: a means for accessing a database, the database comprising a plurality of data items, each data item comprising one or more properties, each property of the one or more properties having an associated value); 
a means for accessing a first dataset comprising text, wherein a portion of the text contains data derived from the database (Rankin discloses a portion of the automatically annotated documents may be set aside for human validation (based on ML confidence score, i.e. a threshold probability that the extracted information is correct). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data. The database may be a conventional, fault tolerant, relational, scalable, secure database such as Oracle or Sybase. The MLDA database may be implemented using various standard data-structures, such as an array, hash, (linked) list, struct, structured text file (e.g., XML), table, and access to the MLDA database, (see Rankin: Para. 0048 and 0135-0148). This reads on the claim concepts of a means for accessing a first dataset comprising text, wherein a portion of the text contains data derived from the database);
a means for segmenting the text of the first dataset into tokens, the tokens comprising one or more characters (Rankin discloses the ML algorithm may classify individual words (tokens) into one of several categories (e.g. lease size, broker email, listing street address, etc.). The kind of the preceding and following tokens (e.g. number, word (text), punctuation). The bag-of-words approach may consider only individual tokens (unigrams). Providing more contextual information (sequence of words). The MLDA may classify individual tokens (from a previously identified paragraph) into referring to the value of an abstraction field. A paragraph referring to Rent Type is then classified into one of the Rent Type categories (one or more characters). The MLDA database component 719 may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data, (see Rankin: Para. 0039-0045 and 0107-0111). This reads on the claim concepts of a means for segmenting the text of the first dataset into tokens, the tokens comprising one or more characters);
a means for storing the annotations and associated database properties and database values in a memory as an annotated dataset (Rankin discloses the MLDA server may store 115 the initial data set and the initial rules to the MLDA database 109. MLDA controller may serve to aggregate, process, store, search, serve, identify, instruct, generate, match, and/or facilitate interactions with a computer through behavior assessment technologies, and/or other related data. The annotations may be used as training data for machine learning. The MLDA may train a machine learning algorithm to identify paragraphs relevant to an abstraction field. The MLDA may identify the abstraction field values within previously identified paragraphs, or categorized paragraphs for enumerated field types (i.e., rent type, TI allowance, etc). Process the data so that it is represented with structures and annotations. For example, a person may use the interface to manually validate the data annotated by the machine learning algorithm. The person (or trainer) may edit 170 and 171 each attribute and extraction field. The newly validated data may be fed into the machine learning program to update and improve the model. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 729 (e.g., registers, cache memory, random access memory, etc.), (see Rankin: Para. 0035-0060, 0114 and 0162). This reads on the claim concepts of a means for storing the annotations and associated database properties and database values in a memory as an annotated dataset).
However, Rankin does not appear to specifically disclose a means for identifying tokens in the first dataset that match property values in the database for predetermined database properties; a means for determining whether the identified tokens in the first dataset represent values associated with a property in the database; a means for annotating the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag.
In the same field of endeavor, Pestian discloses a means for identifying tokens in the first dataset that match property values in the database for predetermined database properties (Pestian discloses processing natural language may include the steps of providing a text, the text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, the database further including quantitative values. The identified known words; querying the database to obtain a first set of concepts associated with each of the identified known words; and annotating the list of identified known words with the first set of concepts associated with each identified known word.  For example, in the original text, the last token 210 (before the".") listed was changed from "die" to "discharge". Each token 210 has also been tagged. This supervised training set was then used to train the spreading activation software for identifying concepts in similar radiology transcriptions. Add words to a phrase that match mentioned POS tags until there is a phrase that is not in the UMLS and in additional concepts that matches the search concept. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated, (see Pestian: Para. 0022-0050, 0060-0072, 0108-0124, 0131-0150 and 0190). This reads on the claim concepts of a means for identifying tokens in the first dataset that match property values in the database for predetermined database properties); 
a means for determining whether the identified tokens in the first dataset represent values associated with a property in the database (Pestian discloses that is, at iteration zero, concepts in the UMLS are identified in the clinical free-text if their weights meet the selected threshold. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts, (see Pestian: para. 0051, 0129-0138 and 0140-0150). This reads on the claim concepts of a means for determining whether the identified tokens in the first dataset represent values associated with a property in the database);
a means for annotating the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag (Pestian discloses annotating the list of identified words with the second set of concepts by considering the quantitative value representative of the strength of the relationship between each concept in the second set of concepts and its associated concept in the first set of concepts. Both the Manchester tagger and the TreeTagger can be tuned to new text-types by training against corpora. Even very small-annotated corpora of the right text type can make a big difference to performance. A first set of concepts associated with each of the identified known words and annotating the list of identified known words with the first set of concepts associated with each identified known word.  The POS tagger will be trained using the sample of tokens. The trained POS tagger will then be used to annotate the entire CUTC. A weighting value (predetermined threshold) reflecting the strength of each relationship and these columns contain integer values indicating the document with which the new concept is associated (quantitative value representative of the strength of the relationship between each concept). Determining a minimum threshold weighting value; and eliminating from the semantic network the relationships and the additional relationships that do not satisfy the minimum threshold weighting value. The method may include the steps of comparing the at least one concepts and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and providing an output based on at least one of a number and a significance of the identified relevant concepts. 
The text including a plurality of groups of characters; providing a database, the database including associations between a plurality of known words and a plurality of concepts. The part-of-speech tags and including UMLS concepts. The CUTC contains individual tokens 210 and a hand annotated POS 212. A tag-specific measure of agreement may be obtained to examine the extent to which the two processes tend to lead to consistent conclusions with respect to the particular tag, (see Pestian: para. 0051, 0097-0112, 0124-0138 and 0140-0150). This reads on the claim concepts of a means for annotating the identified tokens of the first dataset when the identified tokens are determined to represent values associated with a property in the database, wherein annotating comprises associating a tag with each identified token and assigning annotation attributes for each tag); and 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation of Rankin in order to have incorporated identified tokens, as disclosed by Pestian, which is directed to natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and for processing other collections of text, for example. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications. The text may include clinical free text, for example; and the clinical free text may include pediatric clinical free text. The text may include a plurality of documents and the method may further include the step of identifying a subset of the plurality of documents by identifying at least two documents having associations with at least one identical concept. In machine learning, data annotation is the process of labeling data to show the outcome you want your machine learning model to predict, such as labeling, tagging, transcribing, or processing a dataset with the features you want your machine learning system to learn to recognize. Annotation is one of the most important processes in the generation of chatbot training datasets and other NLP training data. It is the act of locating, extracting, and tagging entities in text. Annotated data reveals features that will train your algorithms to identify the same features in data that has not been annotated.
Data annotation is used in supervised learning and hybrid, or semi-supervised, machine learning models that involve supervised learning. As tokens are the building blocks of Natural Language, the most common way of processing the raw text happens at the token level. Tokenization is the foremost step while modeling text data. Tokenization is performed on the corpus to obtain tokens. Corpus annotation is the practice of adding interpretative linguistic information to a corpus. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which words in a text belong. This is so-called part-of-speech tagging (or POS tagging), and can be useful, for example, in distinguishing words which have the same spelling, but different meanings or pronunciation. If a word in a text is spelt present, it may be a noun (= 'gift'), a verb (= 'give someone a present') or an adjective (= 'not absent'). The meanings of these same-looking words are very different, and also there is a difference of pronunciation, since the verb present has stress on the final syllable. Incorporating the teachings of Pestian into Rankin would produce a system that processes clinical text for assignment of billing codes, analyzes suicide notes or legal discovery materials, and processes other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications, as disclosed by Pestian, (see Abstract).
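For illustration only, the step relied upon above of determining a minimum threshold weighting value and eliminating relationships that do not satisfy it may be sketched as follows. The function name prune_relationships, the sample concepts and the sample weights are hypothetical and are not taken from Pestian. Treating a weight equal to the threshold as satisfying it is an assumption of this sketch.

```python
def prune_relationships(relationships, min_weight):
    """Keep only (concept_a, concept_b, weight) triples whose weight meets the threshold.

    Assumption of this sketch: a weight equal to min_weight satisfies the threshold.
    """
    return [rel for rel in relationships if rel[2] >= min_weight]


# Hypothetical weighted relationships between concepts (illustrative values only).
relationships = [
    ("fever", "infection", 0.8),
    ("fever", "fracture", 0.1),
    ("cough", "infection", 0.5),
]

# Relationships below the hypothetical threshold of 0.5 are eliminated.
kept = prune_relationships(relationships, 0.5)
```

With the sample values shown, only the relationship between "fever" and "fracture" falls below the threshold and is removed.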
Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Rankin et al. (US 2014/0223284 A1, hereinafter Rankin) in view of Pestian (US 2015/0081280 A1, hereinafter Pestian) and in view of Ellis et al. (US 2014/0350954 A1, hereinafter Ellis).  
Regarding dependent claim(s) 3, the combination of Rankin and Pestian discloses the method as in claim 1. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein the database and the first dataset are proprietary to an entity authorized under regulatory guidelines to possess the data in the database and the first dataset.
In the same field of endeavor, Ellis discloses wherein the database and the first dataset are proprietary to an entity authorized under regulatory guidelines to possess the data in the database and the first dataset (Ellis discloses Personalized medicine aims to optimize the healthcare provided to individuals by basing decisions about their care on all available patient data. The goal of personalized medicine is to provide the right treatment at the right time for the right patient. Searching by a computer processor a database of electronic health records based on the corresponding clinical concept, identifying an electronic health record in the database associated with the corresponding clinical concept and outputting a result. The system requirements can include addition of semantic and taxonomic datasets or ontologies and the ability to set use-case-based thresholds for necessary redaction of private data. The invention provides for semantic interoperability through ontological mapping using a variety of open source and proprietary biomedical ontologies. A Curation & Redaction toolset of the invention allows authorized users to add notations and connections among their data elements manually in order to enhance the data and tailor information to the user's purpose. A Data Delivery toolset of the invention allows authorized users to define and receive datasets from the invention that can then be used more locally in client-specific tools and environments. The system requirements can include proactive pursuing of appropriate care for patients, population management profiling, patient/case management across organizations, benchmark care guidelines mapping & reporting, and support for accountable care requirements, (see Ellis: Para. 0025-0055 and 0060-0068). This reads on the claim concepts of wherein the database and the first dataset are proprietary to an entity authorized under regulatory guidelines to possess the data in the database and the first dataset). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation with identified tokens of Rankin and Pestian in order to have incorporated identified tokens, as disclosed by Ellis, which is directed to the goal of personalized medicine, namely to provide the right treatment at the right time for the right patient. Since variations in an individual's clinical, genetic or other molecular data can correlate with differences in how individuals develop diseases and respond to treatment, personalized medicine has the potential to improve outcomes and reduce cost through the tailoring of healthcare to the individual. As a result, digital healthcare data remains locked within disparate healthcare systems and is inaccessible to the types of tools and applications that have become widespread in other industries. This organization leads to an unrealized potential of digital healthcare data, impedes the ability of researchers to make discoveries and prevents practitioners from making informed decisions tailored to a specific patient's needs. The Authorize attribute enables you to restrict access to resources based on roles. For example, every resource has an owner. The owner can delete the resource; other users cannot. If a corpus has been annotated in advance, this will help in many kinds of automatic processing or analysis. For example, corpora which have been POS-tagged can automatically yield frequency lists or frequency dictionaries with grammatical classification. Such listings will treat leaves (verb) and leaves (noun) as different words, to be listed and counted separately, as for most purposes they should be. Beyond the pure text, a corpus can also be provided with additional linguistic information, called 'annotation'. This information can be of different nature, such as prosodic, semantic or historical annotation.
The most common form of annotated corpora is the grammatically tagged one. Incorporating the teachings of Ellis into Rankin and Pestian would produce the use of a medical software platform to collect, normalize, and aggregate clinical data. The platform supports clinical care and provides research and clinical tools for institutions, healthcare providers, researchers, and patients. The invention provides a graphical interface to customize searches in the database for specified subsets of conditions, treatments or outcomes. The invention also provides a system and method for searching through clinical databases for desired terms which may provide additional information to a physician regarding patient care. Furthermore, this invention facilitates personalized medicine-based practices as relationships between genetics, personal health data from multiple sources, disease risk and drug response can be more easily visualized and utilized for patient care and research, as disclosed by Ellis, (see Abstract).
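For illustration only, the observation above that a POS-tagged corpus can automatically yield a frequency list in which leaves (verb) and leaves (noun) are counted separately may be sketched as follows. The sample sentences, the tag labels and the variable names are hypothetical and are not drawn from any cited reference.

```python
from collections import Counter

# Hypothetical POS-tagged corpus for "She leaves early. The leaves fall."
tagged_corpus = [
    ("she", "PRON"), ("leaves", "VERB"), ("early", "ADV"),
    ("the", "DET"), ("leaves", "NOUN"), ("fall", "VERB"),
]

# Counting (word, tag) pairs keeps the verb and the noun apart.
tagged_frequencies = Counter(tagged_corpus)

# Counting bare word forms merges them, which is what tagging avoids.
untagged_frequencies = Counter(word for word, _tag in tagged_corpus)
```

In this sketch the tagged list records one occurrence each for the verb and the noun, while the untagged list records two occurrences of the single spelling.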
Regarding dependent claim(s) 6, the combination of Rankin and Pestian discloses the method as in claim 1. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein the identifying of tokens in the first dataset comprises detecting tokens using a string searching algorithm. 
In the same field of endeavor, Ellis discloses wherein the identifying of tokens in the first dataset comprises detecting tokens using a string searching algorithm (Ellis discloses that these latter identifiers, being centrally registered, can also require that the organization doing the assignment of such identifiers be included in the tokenization algorithm to maintain uniqueness of tokenized identifiers in the platform. The search terms are illustrated graphically rather than written as a textual string, improving efficiency and availing the platform to novice users with minimal clinical or IT training. The search is then performed based on the query entered, without the need for the user to construct a complicated textual search string or procedural search logic or code, (see Ellis: Para. 0055, 0124 and 0171). This reads on the claim concepts of wherein the identifying of tokens in the first dataset comprises detecting tokens using a string searching algorithm).
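For illustration only, the claim 6 limitation of detecting tokens using a string searching algorithm can be sketched as below. This sketch is not taken from Ellis, Rankin, Pestian, or the application; the function name find_tokens, the vocabulary argument, and the (token, start, end) result format are assumptions chosen for the example, and plain substring search stands in for whatever string searching algorithm an implementation might actually use.

```python
from typing import List, Tuple


def find_tokens(text: str, vocabulary: List[str]) -> List[Tuple[str, int, int]]:
    """Return (token, start, end) for every occurrence of a vocabulary term.

    Hypothetical helper: uses Python's built-in substring search (str.find)
    as a stand-in for a string searching algorithm.
    """
    hits: List[Tuple[str, int, int]] = []
    for term in vocabulary:
        if not term:
            continue  # an empty term would match at every position
        start = text.find(term)
        while start != -1:
            hits.append((term, start, start + len(term)))
            # resume one character later so overlapping matches are found
            start = text.find(term, start + 1)
    # order by position in the text, then by end offset
    return sorted(hits, key=lambda hit: (hit[1], hit[2]))


if __name__ == "__main__":
    sample = "rent type and TI allowance"
    for token, begin, end in find_tokens(sample, ["rent type", "TI allowance"]):
        print(token, begin, end)
```

The returned offsets are character positions in the searched text, which is one plausible way to record where each detected token occurs.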
Claims 7-9, 11 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Rankin et al. (US 2014/0223284 A1, hereinafter Rankin) in view of Pestian (US 2015/0081280 A1, hereinafter Pestian) and in view of Grefenstette et al. (US 2010/0250547 A1, hereinafter Grefenstette).       
Regarding dependent claim(s) 7, the combination of Rankin and Pestian discloses the method as in claim 1. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein the determining comprises: calculating, by the processor, a prior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a prevalence of the identified token in a second dataset; iteratively calculating, by the processor, a posterior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a Bayesian network, wherein the iterative calculating starts with observing a bottommost child node of the Bayesian network having the highest calculated prior probability and repeats for each layer of parent nodes of the child node of the Bayesian network; and determining, by the processor, whether a respective identified token represents a value associated with a property in the database based on the calculated posterior probability for an uppermost parent node representing the respective identified token.
In the same field of endeavor, Grefenstette discloses wherein the determining comprises: calculating, by the processor, a prior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a prevalence of the identified token in a second dataset (Grefenstette discloses that a personality token is an electronic tag that includes a digitally readable identifier. The tag reader 506 in one embodiment is programmed to use context information (i.e., location and time information) to assign a personality identifier to documents and/or document tokens on the tag reader 506 by using document metadata (e.g., document title, creation date, author, etc.) and/or document content. Each class is represented by a series of word conditional probabilities for each word and a class conditional that are used in the calculation of the posterior probability for a class given a new document to be classified. At the meta-document server 200, the document content and the personality identified by the personality identifier are used to create a meta-document. A personality token records an identifier to a personality in personality database 212 shown in FIG. 2. The weight value associated with each feature is calculated using any of a number of well-known techniques, varying from a normalized frequency count to a more sophisticated weighting scheme which is calculated based upon an aggregation of a number of measures such as the frequency of each term in the document (a set of class values). A meta-document with a name, a personality and other meta-values (e.g., access privileges etc.). This object may have associated with it various attribute descriptions that make up other fields in the database tuple. These are usually estimated from a labeled training dataset. A learning module consists of estimating the class conditional probabilities and the class probabilities from a training dataset Train (a labeled collection of documents) for each possible document classification Class, (see Grefenstette: Para. 0156-0168, 0170-0191, 0333-0340, 0342-0353 and 0422-0424). This reads on the claim concepts of wherein the determining comprises: calculating, by the processor, a prior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a prevalence of the identified token in a second dataset); 
iteratively calculating, by the processor, a posterior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a Bayesian network, wherein the iterative calculating starts with observing a bottommost child node of the Bayesian network having the highest calculated prior probability and repeats for each layer of parent nodes of the child node of the Bayesian network (Grefenstette discloses that each class is represented by a series of word conditional probabilities for each word and a class conditional that are used in the calculation of the posterior probability for a class given a new document to be classified. At the meta-document server 200, the document content and the personality identified by the personality identifier are used to create a meta-document. A personality token records an identifier to a personality in personality database 212 shown in FIG. 2. The weight value associated with each feature is calculated using any of a number of well-known techniques, varying from a normalized frequency count to a more sophisticated weighting scheme which is calculated based upon an aggregation of a number of measures such as the frequency of each term in the document (a set of class values). A meta-document with a name, a personality and other meta-values (e.g., access privileges etc.). This object may have associated with it various attribute descriptions that make up other fields in the database tuple. These are usually estimated from a labeled training dataset. A learning module consists of estimating the class conditional probabilities and the class probabilities from a training dataset Train (a labeled collection of documents) for each possible document classification Class and type of probabilistic model, namely, a Naive Bayesian model used here. The that iteratively corrects errors in meta-document 4202 using information space. 
All services related thereto in the hierarchy 3500 from the node at which it is classified up to the root 3502 and identified, and services associated with each node are applied to the selected document content. The parent node in a category organization is generally less descriptive than the child node. The root node defines the least descriptive category in the category organization. This set of documents is defined as a level N=0 document set. At 1704, all links are extracted from the level N document set. At 1706, content pointed to by the extracted links is fetched and used to define a level N+1 document set. At 1708, if additional levels are to be descended then the action at 1704 is repeated; otherwise, an expanded document is defined using the N document sets defined at 1702 and 1706, (see Grefenstette: Para. 0156-0168, 0170-0191, 0215-0229, 0324-0330, 0333-0340, 0342-0353, 0382-0395, 0422-0424 and 0452-0457). This reads on the claim concepts of iteratively calculating, by the processor, a posterior probability, for each identified token, of whether the identified token represents a value associated with a property in the database based on a Bayesian network, wherein the iterative calculating starts with observing a bottommost child node of the Bayesian network having the highest calculated prior probability and repeats for each layer of parent nodes of the child node of the Bayesian network); and 
determining, by the processor, whether a respective identified token represents a value associated with a property in the database based on the calculated posterior probability for an uppermost parent node representing the respective identified token (Grefenstette discloses that calculation of the posterior probabilities given evidence using Bayes' theorem simplifies. The corresponding posterior probability is the maximum amongst all posterior probabilities. Under this decision-making strategy, the denominator in the simplified inference equation is common to all posterior probabilities, so it can be dropped from the inference process. This further simplifies the reasoning process (and the representation also). Each class is represented by a series of word conditional probabilities for each word and a class conditional that are used in the calculation of the posterior probability for a class given a new document to be classified. Naive Bayes classifiers can quite easily be learned from example data. These labels identify nodes 3910, 3912, and 3914 of the top-level nodes 3904 (different levels of the entity type hierarchy). A personality token is an electronic tag that includes a digitally readable identifier. The tag reader 506 in one embodiment is programmed to use context information (i.e., location and time information) to assign a personality identifier to documents and/or document tokens on the tag reader 506 by using document metadata (e.g., document title, creation date, author, etc.) and/or document content. A personality token records an identifier to a personality in personality database 212 shown in FIG. 2. The weight value associated with each feature is calculated using any of a number of well-known techniques, varying from a normalized frequency count to a more sophisticated weighting scheme which is calculated based upon an aggregation of a number of measures such as the frequency of each term in the document. 
This set of documents is defined as a level N=0 document set. At 1704, all links are extracted from the level N document set. At 1706, content pointed to by the extracted links is fetched and used to define a level N+1 document set. At 1708, if additional levels are to be descended then the action at 1704 is repeated; otherwise, an expanded document is defined using the N document sets defined at 1702 and 1706, (see Grefenstette: Para. 0156-0168, 0170-0191, 0215-0229, 0324-0330, 0333-0340, 0342-0353, 0382-0395, 0422-0424 and 0452-0457). This reads on the claim concepts of determining, by the processor, whether a respective identified token represents a value associated with a property in the database based on the calculated posterior probability for an uppermost parent node representing the respective identified token).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation with identified tokens of Rankin and Pestian to incorporate a Bayesian network, as disclosed by Grefenstette. Grefenstette is directed to knowledge management through document management, which forms an important part of the knowledge creation and sharing lifecycle. A typical model of knowledge creation and sharing is cyclical, consisting of three main steps: synthesizing (search, gather, acquire and assimilate), sharing (present, publish/distribute), and servicing (facilitate document use for decision making, innovative creativity). A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Most systems consider documents as static objects that only acquire new content when acted upon by an authorized user. A user's decision to read and modify a document, or to run a program on it which may change its contents, is needed for the document to acquire new information. A directed acyclic graph is a conceptual representation of a series of activities. The order of the activities is depicted by a graph, which is visually presented as a set of circles, each one representing an activity, some of which are connected by lines, which represent the flow from one activity to another. 
Each node in a directed acyclic graph represents a random variable. These variables may be discrete or continuous valued. These variables may correspond to the actual attribute given in the data. The graph provides a graphical model of causal relationships on which learning can be performed. Bayesian networks are probabilistic because these networks are built from a probability distribution and also use probability theory for prediction and anomaly detection. Incorporating the teachings of Grefenstette into Rankin and Pestian would produce a method, system and article of manufacture for automatically generating a query from document content, as disclosed by Grefenstette (see Abstract). 
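For illustration only, the prior and posterior calculations discussed for claim 7 can be sketched as below. This sketch is not taken from Grefenstette, Rankin, Pestian, or the application: the function names, the use of relative frequency in a reference token list as the prevalence-based prior, and the two likelihood parameters are all assumptions. It shows a single application of Bayes' theorem for a two-way hypothesis (the token does or does not represent a property value), not the layered traversal of a full Bayesian network recited in the claim.

```python
from typing import Sequence


def prior_from_prevalence(token: str, reference_tokens: Sequence[str]) -> float:
    """Assumed prior: relative frequency of the token in a second dataset."""
    if not reference_tokens:
        return 0.0
    matches = sum(1 for candidate in reference_tokens if candidate == token)
    return matches / len(reference_tokens)


def posterior(prior: float, p_evidence_given_h: float, p_evidence_given_not_h: float) -> float:
    """Bayes' theorem for a binary hypothesis H given one piece of evidence.

    P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | not H) P(not H))
    """
    numerator = p_evidence_given_h * prior
    denominator = numerator + p_evidence_given_not_h * (1.0 - prior)
    if denominator == 0.0:
        return 0.0  # the evidence is impossible under both hypotheses
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical second dataset and likelihood values, for demonstration only.
    reference = ["lease", "rent", "lease", "broker"]
    p_h = prior_from_prevalence("lease", reference)
    print(p_h, posterior(p_h, 0.8, 0.2))
```

When only the most probable hypothesis is needed, the denominator is the same for every hypothesis and comparing numerators gives the same ranking, which is the simplification the cited passage describes.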
Regarding dependent claim(s) 8, the combination of Rankin and Pestian discloses the method as in claim 7. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein iteratively calculating comprises refining the calculated prior probability based on observing nodes for each layer of parent nodes of the Bayesian network and filtering refined prior probabilities based on predetermined probability thresholds. 
In the same field of endeavor, Grefenstette discloses wherein iteratively calculating comprises refining the calculated prior probability based on observing nodes for each layer of parent nodes of the Bayesian network and filtering refined prior probabilities based on predetermined probability thresholds (Grefenstette discloses that if the root node has not already been searched using the defined query, then the node in the category organization at which the category is defined is changed at 4014 to its parent node. The parent node in a category organization is generally less descriptive than the child node. The root node defines the least descriptive category in the category organization. All parent nodes of the entity type with the recognized entity are identified, and services associated with each node are applied to the selected document content. For example, in one instance the generic personality is applied only if the expanded document references less than a predetermined threshold number of documents. Filtering at 3008 involves identifying the overall frequency of entities in the accessed document. Those entities with the lowest frequency pass through the filter. The approximate reasoning module 3618, which contains matching, filtering and decision-making mechanisms, accesses the knowledge base 3622 to classify the unlabelled text object 3612. The query can be refined by filtering and/or ranking the results returned by the query mechanism using the classification labels or its associated characteristic vocabulary in a number of ways. For a type of probabilistic model, namely a Naive Bayesian model, it is first described how to represent models and perform inference (approximate reasoning) in such a framework, and then how to learn Naive Bayes models from labeled example documents, (see Grefenstette: Para. 0215-0229, 0298 and 0324-0393). 
This reads on the claim concepts of wherein iteratively calculating comprises refining the calculated prior probability based on observing nodes for each layer of parent nodes of the Bayesian network and filtering refined prior probabilities based on predetermined probability thresholds). 
Regarding dependent claim(s) 9, the combination of Rankin and Pestian discloses the method as in claim 8. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein observing nodes of the Bayesian network comprises maximizing the probability of a state on a Bayesian network node. 
In the same field of endeavor, Grefenstette discloses wherein observing nodes of the Bayesian network comprises maximizing the probability of a state on a Bayesian network node (Grefenstette discloses that once the documents are assigned class labels (or assigned to nodes in a labeled hierarchy), a classification profile is derived that allows document content to be assigned to an existing label or to an existing class, by measuring the similarity between the new document and the known class profiles. A set of nodes and/or sub-nodes in a document categorization structure (e.g., hierarchy, graphs). In the naive Bayesian framework, a simplifying assumption is introduced, sometimes known as the naive assumption, where the input variables (in this case the terms) are assumed to be conditionally independent given the target classification value. Each class is represented by a series of word conditional probabilities for each word and a class conditional that are used in the calculation of the posterior probability for a class given a new document to be classified. The determination to filter entities at 3006 can be made, for example, using a maximum threshold number to limit redundant, superfluous, or surplus information (corresponding posterior probability is the maximum amongst all posterior probabilities), (see Grefenstette: Para. 0315-0379 and 0380-0396). This reads on the claim concepts of wherein observing nodes of the Bayesian network comprises maximizing the probability of a state on a Bayesian network node). 
Regarding dependent claim(s) 11, the combination of Rankin, Pestian and Grefenstette discloses the method as in claim 7. However, the combination of Rankin and Grefenstette does not appear to specifically disclose wherein the first dataset and the second dataset are mutually exclusive.
In the same field of endeavor, Pestian discloses wherein the first dataset and the second dataset are mutually exclusive (Pestian discloses the nodes leads to inhibition of nodes representing mutually exclusive interpretations, while activation constraints use a threshold function at a single node level to control spreading activation. Weights between concepts resulting from mutually exclusive interpretations of phrases or acronyms should be negative, leading to inhibition of some concepts. Two active nodes may strengthen their activity, thus mutually activating each other (a relationship between a first one of the concepts and a second one of the concepts), (see Pestian: Para. 0078 and 0161-0163). This reads on the claim concepts of wherein the first dataset and the second dataset are mutually exclusive). 
Regarding claim 16 (drawn to a system): claim 16 is a system claim that corresponds to the method of claim 7. Therefore, claim 16 is rejected for at least the same reasons as the method of claim 7. 
Regarding claim 17 (drawn to a system): claim 17 is a system claim that corresponds to the method of claim 8. Therefore, claim 17 is rejected for at least the same reasons as the method of claim 8.
Regarding claim 18 (drawn to a system): claim 18 is a system claim that corresponds to the method of claim 9. Therefore, claim 18 is rejected for at least the same reasons as the method of claim 9. 
Regarding claim 19 (drawn to a system): claim 19 is a system claim that corresponds to the method of claim 11. Therefore, claim 19 is rejected for at least the same reasons as the method of claim 11. 
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Rankin et al. (US 2014/0223284 A1, hereinafter Rankin) in view of Pestian (US 2015/0081280 A1, hereinafter Pestian) and in view of Bagheri et al. (US 2016/0110471 A1, hereinafter Bagheri).  
Regarding dependent claim(s) 10, the combination of Rankin and Pestian discloses the method as in claim 1. However, the combination of Rankin and Pestian does not appear to specifically disclose wherein the annotation attributes include identification of database data items, database properties, database property values, a probability that the identified tokens represent values associated with a property in the database, a determination of whether the identified tokens represent values associated with a property in the database, character span information for characters of the identified tokens, or combinations thereof.
In the same field of endeavor, Bagheri discloses wherein the annotation attributes include identification of database data items, database properties, database property values, a probability that the identified tokens represent values associated with a property in the database, a determination of whether the identified tokens represent values associated with a property in the database, character span information for characters of the identified tokens, or combinations thereof (Bagheri discloses that the Web is a system of interlinked documents that are accessed using a medium such as the Internet. Search engines are generally capable of mapping a term to the location of a web document by searching in documents. However, hidden underneath each web document lie real world objects (i.e. products, locations, etc.) and identify and analyze text associated with the object, extract property and value points and annotations from the text (extracted text property and value points and annotations). Each of these pipelines produces attributes or features identified within the scraped webpage. For example, the HTML attributes and features and the IMAGES attributes and features to a nearest neighbor; determining the closest match for each of via agglomerative clustering to determine the closest match between the content in the scraped webpage and the objects in the database (herein referred to interchangeably as the "inextweb database"). This system is capable of searching for objects not only by describing, but also using visualization tools such as taking a photo of an item or detection of items in a video (data search platform and database). The properties of these database objects are then assumed to be potential properties to be found within the scraped webpage (respective values out of text). A probability (0-1) of the confidence of the concept. Each partition contains an ordered list of unique tokens. 
Matching tokens from are identified and the ordinal position of the matched token is recorded. Using tokenization and part-of-speech tagging, each token is grammatically identified which are used to perform the initial search for similar concepts from a structured ontology via a bag-of-words simple match. For example, the following two objects share some characteristics therefore they belong to a sub class of similar properties. A minimal (or common) spanning set of <subject, predicate, object> ontology triples that best covers the discovered properties is computed along with a probability (or confidence), (see Bagheri: Para. 0050-0070, 0091-0100, 0115-0118, 0120-0130 and 0183-0190). This reads on the claim concepts of wherein the annotation attributes include identification of database data items, database properties, database property values, a probability that the identified tokens represent values associated with a property in the database, a determination of whether the identified tokens represent values associated with a property in the database, character span information for characters of the identified tokens, or combinations thereof).       
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning data annotation with identified tokens of Rankin and Pestian to incorporate span information for characters, as disclosed by Bagheri. Bagheri is directed to the web, which was designed to cater to humans' needs in a way that each human wanting information from a specific part of the web would have to personally navigate through the web, either using search or other methods, find it and use it in a way that the makers of the document decided. Web designing, navigation, and search engine optimization became important for website owners only because they were directly talking to humans with minimal personalization. The texts in these documents are in a language that humans understand, not computer bots or agents. Also, images and video are designed specifically for humans. As devices continue to become less expensive and more and more powerful, and as capacity of data storage devices continues to rapidly increase, more and more data is being generated and stored, oftentimes as structured or semi-structured datasets. A dataset is a collection of data that conforms to either a formal schema (in the case of conventional relational databases), or to an informal conceptual model of the contents (in the case of NoSQL databases, including loose-schemata, semi-formal-schemata, and schema-free conceptual models), wherein the formal schema and/or conceptual model is conventionally defined by the producer or maintainer of the dataset. As used herein, the term "schema" is intended to encompass both a formal schema as well as an informal conceptual model of contents of a dataset. As will be understood by one skilled in the art of dataset generation/maintenance, a schema defines the structure and content of the dataset. 
The <span> HTML element is a generic inline container for phrasing content, which does not inherently represent anything. It can be used to group elements for styling purposes (using the class or id attributes), or because they share attribute values. The tag is an inline container used to mark up a part of a text, or a part of a document. Many technologies on top of the Internet, such as World Wide Web (web) and Electronic Mail (e-mail), were born to allow humans to share information and communicate. Incorporating the teachings of Bagheri into Rankin and Pestian would produce a computer implemented method and system that enables use of a database of machine-readable properties, features and traceable locations of real objects to search and locate and/or identify objects on the web by human input to a machine of image and/or oral cues relating to the object, as disclosed by Bagheri (see Abstract). 
Examiner's Notes
Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner and the additional related prior art made of record that is considered pertinent to applicant's disclosure to further show the general state of the art.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOHANES Demiss KELEMEWORK whose telephone number is (571)272-8772. The examiner can normally be reached Monday-Friday 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOHANES D KELEMEWORK/               Examiner, Art Unit 2164   

/ASHISH THOMAS/               Supervisory Patent Examiner, Art Unit 2164