DETAILED ACTION

	This communication is in response to the Applicant Arguments/Remarks filed 6/17/2022. Claims 1-20 are pending in the application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Arguments
Applicant's arguments filed 6/17/2022 have been fully considered but they are not persuasive. Regarding the arguments on pages 10-12 relating to the amended limitations “predicting a report type associated with the natural language query and indicating the data lineage, wherein the report type comprises the at least one or more data objects and one or more corresponding data fields that are linked to the natural language query; generating a plurality of candidate queries from the natural language query based at least in part on the machine learning model, the data lineage, and the report type”, please see the newly added paragraphs below.
Colley does teach natural language processing – See para. 1533; para. 1648-1652: data lineage, e.g., cancer type queries. After processing queries for each node in the primary diagnosis valueset, there may exist many descendants that point to multiple parents. A concept candidate may be explored by more than one query relating to the concept. For example, a concept candidate may be explored/followed until a concept with a related structure; para. 1677: when "Tylenol" is recognized as a medication, medication-specific queries may be processed to identify normalization candidates, for example, in the Abstraction Engine toolbox; para. 1807: a feature collection associated with the variant characterization machine learning models, classification models, or other artificial intelligence derived features; fig. 136: candidate queries are displayed on the interface – See para. 3145: automatically train and generate many other similar phrases that may be associated with the intent. This automatic training process by which a large number of similar queries /candidate queries are generated and associated with a specific intent; para. 415: capturing voice signals generated by an oncologist. An automated speech recognition (ASR) system converts the voice signals to a text file which is then processed by a natural language processor (NLP) etc.; 
para. 425: steps of associating separate sets of state-specific intents and supporting information with different clinical report types, the supporting information including at least one intent-specific data operation for each state-specific intent receiving a voice query via the microphone seeking information, identifying a specific patient associated with the query, identifying a state-specific clinical report associated with the identified patient, attempting to select one of the state-specific intents associated with the identified state-specific clinical report as a match for the query. Thus, Colley does teach the argued limitations.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Colley (US 20210090694).
As per claims 1, 13, 20, Colley et al. teaches 
a method for natural language query processing, comprising: training a machine learning model on a set of reports generated by a tenant, wherein each report of the set of reports comprises a title and a query for one or more data objects associated with the tenant (para. 221, 315: generating machine learning models that are refined and optimized through the use of continuous training data; para. 383: natural language processing; para. 1066: partners provide secure clinical files via a file transfer to the single tenant cloud platform and are stored as unstructured and identified files in the lake data base. Those files are abstracted and shaped as described above to generate normalized structured clinical data that is stored in a single tenant data vault as well as in a multitenant data vault; para. 1366: the group tile can include a title "Ovarian." The query list can include multiple phrases for searching. In some embodiments, phrases can include "ovarian cancer," "fallopian tube cancer," and variations thereon; para. 1477, 1807: a feature collection associated with the variant characterization machine learning models, classification models, or other artificial intelligence derived features);
identifying a data lineage for a data set associated with the tenant, wherein the data set is stored across a plurality of data sources and comprises at least the one or more data objects (para. 303: a directed acyclic graph (DAG) representation may be generated that includes a set of item icons or DAG vertices representing order items where the vertices are linked together by process flow lines or edges to indicate when one item is dependent on others; para. 956: data is received at a system server 150 from many different data sources; para. 1163: lineage-specific markers; para. 2246, 2353: as a result of the application of the plurality of cell-type profiles to the genetic target data received from the patient having the first tumor type, percentage in the sample of the cell-type profile for the cell type of the second tumor type may be determined; para. 3321: the ATC classification system is a strict hierarchy, meaning that each code necessarily has one and only one parent code, except for the 14 codes at the topmost level which have no parents. The codes are semantic identifiers, meaning they depict in themselves the complete lineage of parenthood);
receiving a natural language query associated with the data set; predicting a report type associated with the natural language query and indicating the data lineage, wherein the report type comprises the one or more data objects and one or more corresponding data fields that are linked to the natural language query (para. 123, 415: capturing voice signals generated by an oncologist. An automated speech recognition (ASR) system converts the voice signals to a text file which is then processed by a natural language processor (NLP) etc.; para. 425: steps of associating separate sets of state-specific intents and supporting information with different clinical report types, the supporting information including at least one intent-specific data operation for each state-specific intent receiving a voice query via the microphone seeking information, identifying a specific patient associated with the query, identifying a state-specific clinical report associated with the identified patient, attempting to select one of the state-specific intents associated with the identified state-specific clinical report as a match for the query, upon selection of one of the state-specific intents, performing the at least one data operation associated with the selected state-specific intent to generate a result; para. 428, 1192-1193: specific physicians or institutions (e.g., medical facilities at which physicians work) have preferences for test types, test sub-processes, report types, report formats, etc.; para. 1525, 1533: natural language processing, document type, report type, field types; figs. 168-171, 187: tailored predictions using best trained models to interact with users; para. 3233-3235: a directly responsive phrase to a query may be "There are two clinical trials that may be of interest to Dwayne Holder" and a supplemental response may be "The first clinical trial is 23 miles from your office and the second trial is 35 miles from your office", at least some databases will include specialized clinical reports or other report types that are developed for specific purposes where data is gleaned from EMRs and other system databases and used to instantiate specific instances of the reports for specific patients and cancer states);
generating a plurality of candidate queries from the natural language query based at least in part on the machine learning model, the data lineage, and the report type; and selecting one or more candidate queries of the plurality of candidate queries for display based at least in part on a ranking of the plurality of candidate queries (para. 415, 1081: reports screen shows the reports icon highlighted to help orient the physician and includes a report list indicating all reports stored in the system that are associated with the patient, report information including report type, date etc.; para. 1533: natural language processing, document type, report type, field types; para. 1648-1652: data lineage, e.g., cancer type queries. After processing queries for each node in the primary diagnosis valueset, there may exist many descendants that point to multiple parents. A concept candidate may be explored by more than one query relating to the concept. For example, a concept candidate may be explored/followed until a concept with a related structure; para. 1677: when "Tylenol" is recognized as a medication, medication-specific queries may be processed to identify normalization candidates, for example, in the Abstraction Engine toolbox; para. 1807: a feature collection associated with the variant characterization machine learning models, classification models, or other artificial intelligence derived features; fig. 136: candidate queries are displayed on the interface – See para. 3145: automatically train and generate many other similar phrases that may be associated with the intent. This automatic training process by which a large number of similar queries /candidate queries are generated and associated with a specific intent).

As per claims 2, 14, Colley teaches
wherein identifying the data lineage further comprises: generating a semantics graph based at least in part on the data set associated with the tenant, wherein the semantics graph comprises a set of vertices corresponding to the plurality of data sources, and wherein the semantics graph represents associations of the data set across the plurality of data sources (para. 303: a directed acyclic graph (DAG) representation may be generated that includes a set of item icons or DAG vertices representing order items where the vertices are linked together by process flow lines or edges to indicate when one item is dependent on others; para. 956: data is received at a system server 150 from many different data sources; para. 1572: applying phrase-based or syntax-based machine translation approaches. For sentences which are well-structured (such as following traditional grammar and prose), parse trees or deep semantic representations may be utilized; para. 3321: the ATC classification system is a strict hierarchy, meaning that each code necessarily has one and only one parent code, except for the 14 codes at the topmost level which have no parents. The codes are semantic identifiers, meaning they depict in themselves the complete lineage of parenthood).

As per claims 3-4, 15-16, Colley teaches
displaying, on a user interface, a primary candidate query and one or more secondary candidate queries of the plurality of candidate queries, wherein the primary candidate query comprises a higher ranking than the one or more secondary candidate queries; receiving, via the user interface, an indication that a secondary candidate query from the one or more secondary candidate queries corresponds to the natural language query instead of the primary candidate query; and updating the machine learning model based at least in part on the received indication (para. 1357, 1366: the group tile can include a title "Ovarian." The query list can include multiple phrases for searching. In some embodiments, phrases can include "ovarian cancer," "fallopian tube cancer," and variations thereon; para. 3145: automatically train and generate many other similar phrases that may be associated with the intent. This automatic training process by which a large number of similar queries are generated and associated with a specific intent; para. 3190: generate and broadcast or present (e.g., visually on a display) queries to the oncologist to fill out the required information at an appropriate time; para. 1620: if concept candidates are competing for the same field, the concept candidate may be coupled with a reliability index based upon the frequency of the concept candidate occurring in relationship to the others. The highest ranked competing concept candidate may be preserved along with a reliability index; para. 1675, 1680, 315: update models; para. 1677: when "Tylenol" is recognized as a medication, medication-specific queries may be processed to identify normalization candidates, for example, in the Abstraction Engine toolbox; para. 1807: a feature collection associated with the variant characterization machine learning models, classification models, or other artificial intelligence derived features; fig. 136: candidate queries are displayed on the interface – See para. 3145: automatically train and generate many other similar phrases that may be associated with the intent. This automatic training process by which a large number of similar queries /candidate queries are generated and associated with a specific intent).

As per claims 5, 17, Colley teaches
receiving, via the user interface, an indication of a revision to the primary candidate query; and updating the machine learning model based at least in part on the received indication (fig. 409; para. 1089: selecting an alteration may take the physician to an additional view, shown at FIGS. 18a and 18b (showing different scrolled sections of one view in the two figures), where the physician can delve deeper into the alteration's effect, with supporting data visualizations; para. 1675: the system also may check the database to determine whether improved NLP models have been provided and retrieve any new or updated models; para. 1680: evaluating and updating the NLP and MLA models).

As per claim 6, Colley teaches
identifying a set of data objects based at least in part on the natural language query, wherein the set of data objects are stored in a first data source of the plurality of data sources and associated with a second data source of the plurality of data sources based at least in part on the data lineage, and wherein the plurality of candidate queries are generated based at least in part on querying the second data source (para. 303: a directed acyclic graph (DAG) representation may be generated that includes a set of item icons or DAG vertices representing order items where the vertices are linked together by process flow lines or edges to indicate when one item is dependent on others; para. 956: data is received at a system server 150 from many different data sources; para. 1366: the group tile can include a title "Ovarian." The query list can include multiple phrases for searching. In some embodiments, phrases can include "ovarian cancer," "fallopian tube cancer," and variations thereon; para. 1572: applying phrase-based or syntax-based machine translation approaches. For sentences which are well-structured (such as following traditional grammar and prose), parse trees or deep semantic representations may be utilized; para. 3321: the ATC classification system is a strict hierarchy, meaning that each code necessarily has one and only one parent code, except for the 14 codes at the topmost level which have no parents. The codes are semantic identifiers, meaning they depict in themselves the complete lineage of parenthood).

As per claims 7, 18, Colley teaches
parsing the natural language query with a per-character granularity during a first iteration of a plurality of iterations to generate a first candidate query of the plurality of candidate queries; and parsing the natural language query with a character group granularity during subsequent iterations of the plurality of iterations to generate additional candidate queries of the plurality of candidate queries (para. 388, 983, 1507: each field of the extracted region may have a plurality of enumerated values, or, if an enumerated list of values is unavailable may be limited to a certain type of value. For example, if the field relates to patient diagnosis, it may have a corresponding numerated list of all diagnoses that may be provided in the report. If the field relates to a treatment, it may have all known treatments and further parse the field to identify and enumerate unknown treatments; para. 1572: applying phrase-based or syntax-based machine translation approaches. For sentences which are well-structured (such as following traditional grammar and prose), parse trees or deep semantic representations may be utilized; para. 1613, 1687: models can be trained multiple times and improve over many iterations of new data).

As per claims 8, 19, Colley teaches
wherein: identifying labels for one or more characters, character groups, or both, based at least in part on parsing the natural language query, wherein the labels comprise one or more data object fields, operations, directions, or a combination thereof (para. 1366: the group tile can include a title/label "Ovarian." The query list can include multiple phrases for searching. In some embodiments, phrases can include "ovarian cancer," "fallopian tube cancer," and variations thereon; para. 1477, 1807: a feature collection associated with the variant characterization machine learning models, classification models, or other artificial intelligence derived features; para. 1572: applying phrase-based or syntax-based machine translation approaches. For sentences which are well-structured (such as following traditional grammar and prose), parse trees or deep semantic representations may be utilized; para. 3321: the ATC classification system is a strict hierarchy, meaning that each code necessarily has one and only one parent code, except for the 14 codes at the topmost level which have no parents. The codes are semantic identifiers, meaning they depict in themselves the complete lineage of parenthood).

As per claim 9, Colley teaches
wherein the labels are identified based at least in part on an estimation of a misspelling in the one or more characters or one or more character groups (para. 1557: the above fields, both the plurality of features and their respective feature classifications may be populated by a data analyst with sufficient medical knowledge and access to the requisite databases. Such an analyst may apply their education and experiences in the field of medicine to identify any medications administered despite confounding factors present in the text (such as shorthand, typos/misspelling, obscure references), their dosage, and understand the integration of the two in the provided text; para. 1623: correct potential typographical errors or OCR errors that occur).

As per claim 10, Colley teaches
wherein the machine learning model is a deep learning model (para. 317: evaluating and updating the NLP and MLA models; para. 1824: deep learning model for predicting microsatellite instability (MSI) directly from medical images, including routine histopathology images).

As per claim 11, Colley teaches
identifying a default machine learning model trained on a default set of reports, wherein the plurality of candidate queries are generated based at least in part on the default machine learning model (para. 1192: in a case where no preferences exist, the system will implement a set of default preferences for any received order or may have a feedback mechanism whereby any required preference is sought from an ordering physician or institution; para. 1336-1337; fig. 117; para. 1344, 3190: where the system fails to capture required parameters, the processor may generate and broadcast or present (e.g., visually on a display) queries to the oncologist to fill out the required information at an appropriate time).

As per claim 12, Colley teaches
wherein each report of the set of reports comprises the one or more data objects and relationship between the one or more data objects (para. 968, 1081: reports screen 1600 shows the reports icon 1510 highlighted to help orient the physician and includes a report list indicating all reports stored in the system that are associated with the patient; para. 1487-1488).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Siebel (US 20170006135) teaches in fig. 1: concept map; para. 460: report writing tool with interactive analysis, different types of reports including enterprise reports, ad-hoc reports etc.; para. 511: predictive analytics. Amarasingham et al. (US 20160019666) teaches at p. 19: reports, lab results, healthcare sources; para 33: natural language, semantic machine learning; para. 41: update model; para. 60: multi-tenants; Debow (US 20140081714); Govindaraman (US 20140081715) teaches at para. 36-38: machine learning system, displaying reports; ranking measures etc.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINH BLACK whose telephone number is (571)272-4106. The examiner can normally be reached 9AM-5PM EST M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached on 571-272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LINH BLACK/ Examiner, Art Unit 2163    9/3/2022


/TONY MAHMOUDI/ Supervisory Patent Examiner, Art Unit 2163