DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on October 2, 2020. 
Claims 1-20 are pending in the application. As such, claims 1-20 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on October 2, 2020.  These drawings have been accepted and considered by the Examiner.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Klassen et al. (US Patent Pub. No. 2022/0068449), hereinafter Klassen.

Regarding claim 1, Klassen teaches 
a method, in a data processing system specifically configured to implement a fine-grained finding descriptor generation computing tool that automatically generates fine-grained labels as a basis for downstream computer system operations (Klassen [0008] In some non-limiting illustrative embodiments disclosed herein, a method is performed in conjunction with a Pathology Information System (PIS) which stores pathology reports in a pathology report format, and a Radiology Information System (RIS) which stores medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The method comprises, using an electronic processor programmed by instructions stored on a non-transitory storage medium: converting at least one pathology report and at least one medical imaging examination report to an integrated diagnostics representation which represents the text of the converted reports as vocabulary category values of a vocabulary of categories; temporally ordering the converted reports based on timestamps of the respective reports; identifying a responsive report and a causational report based on vocabulary category values of the converted responsive report being responsive to vocabulary category values of the converted causational report; and displaying, on a workstation, a summary of the vocabulary category values used in the identifying), 
the method comprising: 
processing, by the fine-grained finding descriptor generation computing tool, 
medical report natural language content of at least one medical imaging report data structure associated with at least one medical image (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports), 
based on a core finding lexicon data structure (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports),
to extract a set of core finding instances of one or more core findings in the core finding lexicon data structure (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test), 
from the medical report natural language content (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test),
wherein the one or more core findings are terms describing one of anatomical structures or abnormalities present in the at least one medical image (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Example of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examines of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports); 
executing, by the fine-grained finding descriptor generation computing tool, for each core finding instance in the extracted set of core finding instances, automated computer natural language processing operations comprising: 
generating a parse tree data structure for a corresponding portion of the medical report natural language content corresponding to the core finding instance (Klassen [0031] The textual content of the sections of the report are then analyzed by natural language processing (NLP) 98 which may, for example, employ a sequence of a tokenizer that identifies white space to break text into individual tokens (typically words or numbers), a grammatical parser that parses the tokens into grammatical units (e.g. sentences) and parts-of-speech (e.g. nouns, verbs, adjectives, and so forth), a phrase structure parser, dependency parser, named entity recognizer, semantic role labeler, and/or other NLP token grouping/classification. In one approach, an ensemble of rule-based and ML components are trained based on labeled and unlabeled clinical text collected from past (i.e. historical) radiology reports and curated to create an overall model of clinical report text. Elements of the natural language processing may include words, sentences, part-of-speech labels, and other basic linguistic information, as well as higher level semantic structures that identify units of text as values of vocabulary categories such as findings, diagnosis, follow-up recommendations, and other elements of discourse found in clinical text. The parsed text is then searched to identify vocabulary category values of a vocabulary of categories 64. The vocabulary of categories is a closed set of categories. These are categories of medical language commonly used in medical reports, such as (by way of non-limiting illustrative example): “finding”, “critical finding”, “recommendation”, “biopsy sample”, “reason for exam”, “diagnosis”, “impression”, and “observation”, and/or other vocabulary categories. A vocabulary category is instantiated in a particular medical report by a value for that category, e.g. a possible value for the “diagnosis” vocabulary category could be “prostate cancer”. 
The vocabulary could be hierarchical whereby all values belonging to a sub-category necessarily also belong to a higher-up category, e.g. all values that belong to the (sub-)category “critical diagnosis” also belong to the higher-up category “diagnosis”. Likewise, a value belonging to the (sub-)category “follow-up recommendation” necessarily also belongs to the higher-up category “recommendation”. There may also be some overlap between categories of the vocabulary, e.g. terms that fall within the “impression” category may also fall into the “finding” category. Alternatively, the vocabulary of categories may be designed to be mutually exclusive so that there is no such overlap); 
automatically executing phrasal grouping computer operations on the parse tree data structure to thereby associate one or more modifiers of core findings specified in the portion of the medical report natural language content with the core finding instance (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Example of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examines of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports), 
wherein the one or more modifiers are terms further defining a characteristic of the core finding (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and generating, by the fine-grained finding descriptor generation computing tool, a fine-grained finding descriptor data structure for the core finding instance based on the association of one or more modifiers of the core finding with the core finding instance (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and storing the fine-grained finding descriptor data structure in a fine-grained finding descriptor database for downstream computing system operations (Klassen [0006] In some non-limiting illustrative embodiments disclosed herein, a medical information technology (IT) system comprises one or more computers and one or more data storage media. The one or more computers and the one or more data storage media are interconnected by an electronic network, and the one or more data storage media store instructions executable by the one or more computers to define a plurality of medical information systems storing medical reports in different respective medical information system-specific medical report formats, and an integrated diagnostics system. For example, the plurality of medical information systems may include a Pathology Information System (PIS) storing pathology reports in a pathology report format and/or a Radiology Information System (RIS) storing medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The integrated diagnostics system includes: a medical report transform operative to transform text of medical reports stored in the different respective system report formats to an integrated diagnostics representation which represents the text of the medical reports as vocabulary category values of a vocabulary of categories; and a plurality of document processing tasks each operative to invoke the medical report transform to transform one or more medical reports processed by the task to the integrated diagnostics representation and to perform the task on the vocabulary category values of the integrated diagnostics representation of the one or more medical reports processed by the task; [0032] By way of non-limiting illustration, FIG. 2 diagrammatically shows a possible example of the medical report 90 transformed into the integrated diagnostics representation 100, with report sections labeled by the headings “DESCRIPTION”, “IMPRESSIONS”, “RECOMMEND(ations)”, and “ALERTS”, and the content of each section represented as (vocabulary category:value) pairs, e.g. “finding:value”, “observ(ation):value”, et cetera).
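
By way of illustration only, and not as a characterization of the claimed implementation or of Klassen, the sequence of steps recited in claim 1 (lexicon-based extraction of core findings, association of modifiers with each core finding instance, generation of a descriptor, and storage) can be sketched as follows. The lexicons, the FindingDescriptor class, and the extract_descriptors function are hypothetical names introduced for this sketch. The modifier attachment shown is a simple left-adjacent token walk that stands in for phrasal grouping over a parse tree; no actual parser is used.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicons, invented for illustration only.
CORE_FINDINGS = {"opacity", "nodule", "effusion"}
MODIFIERS = {"small", "large", "left", "right", "bilateral", "patchy", "pleural"}


@dataclass
class FindingDescriptor:
    core_finding: str
    modifiers: list = field(default_factory=list)

    def label(self) -> str:
        # Join the modifiers and the core finding into one fine-grained label.
        return "|".join(self.modifiers + [self.core_finding])


def extract_descriptors(report_text: str) -> list:
    descriptors = []
    # Split into sentences so modifiers never cross a sentence boundary.
    for sentence in re.split(r"[.;]\s*", report_text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token not in CORE_FINDINGS:
                continue
            # Walk left while the preceding tokens are known modifiers
            # (a stand-in for parse-tree phrasal grouping, not a real parser).
            mods = []
            j = i - 1
            while j >= 0 and tokens[j] in MODIFIERS:
                mods.insert(0, tokens[j])
                j -= 1
            descriptors.append(FindingDescriptor(token, mods))
    return descriptors


# A plain dict stands in for the descriptor database.
database = {}
for d in extract_descriptors("Small left pleural effusion. There is a patchy opacity."):
    database.setdefault(d.core_finding, []).append(d.label())
```

In this sketch each stored label combines a core finding with the modifiers found immediately before it in the same sentence; a system relying on a dependency or phrase-structure parser would be able to attach modifiers that are not adjacent to the finding term.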

Regarding claim 17, Klassen teaches the method of claim 1.
Klassen further teaches
wherein executing, by the fine-grained finding descriptor generation computing tool, for each core finding instance in the extracted set of core finding instances, automated computer natural language processing operations further comprises 
identifying a subset of relevant sections of the natural language content of the at least one medical imaging report data structure where core findings are likely to be found (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections), 
and wherein the automated computer natural language processing operations are performed on the portions of the medical report natural language content associated with the subset of relevant sections (Klassen [0031] The textual content of the sections of the report are then analyzed by natural language processing (NLP) 98 which may, for example, employ a sequence of a tokenizer that identifies white space to break text into individual tokens (typically words or numbers), a grammatical parser that parses the tokens into grammatical units (e.g. sentences) and parts-of-speech (e.g. nouns, verbs, adjectives, and so forth), a phrase structure parser, dependency parser, named entity recognizer, semantic role labeler, and/or other NLP token grouping/classification. In one approach, an ensemble of rule-based and ML components are trained based on labeled and unlabeled clinical text collected from past (i.e. historical) radiology reports and curated to create an overall model of clinical report text. Elements of the natural language processing may include words, sentences, part-of-speech labels, and other basic linguistic information, as well as higher level semantic structures that identify units of text as values of vocabulary categories such as findings, diagnosis, follow-up recommendations, and other elements of discourse found in clinical text. The parsed text is then searched to identify vocabulary category values of a vocabulary of categories 64. The vocabulary of categories is a closed set of categories. These are categories of medical language commonly used in medical reports, such as (by way of non-limiting illustrative example): “finding”, “critical finding”, “recommendation”, “biopsy sample”, “reason for exam”, “diagnosis”, “impression”, and “observation”, and/or other vocabulary categories. A vocabulary category is instantiated in a particular medical report by a value for that category, e.g. a possible value for the “diagnosis” vocabulary category could be “prostate cancer”. The vocabulary could be hierarchical whereby all values belonging to a sub-category necessarily also belong to a higher-up category, e.g. all values that belong to the (sub-)category “critical diagnosis” also belong to the higher-up category “diagnosis”. Likewise, a value belonging to the (sub-)category “follow-up recommendation” necessarily also belongs to the higher-up category “recommendation”. There may also be some overlap between categories of the vocabulary, e.g. terms that fall within the “impression” category may also fall into the “finding” category. Alternatively, the vocabulary of categories may be designed to be mutually exclusive so that there is no such overlap).
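
For illustration only, segmentation of a report into sections based on header text, followed by restriction of further processing to a subset of relevant sections, can be sketched as follows. The header names, the header pattern, and the functions segment_report and relevant_text are assumptions made for this sketch and are not drawn from the application or from Klassen; actual report schemas vary by source system.

```python
import re

# Hypothetical header names; real report schemas vary by institution.
RELEVANT_HEADERS = {"INDICATION", "FINDINGS"}

# Assumed convention: a section starts with an all-caps header and a colon.
HEADER_RE = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")


def segment_report(report_text: str) -> dict:
    """Split a report into {header: text} based on header lines."""
    sections = {}
    current = None
    for line in report_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            # A new header line opens a new section.
            current = match.group(1).strip()
            sections[current] = match.group(2)
        elif current is not None and line.strip():
            # Continuation lines are appended to the open section.
            sections[current] = (sections[current] + " " + line.strip()).strip()
    return sections


def relevant_text(report_text: str) -> dict:
    """Keep only the sections where core findings are expected."""
    segmented = segment_report(report_text)
    return {h: t for h, t in segmented.items() if h in RELEVANT_HEADERS}


report = """INDICATION: Cough and fever.
FINDINGS: Patchy opacity in the right lower lobe.
No pleural effusion.
IMPRESSION: Findings suggest pneumonia."""
selected = relevant_text(report)
```

Here only the text under the two selected headers would be passed on to subsequent natural language processing; the remaining section is segmented but not selected.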

Regarding claim 18, Klassen teaches the method of claim 17.
Klassen further teaches
wherein the relevant sections are an indications section and a findings section of the at least one medical imaging report data structure (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections).

Regarding claim 19, Klassen teaches 
a computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to implement a fine-grained finding descriptor generation computing tool that automatically generates fine-grained labels for downstream computer system operations (Klassen [0008] In some non-limiting illustrative embodiments disclosed herein, a method is performed in conjunction with a Pathology Information System (PIS) which stores pathology reports in a pathology report format, and a Radiology Information System (RIS) which stores medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The method comprises, using an electronic processor programmed by instructions stored on a non-transitory storage medium: converting at least one pathology report and at least one medical imaging examination report to an integrated diagnostics representation which represents the text of the converted reports as vocabulary category values of a vocabulary of categories; temporally ordering the converted reports based on timestamps of the respective reports; identifying a responsive report and a causational report based on vocabulary category values of the converted responsive report being responsive to vocabulary category values of the converted causational report; and displaying, on a workstation, a summary of the vocabulary category values used in the identifying)
at least by: 
processing medical report natural language content of at least one medical imaging report data structure associated with at least one medical image (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports), 
based on a core finding lexicon data structure (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports),
to extract a set of core finding instances of one or more core findings in the core finding lexicon data structure (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test),
from the medical report natural language content (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test), 
wherein the one or more core findings are terms describing one of anatomical structures or abnormalities present in the at least one medical image (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Example of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examines of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports); 
executing, for each core finding instance in the extracted set of core finding instances, automated computer natural language processing operations comprising: 
generating a parse tree data structure for a corresponding portion of the medical report natural language content corresponding to the core finding instance (Klassen [0031] The textual content of the sections of the report are then analyzed by natural language processing (NLP) 98 which may, for example, employ a sequence of a tokenizer that identifies white space to break text into individual tokens (typically words or numbers), a grammatical parser that parses the tokens into grammatical units (e.g. sentences) and parts-of-speech (e.g. nouns, verbs, adjectives, and so forth), a phrase structure parser, dependency parser, named entity recognizer, semantic role labeler, and/or other NLP token grouping/classification. In one approach, an ensemble of rule-based and ML components are trained based on labeled and unlabeled clinical text collected from past (i.e. historical) radiology reports and curated to create an overall model of clinical report text. Elements of the natural language processing may include words, sentences, part-of-speech labels, and other basic linguistic information, as well as higher level semantic structures that identify units of text as values of vocabulary categories such as findings, diagnosis, follow-up recommendations, and other elements of discourse found in clinical text. The parsed text is then searched to identify vocabulary category values of a vocabulary of categories 64. The vocabulary of categories is a closed set of categories. These are categories of medical language commonly used in medical reports, such as (by way of non-limiting illustrative example): “finding”, “critical finding”, “recommendation”, “biopsy sample”, “reason for exam”, “diagnosis”, “impression”, and “observation”, and/or other vocabulary categories. A vocabulary category is instantiated in a particular medical report by a value for that category, e.g. a possible value for the “diagnosis” vocabulary category could be “prostate cancer”. 
The vocabulary could be hierarchical whereby all values belonging to a sub-category necessarily also belong to a higher-up category, e.g. all values that belong to the (sub-)category “critical diagnosis” also belong to the higher-up category “diagnosis”. Likewise, a value belonging to the (sub-)category “follow-up recommendation” necessarily also belongs to the higher-up category “recommendation”. There may also be some overlap between categories of the vocabulary, e.g. terms that fall within the “impression” category may also fall into the “finding” category. Alternatively, the vocabulary of categories may be designed to be mutually exclusive so that there is no such overlap); 
automatically executing phrasal grouping computer operations on the parse tree data structure to thereby associate one or more modifiers of core findings specified in the portion of the medical report natural language content with the core finding instance (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Example of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examines of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports), 
wherein the one or more modifiers are terms further defining a characteristic of the core finding (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and generating, by the fine-grained finding descriptor generation computing tool, a fine-grained finding descriptor data structure for the core finding instance 
based on the association of one or more modifiers of the core finding with the core finding instance (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and storing the fine-grained finding descriptor data structure in a fine-grained finding descriptor database for use in downstream computer system operations (Klassen [0006] In some non-limiting illustrative embodiments disclosed herein, a medical information technology (IT) system comprises one or more computers and one or more data storage media. The one or more computers and the one or more data storage media are interconnected by an electronic network, and the one or more data storage media store instructions executable by the one or more computers to define a plurality of medical information systems storing medical reports in different respective medical information system-specific medical report formats, and an integrated diagnostics system. For example, the plurality of medical information systems may include a Pathology Information System (PIS) storing pathology reports in a pathology report format and/or a Radiology Information System (RIS) storing medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The integrated diagnostics system includes: a medical report transform operative to transform text of medical reports stored in the different respective system report formats to an integrated diagnostics representation which represents the text of the medical reports as vocabulary category values of a vocabulary of categories; and a plurality of document processing tasks each operative to invoke the medical report transform to transform one or more medical reports processed by the task to the integrated diagnostics representation and to perform the task on the vocabulary category values of the integrated diagnostics representation of the one or more medical reports processed by the task; [0032] By way of non-limiting illustration, FIG. 2 diagrammatically shows a possible example of the medical report 90 transformed into the integrated diagnostics representation 100, with report sections labeled by the headings “DESCRIPTION”, “IMPRESSIONS”, “RECOMMEND(ations)”, and “ALERTS”, and the content of each section represented as (vocabulary category:value) pairs, e.g. “finding:value”, “observ(ation):value”, et cetera).

Regarding claim 20, Klassen teaches 
an apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to implement a fine-grained finding descriptor generation computing tool that automatically generates fine-grained labels for downstream computer system operations (Klassen [0008] In some non-limiting illustrative embodiments disclosed herein, a method is performed in conjunction with a Pathology Information System (PIS) which stores pathology reports in a pathology report format, and a Radiology Information System (RIS) which stores medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The method comprises, using an electronic processor programmed by instructions stored on a non-transitory storage medium: converting at least one pathology report and at least one medical imaging examination report to an integrated diagnostics representation which represents the text of the converted reports as vocabulary category values of a vocabulary of categories; temporally ordering the converted reports based on timestamps of the respective reports; identifying a responsive report and a causational report based on vocabulary category values of the converted responsive report being responsive to vocabulary category values of the converted causational report; and displaying, on a workstation, a summary of the vocabulary category values used in the identifying)
at least by: 
processing medical report natural language content of at least one medical imaging report data structure associated with at least one medical image (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports), 
based on a core finding lexicon data structure (Klassen [0040] The report tracking task 72 provides a correlation and concordance system between radiology and pathology (or more generally, between reports on two different medical information systems), and leverages both rule-based and statistical machine learning (ML) natural language processing (NLP) components such as the illustrative medical report transform 62 to correlate radiology and pathology reports and evaluate if their sentiment is concordant. The vocabulary of categories 64 enables understanding of important information in the radiology reports, including observations of the physical state of the patient based on radiographic imaging, descriptions of possible diagnosis or interpretations of observations made by the radiologist, and follow-up recommendations for subsequent tests or exams. Properties of these elements are discerned, such as anatomical regions, measurements of observed phenomena, and the vocabulary describing disease or cancer using NLP techniques. The extracted properties correlate to findings, observations, and diagnosis of biopsy procedures described in one or more subsequent and related pathology reports), 
to extract a set of core finding instances of one or more core findings in the core finding lexicon data structure (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test), 
from the medical report natural language content (Klassen [0021] In systems and methods disclosed herein, the existing medical IT infrastructure paradigm of employing separate systems for different medical areas (e.g. PACS, RIS, PIS, CVIS, EHR . . . ) is retained; but these systems are augmented by an integrated diagnostics system that provides automated and controlled cross-fertilization of data between the different systems. This is achieved by the use of an integrated diagnostics representation (IDR) which represents medical reports and other text-based medical data using a standardized vocabulary of categories. Thus, for example, a radiologist continues to produce radiology reports in the RIS environment using a radiology-specific reporting format, the pathologist continues to product pathology reports in the PIS environment using a pathology-specific (or even finer-grained pathology lab-specific) reporting format, and so forth; but, an integrated diagnostics representation extractor (IDRx) converts each of these documents to a representation in which key concepts (represented by categories of the vocabulary of categories) are extracted out to form IDR representations of the documents. In this way, for example, a recommendation to perform a pathology test contained in a radiology report is easily linked with a corresponding pathology report that summarizes the results of the recommended pathology test), 
wherein the one or more core findings are terms describing one of anatomical structures or abnormalities present in the at least one medical image (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Example of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examines of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports); 
executing, for each core finding instance in the extracted set of core finding instances, automated computer natural language processing operations comprising: 
generating a parse tree data structure for a corresponding portion of the medical report natural language content corresponding to the core finding instance (Klassen [0031] The textual content of the sections of the report are then analyzed by natural language processing (NLP) 98 which may, for example, employ a sequence of a tokenizer that identifies white space to break text into individual tokens (typically words or numbers), a grammatical parser that parses the tokens into grammatical units (e.g. sentences) and parts-of-speech (e.g. nouns, verbs, adjectives, and so forth), a phrase structure parser, dependency parser, named entity recognizer, semantic role labeler, and/or other NLP token grouping/classification. In one approach, an ensemble of rule-based and ML components are trained based on labeled and unlabeled clinical text collected from past (i.e. historical) radiology reports and curated to create an overall model of clinical report text. Elements of the natural language processing may include words, sentences, part-of-speech labels, and other basic linguistic information, as well as higher level semantic structures that identify units of text as values of vocabulary categories such as findings, diagnosis, follow-up recommendations, and other elements of discourse found in clinical text. The parsed text is then searched to identify vocabulary category values of a vocabulary of categories 64. The vocabulary of categories is a closed set of categories. These are categories of medical language commonly used in medical reports, such as (by way of non-limiting illustrative example): “finding”, “critical finding”, “recommendation”, “biopsy sample”, “reason for exam”, “diagnosis”, “impression”, and “observation”, and/or other vocabulary categories. A vocabulary category is instantiated in a particular medical report by a value for that category, e.g. a possible value for the “diagnosis” vocabulary category could be “prostate cancer”. 
The vocabulary could be hierarchical whereby all values belonging to a sub-category necessarily also belong to a higher-up category, e.g. all values that belong to the (sub-)category “critical diagnosis” also belong to the higher-up category “diagnosis”. Likewise, a value belonging to the (sub-)category “follow-up recommendation” necessarily also belongs to the higher-up category “recommendation”. There may also be some overlap between categories of the vocabulary, e.g. terms that fall within the “impression” category may also fall into the “finding” category. Alternatively, the vocabulary of categories may be designed to be mutually exclusive so that there is no such overlap); 
automatically executing phrasal grouping computer operations on the parse tree data structure to thereby associate one or more modifiers of core findings specified in the portion of the medical report natural language content with the core finding instance (Klassen [0043] In one suitable embodiment, the data extraction uses an ensemble of rule-based and statistical machine learning (ML) natural language processing (NLP) techniques. Report correlation 154 may rely upon date correlation, by identifying date/times of the radiology and pathology exams for patients. These may be extracted from the contents of the respective reports 150, 152, or may be located independently from a database of the respective pathology and radiology information systems 24, 26. The IDRx 62 suitably extracts elements in a radiology report, e.g. as described with reference to FIG. 2. Examples of vocabulary category values that can be grouped per radiologic finding include, by way of non-limiting illustrative example: the anatomical region associated with a radiologic finding; measurement(s) associated with the radiologic finding; a suspicion score associated with the radiologic finding; a biopsy recommendation associated with the radiologic finding; and/or so forth. Examples of vocabulary category values that can be extracted from the pathology report include, by way of non-limiting illustrative example: biopsied anatomy; biopsy procedure; final diagnosis; cancer stage; cancer grade; and/or TNM Classification of Malignant Tumours (TNM) value. The pathology report sentiment analysis 156 implements a method for classifying a pathology report outcome as non-diagnostic, benign, suspicious or malignant. The radiology and pathology concordance (or discordance) is determined by leveraging the vocabulary categories of the respective reports 150, 152 in the common integrated diagnostics representation data and syntactic and semantic linguistic information extracted by the data extraction module. 
A concordance score between a patient's radiology report 150 and pathology report 152 is computed; where a high concordance score indicates a strong contextual match between the radiology and pathology elements. The UI 160 provides for displaying or visualizing a matched pair of radiology and pathology reports 150, 152 based on concordance scores. The clinician can verify the matched pairs and confirm the result in the dashboard or equivalent visualization interface. The report tracking task 72 thus implements a system for classifying a pathology report as concordant or discordant with the radiology reports), 
wherein the one or more modifiers are terms further defining a characteristic of the core finding (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and generating, by the fine-grained finding descriptor generation computing tool, a fine-grained finding descriptor data structure for the core finding instance 
based on the association of one or more modifiers of the core finding with the core finding instance (Klassen [0030] With reference to FIG. 2, an illustrative embodiment of the medical report transform (IDRx) 62 is shown. A medical report 90 is received by the medical report transform 62. The medical report 90 may be retrieved from a medical information system 92, e.g. from one of the medical information systems 24, 26, 28, having been invoked by one of the tasks 70, 72, 74, 76, 78, 80 to transform the medical report 90. Alternatively, the medical report 90 may be medical report under draft by a user operating a UI 94 of one of the medical information systems 24, 26, 28, for example a medical imaging examination report under draft by a radiologist operating the radiology UI 30 via the radiology workstation 34, or a pathology report under draft by a pathology lab technician operating the pathology UI 32 via a pathology workstation (not shown in FIG. 1), or so forth. The illustrative medical report transform 62 processes the medical report 90 by report segmentation 96 that segments the report into identifiable sections based on header text and/or based on an a priori known report schema identifying the report sections for the report format employed by the sourcing medical information system. For example, a medical imaging examination report typically has standard sections defined by the RIS 26. 
One non-limiting section schema for a medical imaging examination report may include a “Description” section setting forth the radiologist's findings (e.g., identifying a tumor); an “Impressions” section setting forth the radiologist's impressions (e.g., qualitative and/or quantitative characteristics of the tumor such as size and/or geometry); a “Diagnosis” section setting forth any diagnoses determined by the radiologist (e.g., a Breast Imaging-Reporting and Data System (BI-RADS) score for a tumor identified in a mammogram); and a “Recommendations” section setting forth any follow-up recommendations the radiologist may provide (e.g., a recommendation to perform a biopsy on a suspicious tumor identified in the report). This is merely an illustrative section schema, and additional or fewer and/or other sections may be employed; additionally or alternatively, the report segmentation process may use the textual content of report headers and/or report content to perform the segmentation into sections); 
and storing the fine-grained finding descriptor data structure in a fine-grained finding descriptor database for use in downstream computer system operations (Klassen [0006] In some non-limiting illustrative embodiments disclosed herein, a medical information technology (IT) system comprises one or more computers and one or more data storage media. The one or more computers and the one or more data storage media are interconnected by an electronic network, and the one or more data storage media store instructions executable by the one or more computers to define a plurality of medical information systems storing medical reports in different respective medical information system-specific medical report formats, and an integrated diagnostics system. For example, the plurality of medical information systems may include a Pathology Information System (PIS) storing pathology reports in a pathology report format and/or a Radiology Information System (RIS) storing medical imaging examination reports in a medical imaging examination report format that is different from the pathology report format. The integrated diagnostics system includes: a medical report transform operative to transform text of medical reports stored in the different respective system report formats to an integrated diagnostics representation which represents the text of the medical reports as vocabulary category values of a vocabulary of categories; and a plurality of document processing tasks each operative to invoke the medical report transform to transform one or more medical reports processed by the task to the integrated diagnostics representation and to perform the task on the vocabulary category values of the integrated diagnostics representation of the one or more medical reports processed by the task; [0032] By way of non-limiting illustration, FIG. 
2 diagrammatically shows a possible example of the medical report 90 transformed into the integrated diagnostics representation 100, with report sections labeled by the headings “DESCRIPTION”, “IMPRESSIONS”, “RECOMMEND(ations)”, and “ALERTS”, and the content of each section represented as (vocabulary category:value) pairs, e.g. “finding:value”, “observ(ation):value”, et cetera).
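
For illustration only, the header-based report segmentation quoted above from Klassen [0030] can be sketched as follows. This sketch is not taken from Klassen or from the application. The function name segment_report, the header list, and the assumption that each header starts a line and ends with a colon are all hypothetical.

```python
import re
from collections import OrderedDict

# Hypothetical header names, modeled on the section labels quoted from Klassen [0032].
SECTION_HEADERS = ("DESCRIPTION", "IMPRESSIONS", "RECOMMENDATIONS", "ALERTS")

# Assumption: a header is one of the known names at the start of a line, followed by a colon.
_HEADER_RE = re.compile(r"^(%s):\s*" % "|".join(SECTION_HEADERS), re.MULTILINE)


def segment_report(text):
    """Split report text into an ordered {header: body} mapping using header text only."""
    sections = OrderedDict()
    matches = list(_HEADER_RE.finditer(text))
    for i, match in enumerate(matches):
        # A section body runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections
```

Under these assumptions, a report with "DESCRIPTION:", "IMPRESSIONS:", and "RECOMMENDATIONS:" lines yields one entry per header, and the text of each section can then be passed to the natural language processing described in Klassen [0031].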

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Klassen in view of Rajan et al. (US Patent Pub. No. 2018/0107792), hereinafter Rajan.
Regarding claim 2, Klassen teaches the method of claim 1.
Klassen teaches
a finding type indicating a type of the core finding (Klassen [0073] In another (not mutually exclusive) application, with the above classifier may be used in a retrospective batch mode to populate groomed lists of different types of discovered findings and their levels of criticalness according to different institutions, departments, or radiologists).
Klassen teaches a fine-grained finding descriptor data structure, core findings, and modifiers; however, Klassen does not teach
wherein the fine-grained finding descriptor data structure comprises 
data portions specifying a value of a core finding which corresponds to the core finding instance, 
a finding type indicating a type of the core finding modified by the one or more modifiers, 
a negativity indicator indicating whether or not the core finding modified by the one or more modifiers is negatively indicated in the portion of medical report natural language content, 
and the one or more modifiers of the core finding.
Rajan teaches
data portions specifying a value (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below), 
modified by the one or more modifiers (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below), 
a negativity indicator indicating whether or not the core finding modified by the one or more modifiers is negatively indicated in the portion of medical report natural language content (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below), 
and the one or more modifiers of the core finding (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below).
Rajan is considered to be analogous to the claimed invention because it is in the same field of analysis of medical imaging reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Rajan to allow for using a data structure with a core finding value, type, polarity, and its modifier(s). Doing so would allow for detecting discrepancies in medical reports.
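
For illustration only, the combination discussed above can be sketched as a small data structure populated by a pattern of the kind quoted from Rajan [0037]. This is a minimal sketch under stated assumptions and is not code from Klassen, Rajan, or the application. The names FindingDescriptor and describe, the "measurement" type label, and the unit list are hypothetical. Rajan's pattern matches only where no negation term is present; the sketch instead records the presence of a negation term as a flag, which is an assumption made to show a negativity indicator.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Term sets A and B and the negation terms as listed in the quoted passage of Rajan [0037].
PATTERN = re.compile(
    r"\b(?P<a>[Aa]orta|[Aa]ortic|AV|AS)\b"      # A: disease-indicating phrase
    r"(?P<mods>[\w\s]*?)"                        # words between A and B, kept as modifiers
    r"\b(?P<b>gradient|velocity|area)\b"         # B: measurement term
)
NEGATION_RE = re.compile(r"\b(?:no|not|without|neither|none)\b", re.IGNORECASE)
# Assumption: a hypothetical unit list; Rajan does not enumerate units in the quoted text.
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mmHg|m/s|cm2)", re.IGNORECASE)


@dataclass
class FindingDescriptor:
    value: str                      # core finding value (here, the measurement term)
    finding_type: str               # hypothetical type label
    negated: bool                   # negativity indicator
    modifiers: List[str] = field(default_factory=list)
    measurement: Optional[float] = None
    unit: Optional[str] = None


def describe(sentence: str) -> Optional[FindingDescriptor]:
    """Return a descriptor for the first A...B match in the sentence, or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    negated = NEGATION_RE.search(sentence) is not None
    modifiers = [match.group("a")] + match.group("mods").split()
    descriptor = FindingDescriptor(match.group("b"), "measurement", negated, modifiers)
    if not negated:
        # Numeric value and unit are looked for after the measurement name, in the same sentence.
        found = VALUE_RE.search(sentence, match.end())
        if found:
            descriptor.measurement = float(found.group(1))
            descriptor.unit = found.group(2)
    return descriptor
```

Under these assumptions, a sentence such as "The mean aortic valve gradient is 45 mmHg." yields a non-negated descriptor with value "gradient" and modifiers "aortic" and "valve", while "No significant aortic valve gradient." yields a negated descriptor with no measurement.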

Regarding claim 3, Klassen in view of Rajan teaches the method of claim 2.
Klassen teaches core findings, a core finding lexicon data structure, and natural language processing; however, Klassen does not teach
wherein the value of the core finding is a core finding value from the core finding lexicon data structure, 
and wherein the core finding lexicon data structure is developed through an automated or semi-automated process implementing automated computerized natural language processing computer tools to analyze and extract features from natural language content of a corpus of medical imaging report data structures indicating values of core findings.
Rajan teaches
the value (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below),
a corpus of medical imaging report (Rajan [0037] Next, key measurement names are selected indicating aortic stenosis, such as peak velocity, mean pressure gradient, and aortic valve area. Using their values ranges and units, a measurement name-value pair detector is developed. As the spoken utterances of these names vary in echocardiograms, n-gram analysis is performed of a corpus of over 50,000 reports in a data collection to identify all such significant variants of the measurement names. To detect occurrences of measurement names and their associated values within the context of a detected sentence, the pattern of their occurrences in a sentence is analyzed using part-of-speech (POS) tagging, and dependency graph parsing. For each root concept (e.g., ‘gradient’), a chain of its modifiers (in the form of nouns or adjectives, e.g., ‘mean trans aortic’) are automatically identified from a sentence using an automatic POS tagger. In some embodiments, the automatic POS tagger comprises the Stanford POS tagger. By analyzing thousands of sentences containing the occurrences of measurement vocabulary terms in connection with measurement values and units, regular expression patterns are formed, such a pattern ⟨A⟩⟨B⟩⟨C⟩ where A is any disease indicating phrase A: {aorta, aortic, AV, AS}, B is any measurement term {gradient, velocity, area}, and C is no negation terms of the kind {no, not, without, neither, none}. Once the pattern is matched, numeric values are located following the measurement names in the same sentence that were juxtaposed with names of relevant units. An example of aortic stenosis measurement extraction is illustrated in the paragraph below).
Rajan is considered to be analogous to the claimed invention because it is in the same field of analysis of medical imaging reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Rajan to allow for using a data structure with a core finding value and a corpus of reports. Doing so would allow for detecting discrepancies in medical reports.

Regarding claim 12, Klassen teaches the method of claim 1.
Klassen teaches a fine-grained finding descriptor database, fine-grained finding labels, findings, and medical imaging reports; however, Klassen does not teach
wherein the downstream computer system operations comprise 
training a machine learning computer model executed on one or more computing devices, 
by performing machine learning on the machine learning computer model using the fine-grained finding descriptor database to provide fine-grained finding labels for findings in medical imaging reports.
Rajan teaches
the downstream computer system (Rajan [0095] Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples, include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.),
training a machine learning computer model (Rajan [0045] Referring to FIG. 4, a learning phase for template generation is illustrated according to embodiments of the present disclosure. A sample image collection 401, comprising a plurality of images is read. The images are ranked 402 according to the set cover algorithm set forth below. Supersets 403 and subsets 404 of images are identified to avoid duplication and thus make annotation faster. A GUI 405 is provided that displays automatic attribute-value pair suggestions based on a rule-based approach and allows input of user corrections. Based on the corrected attribute-value pairs and their layout, a template is generated), 
performing machine learning on the machine learning computer model (Rajan [0084] In various embodiments, an incoming imaging study is processed to first select frames depicting CW Doppler pattern. A plurality of frames of a medical video are read. A mode label indicative of a mode of each of the plurality of frames is determined. A set of features discriminating between mode labels are learned using a deep learning network using a set of prior chosen training images. These features are then extracted from incoming images and classified. The images classified into CW Doppler mode labels are then retained. From among the frames depicting CW Doppler patterns, a set of frames are selected that depict the valve of interest. A region of interest is extracted in each CW Doppler image and features discriminating between different heart valves are then learned using another deep learning network using prior chosen training CW Doppler images. New CW Doppler images are classified using the learned network and those images that are classified as containing the target valve of interest are retained. A Doppler envelope is then extracted from the selected frame. Based on the frame and the Doppler envelope, one or more measurements is indicative is extracted indicative of a disease condition from those of the at least one of the plurality of frames matching a predetermined valve label).
Rajan is considered to be analogous to the claimed invention because it is in the same field of analysis of medical imaging reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Rajan to allow for machine learning. Doing so would allow for detecting discrepancies in medical reports.

Regarding claim 13, Klassen in view of Rajan teaches the method of claim 12.
Klassen teaches fine-grained finding labels, medical images, data structures, and a fine-grained finding descriptor database; however, Klassen does not teach
wherein the machine learning computer model is trained to perform automated fine-grained labeling of medical image data structures representing medical images, 
based on fine-grained finding labels extracted from associated medical imaging reports and the fine-grained finding descriptor data structures of the fine-grained finding descriptor database.
Rajan teaches
machine learning computer model is trained to perform (Rajan [0045] Referring to FIG. 4, a learning phase for template generation is illustrated according to embodiments of the present disclosure. A sample image collection 401, comprising a plurality of images is read. The images are ranked 402 according to the set cover algorithm set forth below. Supersets 403 and subsets 404 of images are identified to avoid duplication and thus make annotation faster. A GUI 405 is provided that displays automatic attribute-value pair suggestions based on a rule-based approach and allows input of user corrections. Based on the corrected attribute-value pairs and their layout, a template is generated).
Rajan is considered to be analogous to the claimed invention because it is in the same field of analysis of medical imaging reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Rajan to allow for machine learning. Doing so would allow for detecting discrepancies in medical reports.

Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Klassen in view of Syeda-Mahmood (US Patent Pub. No. 2003/0065655), hereinafter Syeda-Mahmood ‘5655, in further view of Lin et al. (US Patent Pub. No. 2020/0134032), hereinafter Lin.
Regarding claim 6, Klassen teaches the method of claim 1.
Klassen teaches a parse tree data structure and a core finding lexicon data structure; however, Klassen does not teach 
wherein automatically executing phrasal grouping computer operations on the parse tree data structure comprises 
grouping nodes of the parse tree data structure into one of a core phrasal group or a helper phrasal group, 
wherein nodes of a sub-tree that comprise a node corresponding to a core finding from the core finding lexicon data structure are grouped into a core phrasal group, 
and nodes that are not part of a sub-tree comprising a node corresponding to a core finding are grouped into one or more helper phrasal groups.
Syeda-Mahmood ‘5655 teaches
phrasal grouping (Syeda-Mahmood ‘5655 [0060] Specifically, the phrasal match grouper 412 uses a time threshold to group phrasal matches into individual topical audio events. The pattern of separation between individual phrasal matches can be analyzed over a number of videos and foils to derive a threshold for inter-phrasal match distance. As an illustration, inter-phrasal match distributions for phrases were recorded for more than 350 slides and a collection of more than 20 videos and the inter-phrasal match distance difference was noted during the duration over which the topic conveyed by the foil was actually discussed. The resulting distribution of the difference indicates a peak in the distribution between 1 and 20 seconds, indicating that for most speakers and most topics, the predominant separation between utterances of phrases tends to be between 1 and 20 seconds apart. Thus, a 20 second time duration was chosen as the inter-phrase match distance threshold to group phrases in the phrasal match grouper; [0061] The grouping process uses a connected component algorithm to merge adjacent phrasal matches that are within the inter-phrase match distance threshold of each other. The connected component algorithm uses a fast data structure called the union-find to perform the merging. During grouping, multiple occurrences of a match to a phrase are allowed within a group to handle cases when a phrase emphasizing a point of discussion was uttered frequently)
core group (Syeda-Mahmood ‘5655 [0016] While individual matches to phrases can be widely distributed, there exist points in time where a number of these matches either co-occur, or occur within a short span of time. If such matches can be grouped based on an inter-phrasal match distance, then it is likely that at least one such group spans the topical audio event conveyed by the slide. This represents an important observation behind combining phrasal matches to detect topical audio events. Additionally, the present invention employs a novel method of multi-modal fusion for overall topical event detection that uses a probabilistic model to exploit the time co-occurrence of individual modal events, where multiple textual phrases refer to individual modes)
helper group (Syeda-Mahmood ‘5655 [0017] Second, the top-down slide text phrases-guided topic detection indicates that a match to a phrase identifies a subtopical event and that the collection of such subtopical event matches to phrases collectively define a topical event. In addition, the word order of the query phrase is preserved throughout, to maximize accuracy).
Syeda-Mahmood ‘5655 is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘5655 to allow for using phrasal grouping. Doing so would allow for detecting and localizing topical events, that is, the points in a recording when specific topics are discussed.
Klassen in view of Syeda-Mahmood ‘5655 does not teach
nodes
Lin teaches
nodes (Lin [0021] In some embodiments, the domain knowledge graph model and the database schema wiring model may be used to construct a structured database query language statement from a natural language question. Initially, a written language dependency parse tree may be generated from the natural language question. The parse tree may then be traversed and nodes in the parse tree may be recognized as corresponding to nodes in the domain knowledge graph. For this, the entity information of nodes in the parse tree may be leveraged together with the nodes of the domain knowledge graph to identify a question target of the natural language question. The natural language question may be determined to be understandable if all of the recognized nodes of the domain knowledge graph can be connected by one or more routes in the graph and the question target can be identified. In this case, a structured database query language statement may be constructed from the natural language question)
Lin is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen in view of Syeda-Mahmood ‘5655 further in view of Lin to allow for using nodes. Doing so would allow a natural language interface system to accurately translate domain-specific natural language questions posed by users to structured database query language statements that can be executed against a structured database to answer the natural language question.

Regarding claim 7, Klassen in view of Syeda-Mahmood ‘5655 in view of Lin teaches the method of claim 6.
Klassen does not teach, however Syeda-Mahmood ‘5655 teaches
wherein executing the phrasal grouping computer operation comprises 
merging two adjacent core phrasal groups, in the parse tree data structure (Syeda-Mahmood ‘5655 [0060] Specifically, the phrasal match grouper 412 uses a time threshold to group phrasal matches into individual topical audio events. The pattern of separation between individual phrasal matches can be analyzed over a number of videos and foils to derive a threshold for inter-phrasal match distance. As an illustration, inter-phrasal match distributions for phrases were recorded for more than 350 slides and a collection of more than 20 videos and the inter-phrasal match distance difference was noted during the duration over which the topic conveyed by the foil was actually discussed. The resulting distribution of the difference indicates a peak in the distribution between 1 and 20 seconds, indicating that for most speakers and most topics, the predominant separation between utterances of phrases tends to be between 1 and 20 seconds apart. Thus, a 20 second time duration was chosen as the inter-phrase match distance threshold to group phrases in the phrasal match grouper; [0061] The grouping process uses a connected component algorithm to merge adjacent phrasal matches that are within the inter-phrase match distance threshold of each other. The connected component algorithm uses a fast data structure called the union-find to perform the merging. During grouping, multiple occurrences of a match to a phrase are allowed within a group to handle cases when a phrase emphasizing a point of discussion was uttered frequently), 
when the two adjacent core phrasal groups are determined to contain a node corresponding to a same core finding from the core finding lexicon data structure (Syeda-Mahmood ‘5655 [0060] Specifically, the phrasal match grouper 412 uses a time threshold to group phrasal matches into individual topical audio events. The pattern of separation between individual phrasal matches can be analyzed over a number of videos and foils to derive a threshold for inter-phrasal match distance. As an illustration, inter-phrasal match distributions for phrases were recorded for more than 350 slides and a collection of more than 20 videos and the inter-phrasal match distance difference was noted during the duration over which the topic conveyed by the foil was actually discussed. The resulting distribution of the difference indicates a peak in the distribution between 1 and 20 seconds, indicating that for most speakers and most topics, the predominant separation between utterances of phrases tends to be between 1 and 20 seconds apart. Thus, a 20 second time duration was chosen as the inter-phrase match distance threshold to group phrases in the phrasal match grouper; [0061] The grouping process uses a connected component algorithm to merge adjacent phrasal matches that are within the inter-phrase match distance threshold of each other. The connected component algorithm uses a fast data structure called the union-find to perform the merging. During grouping, multiple occurrences of a match to a phrase are allowed within a group to handle cases when a phrase emphasizing a point of discussion was uttered frequently).
Syeda-Mahmood ‘5655 is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘5655 to allow for using phrasal grouping. Doing so would allow for detecting and localizing topical events, that is, the points in a recording when specific topics are discussed.
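For illustration, the merging step quoted from Syeda-Mahmood ‘5655 [0060]-[0061] (a connected component pass over adjacent phrasal matches, backed by a union-find structure and a 20 second threshold) can be sketched as follows. This is a minimal sketch only: the function names and the representation of each phrasal match as a single timestamp in seconds are assumptions made for the example and do not appear in the reference.

```python
from typing import Dict, List

# Threshold value quoted in Syeda-Mahmood '5655 [0060].
INTER_PHRASE_THRESHOLD_S = 20.0


def find(parent: List[int], i: int) -> int:
    """Return the root of element i, halving the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent: List[int], a: int, b: int) -> None:
    """Merge the sets that contain a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def group_phrasal_matches(
    times: List[float], threshold: float = INTER_PHRASE_THRESHOLD_S
) -> List[List[float]]:
    """Group match timestamps whose neighbours lie within the threshold.

    Hypothetical helper: each phrasal match is reduced to one timestamp.
    """
    ordered = sorted(times)
    parent = list(range(len(ordered)))
    # Adjacent matches closer than the threshold join the same component.
    for i in range(1, len(ordered)):
        if ordered[i] - ordered[i - 1] <= threshold:
            union(parent, i - 1, i)
    groups: Dict[int, List[float]] = {}
    for i, t in enumerate(ordered):
        groups.setdefault(find(parent, i), []).append(t)
    return list(groups.values())
```

Under these assumptions, matches at 5, 12 and 30 seconds fall into one group because each consecutive gap is at most 20 seconds, while a match at 100 seconds starts a new group.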

Regarding claim 8, Klassen in view of Syeda-Mahmood ‘5655 in view of Lin teaches the method of claim 6.
Klassen teaches modifiers, core findings, and a parse tree data structure; however, Klassen does not teach
wherein executing the phrasal grouping operation comprises 
for each helper phrasal group, associating any modifier present in the helper phrasal group with core findings in one or more core phrasal groups that are adjacent to the helper phrasal group in the parse tree data structure.
Syeda-Mahmood ‘5655 teaches
phrasal grouping (Syeda-Mahmood ‘5655 [0060] Specifically, the phrasal match grouper 412 uses a time threshold to group phrasal matches into individual topical audio events. The pattern of separation between individual phrasal matches can be analyzed over a number of videos and foils to derive a threshold for inter-phrasal match distance. As an illustration, inter-phrasal match distributions for phrases were recorded for more than 350 slides and a collection of more than 20 videos and the inter-phrasal match distance difference was noted during the duration over which the topic conveyed by the foil was actually discussed. The resulting distribution of the difference indicates a peak in the distribution between 1 and 20 seconds, indicating that for most speakers and most topics, the predominant separation between utterances of phrases tends to be between 1 and 20 seconds apart. Thus, a 20 second time duration was chosen as the inter-phrase match distance threshold to group phrases in the phrasal match grouper; [0061] The grouping process uses a connected component algorithm to merge adjacent phrasal matches that are within the inter-phrase match distance threshold of each other. The connected component algorithm uses a fast data structure called the union-find to perform the merging. During grouping, multiple occurrences of a match to a phrase are allowed within a group to handle cases when a phrase emphasizing a point of discussion was uttered frequently).
Syeda-Mahmood ‘5655 is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘5655 to allow for using phrasal grouping. Doing so would allow for detecting and localizing topical events, that is, the points in a recording when specific topics are discussed.
Klassen in view of Syeda-Mahmood ‘5655 does not teach
adjacent 
Lin teaches
adjacent (Lin [0035] Some possible adjacency list implementations of the knowledge graph 114 include using a hash table to associate each node in the graph with an array of adjacent nodes. In this representation, a node may be represented by a hash-able node object and there may be no explicit representation of the edges as objects).
Lin is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen in view of Syeda-Mahmood ‘5655 further in view of Lin to allow for using adjacent nodes. Doing so would allow a natural language interface system to accurately translate domain-specific natural language questions posed by users to structured database query language statements that can be executed against a structured database to answer the natural language question.
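For illustration, the adjacency list representation quoted from Lin [0035] (a hash table associating each node with an array of adjacent nodes, with no explicit edge objects) can be sketched as follows. The function name and the use of plain strings as hash-able node objects are assumptions made for the example, not details taken from Lin.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_adjacency(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, List[Hashable]]:
    """Build a hash table mapping each node to the list of its adjacent nodes.

    Edges are consumed only to populate the table; they are not stored
    as objects, consistent with the quoted passage.
    """
    adjacency: Dict[Hashable, List[Hashable]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


# Hypothetical node labels, chosen only for the example.
graph = build_adjacency([("finding", "modifier"), ("finding", "location")])
```

Looking up `graph["finding"]` then yields the nodes adjacent to that node in a single hash table access.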

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Klassen in view of Syeda-Mahmood (US Patent Pub. No. 2019/0026437), hereinafter Syeda-Mahmood ‘6437.

Regarding claim 9, Klassen teaches the method of claim 1.
Klassen teaches natural language processing, core findings, and a parse tree data structure; however, Klassen does not teach
wherein the automated computer natural language processing operations further comprises 
performing negated instance detection of core findings at least by performing a search of the parse tree data structure based on a set of known negation keywords and negation term patterns, 
and for each instance of a negation keyword or negation term pattern, determining a scope of nodes in the parse tree data structure encompassed by the instance of the negation keyword or negation term pattern, 
and wherein core findings located within a scope of nodes in the parse tree data structure encompassed by the instance of the negation keyword or negation term pattern are determined to be negatively indicated.
Syeda-Mahmood ‘6437 teaches
performing negated instance detection of core findings at least by performing a search of the parse tree data structure based on a set of known negation keywords and negation term patterns (Syeda-Mahmood ‘6437 [0024] Some approaches may employ a negation detection algorithm to also spot negative occurrences of diseases or symptoms. Various approaches to negation detection use regular expression patterns seeded by negation phrases that appear before or after a finding. This may be done after the UMLS phrase has already been found in the sentence and the phrase is treated en-block in the pattern. When more than one concept phrase is present in a sentence, the negation may be associated with the wrong phrase), 
and for each instance of a negation keyword or negation term pattern, determining a scope of nodes in the parse tree data structure encompassed by the instance of the negation keyword or negation term pattern (Syeda-Mahmood ‘6437 [0024] Some approaches may employ a negation detection algorithm to also spot negative occurrences of diseases or symptoms. Various approaches to negation detection use regular expression patterns seeded by negation phrases that appear before or after a finding. This may be done after the UMLS phrase has already been found in the sentence and the phrase is treated en-block in the pattern. When more than one concept phrase is present in a sentence, the negation may be associated with the wrong phrase),
and wherein core findings located within a scope of nodes in the parse tree data structure encompassed by the instance of the negation keyword or negation term pattern are determined to be negatively indicated (Syeda-Mahmood ‘6437 [0024] Some approaches may employ a negation detection algorithm to also spot negative occurrences of diseases or symptoms. Various approaches to negation detection use regular expression patterns seeded by negation phrases that appear before or after a finding. This may be done after the UMLS phrase has already been found in the sentence and the phrase is treated en-block in the pattern. When more than one concept phrase is present in a sentence, the negation may be associated with the wrong phrase).
Syeda-Mahmood ‘6437 is considered to be analogous to the claimed invention because it is in the same field of analysis of medical reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘6437 to allow for using negation detection. Doing so would allow for spotting negative occurrences of diseases or symptoms.
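For illustration, the pattern-based negation detection described in the quoted passage of Syeda-Mahmood ‘6437 [0024] (regular expression patterns seeded by negation phrases that appear before or after a finding) can be sketched as follows. This is a minimal sketch under stated assumptions: the cue lists, the three-word window, and the function name are hypothetical and are not taken from the reference or from the claims.

```python
import re

# Hypothetical negation cues; the reference does not enumerate them.
PRE_NEGATION = ["no evidence of", "without", "no"]
POST_NEGATION = ["is absent", "not seen", "is ruled out"]

# Assumed scope: up to three intervening words between cue and finding.
WINDOW = r"(?:\w+\s+){0,3}?"


def is_negated(sentence: str, finding: str) -> bool:
    """Return True if a negation cue appears shortly before or after the finding."""
    f = re.escape(finding)
    pre = "|".join(re.escape(p) for p in PRE_NEGATION)
    post = "|".join(re.escape(p) for p in POST_NEGATION)
    # Cue precedes the finding, e.g. "no evidence of <finding>".
    pre_pattern = rf"\b(?:{pre})\s+{WINDOW}{f}\b"
    # Cue follows the finding, e.g. "<finding> is not seen".
    post_pattern = rf"\b{f}\s+{WINDOW}(?:{post})\b"
    return bool(
        re.search(pre_pattern, sentence, re.IGNORECASE)
        or re.search(post_pattern, sentence, re.IGNORECASE)
    )
```

With these assumed cues, the sentence "No pneumothorax but large effusion." also flags "effusion" as negated, which is an instance of the misassociation that the quoted passage notes can occur when more than one concept phrase is present in a sentence.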

Regarding claim 10, Klassen in view of Syeda-Mahmood ‘6437 teaches the method of claim 9.
Klassen teaches a parse tree data structure, core findings, and natural language text; however, Klassen does not teach
wherein performing negated instance detection further comprises 
retrieving a set of negation prior terms and negation post terms, 
and searching the parse tree data structure for instances of nodes having terms matching terms in one or more of the set of negation prior terms or negation post terms, 
and wherein a core finding located in a node corresponding to a portion of natural language text corresponding to an instance of a term matching a term in one or more of the set of negation prior terms or negation post terms, is determined to be negatively indicated.
Syeda-Mahmood ‘6437 teaches
wherein performing negated instance detection further comprises 
retrieving a set of negation prior terms and negation post terms (Syeda-Mahmood ‘6437 [0024] Some approaches may employ a negation detection algorithm to also spot negative occurrences of diseases or symptoms. Various approaches to negation detection use regular expression patterns seeded by negation phrases that appear before or after a finding. This may be done after the UMLS phrase has already been found in the sentence and the phrase is treated en-block in the pattern. When more than one concept phrase is present in a sentence, the negation may be associated with the wrong phrase), 
and searching the parse tree data structure for instances of nodes having terms matching terms in one or more of the set of negation prior terms or negation post terms, 
and wherein a core finding located in a node corresponding to a portion of natural language text corresponding to an instance of a term matching a term in one or more of the set of negation prior terms or negation post terms, is determined to be negatively indicated.
Syeda-Mahmood ‘6437 is considered to be analogous to the claimed invention because it is in the same field of analysis of medical reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘6437 to allow for using negation detection. Doing so would allow for spotting negative occurrences of diseases or symptoms.

Regarding claim 11, Klassen in view of Syeda-Mahmood ‘6437 teaches the method of claim 10.
Klassen teaches a fine-grained finding descriptor data structure and a fine-grained finding; however, Klassen does not teach
wherein generating the fine-grained finding descriptor data structure comprises 
setting a negation indicator in the fine-grained finding descriptor data structure to indicate the fine-grained finding to be negatively indicated in response to the negated instance detection indicating that the core finding is negatively indicated.
Syeda-Mahmood ‘6437 teaches
wherein generating the fine-grained finding descriptor data structure comprises 
setting a negation indicator in the fine-grained finding descriptor data structure to indicate the fine-grained finding to be negatively indicated in response to the negated instance detection indicating that the core finding is negatively indicated (Syeda-Mahmood ‘6437 [0065] At 508, negations are flagged based on negation cues and dependency parsing as described above).
Syeda-Mahmood ‘6437 is considered to be analogous to the claimed invention because it is in the same field of analysis of medical reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘6437 to allow for using negation detection. Doing so would allow for spotting negative occurrences of diseases or symptoms.


Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Klassen in view of Syeda-Mahmood ‘6437 in view of Syeda-Mahmood ‘5655.

Regarding claim 15, Klassen teaches the method of claim 1.
Klassen teaches a fine-grained finding descriptor data structure and a fine-grained finding descriptor database; however, Klassen does not teach
further comprising: 
determining, for each given instance in a set of different instances of a fine-grained finding descriptor data structure, 
a count of a number of other instances of fine-grained finding descriptor data structures that match the given instance; 
and comparing the counts for each given instance to a threshold value to select a subset of given instances of fine-grained finding descriptor data structures for inclusion in the fine-grained finding descriptor database, 
wherein only given instances of fine-grained finding descriptor data structures whose counts equal or exceed the threshold value are stored in the fine-grained finding descriptor database.
Syeda-Mahmood ‘6437 teaches
count (Syeda-Mahmood ‘6437 [0054] In the above formula, the histogram counts what fraction of the must-have vocabulary words find an exact match in some single sentence within a report. Using platforms such as Lucene, the exact lookup may be automatically enabled by querying the index with the must-have terms of the given vocabulary phrase. In fact, using such a Lucene index, the most likely sentences can be determined for using the detailed LCF matching within the selected reports $D_R$ for the concept $S_i$ as those sentences $T_R = \bigcup_{l=1}^{|D_R|} T_l$ in which the must-have prefixes found a match, i.e., $T_{lk}$ s.t. $\exists\, t_{lkj} \in T_{lk} \wedge w_{ij} = p_m(t_{lkj})$. Using the same threshold F as used in the LCF algorithm ensures that the subsequent LCF matching is bound by the same threshold);
threshold (Syeda-Mahmood ‘6437 [0054] In the above formula, the histogram counts what fraction of the must-have vocabulary words find an exact match in some single sentence within a report. Using platforms such as Lucene, the exact lookup may be automatically enabled by querying the index with the must-have terms of the given vocabulary phrase. In fact, using such a Lucene index, the most likely sentences can be determined for using the detailed LCF matching within the selected reports $D_R$ for the concept $S_i$ as those sentences $T_R = \bigcup_{l=1}^{|D_R|} T_l$ in which the must-have prefixes found a match, i.e., $T_{lk}$ s.t. $\exists\, t_{lkj} \in T_{lk} \wedge w_{ij} = p_m(t_{lkj})$. Using the same threshold F as used in the LCF algorithm ensures that the subsequent LCF matching is bound by the same threshold).
Syeda-Mahmood ‘6437 is considered to be analogous to the claimed invention because it is in the same field of analysis of medical reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Syeda-Mahmood ‘6437 to allow for using counting and thresholds. Doing so would allow for spotting negative occurrences of diseases or symptoms.
Klassen further in view of Syeda-Mahmood ‘6437 does not teach
wherein only given instances of fine-grained finding descriptor data structures whose counts equal or exceed the threshold value are stored in the fine-grained finding descriptor database.
Syeda-Mahmood ‘5655 teaches
using a threshold value to determine what data to process (Syeda-Mahmood ‘5655 [0061] The grouping process uses a connected component algorithm to merge adjacent phrasal matches that are within the inter-phrase match distance threshold of each other. The connected component algorithm uses a fast data structure called the union-find to perform the merging. During grouping, multiple occurrences of a match to a phrase are allowed within a group to handle cases when a phrase emphasizing a point of discussion was uttered frequently).
Syeda-Mahmood ‘5655 is considered to be analogous to the claimed invention because it is in the same field of textual analysis. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen in view of Syeda-Mahmood ‘6437 further in view of Syeda-Mahmood ‘5655 to allow for using phrasal grouping. Doing so would allow for detecting and localizing topical events, that is, the points in a recording when specific topics are discussed.
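For illustration, the counting and threshold comparison recited in claim 15 (counting, for each given instance, the number of other matching instances, and storing only instances whose counts equal or exceed the threshold value) can be sketched as follows. This is a minimal sketch only: the function name and the representation of each fine-grained finding descriptor as a hash-able tuple of strings are assumptions made for the example and are not taken from the application or the cited references.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def select_frequent_descriptors(
    instances: Iterable[Hashable], threshold: int
) -> List[Hashable]:
    """Keep each distinct descriptor whose count of other matching
    instances equals or exceeds the threshold.

    Two instances are assumed to "match" when they compare equal.
    """
    totals = Counter(instances)
    # The number of *other* matching instances is the total minus one.
    return [d for d, n in totals.items() if n - 1 >= threshold]


# Hypothetical descriptors of the form (modifier, core finding).
observed = [
    ("small", "pneumothorax"),
    ("small", "pneumothorax"),
    ("small", "pneumothorax"),
    ("large", "effusion"),
    ("large", "effusion"),
    ("mild", "edema"),
]
kept = select_frequent_descriptors(observed, threshold=1)
```

With a threshold of 1, the descriptor that occurs only once has zero other matching instances and is therefore not selected for storage.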

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Klassen in view of Glottmann et al. (US Patent Pub. No. 2020/0043600), hereinafter Glottmann.

Regarding claim 16, Klassen teaches the method of claim 1.
Klassen teaches
and wherein the at least one medical imaging report data structure is a medical imaging report specifying 
indications, findings, and impression of the at least one medical image generated by a subject matter expert (Klassen [0031] The textual content of the sections of the report are then analyzed by natural language processing (NLP) 98 which may, for example, employ a sequence of a tokenizer that identifies white space to break text into individual tokens (typically words or numbers), a grammatical parser that parses the tokens into grammatical units (e.g. sentences) and parts-of-speech (e.g. nouns, verbs, adjectives, and so forth), a phrase structure parser, dependency parser, named entity recognizer, semantic role labeler, and/or other NLP token grouping/classification. In one approach, an ensemble of rule-based and ML components are trained based on labeled and unlabeled clinical text collected from past (i.e. historical) radiology reports and curated to create an overall model of clinical report text. Elements of the natural language processing may include words, sentences, part-of-speech labels, and other basic linguistic information, as well as higher level semantic structures that identify units of text as values of vocabulary categories such as findings, diagnosis, follow-up recommendations, and other elements of discourse found in clinical text. The parsed text is then searched to identify vocabulary category values of a vocabulary of categories 64. The vocabulary of categories is a closed set of categories. These are categories of medical language commonly used in medical reports, such as (by way of non-limiting illustrative example): “finding”, “critical finding”, “recommendation”, “biopsy sample”, “reason for exam”, “diagnosis”, “impression”, and “observation”, and/or other vocabulary categories. A vocabulary category is instantiated in a particular medical report by a value for that category, e.g. a possible value for the “diagnosis” vocabulary category could be “prostate cancer”. 
The vocabulary could be hierarchical whereby all values belonging to a sub-category necessarily also belong to a higher-up category, e.g. all values that belong to the (sub-)category “critical diagnosis” also belong to the higher-up category “diagnosis”. Likewise, a value belonging to the (sub-)category “follow-up recommendation” necessarily also belongs to the higher-up category “recommendation”. There may also be some overlap between categories of the vocabulary, e.g. terms that fall within the “impression” category may also fall into the “finding” category. Alternatively, the vocabulary of categories may be designed to be mutually exclusive so that there is no such overlap).
Klassen does not teach
wherein the at least one medical image is at least one of a human chest radiology image.
Glottmann teaches
wherein the at least one medical image is at least one of a human chest radiology image (Glottmann [0041] More particularly, the image processing unit 121 may dispatch images to one or more computer vision (CV) servers 125 configured to run convolutional neural networks (CNN) or other computer vision techniques. Images may be dispatched by the image processing unit 121 to the appropriate computer vision (CV) server 125 based on their modality (e.g., CT, MRI) and body region (e.g., chest, brain). Prior to applying convolutional neural networks by way of the CV server 125, the image processing unit 121 may preprocess the image data. Examples of preprocessing image data include normalizing the pixel information in the image data, the size of voxels, and/or the size of the data provided. Further, preprocessing image data may include segmenting different images. The image processing unit 121 may also post process results from the convolutional neural network algorithms. In particular, post processing steps may include undoing normalizations to return images to normal sizes. Post processing may also include data cleaning, adjacent component/morphology analysis, and contour operations).
Glottmann is considered to be analogous to the claimed invention because it is in the same field of medical imaging reports. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Klassen further in view of Glottmann to allow for processing a medical imaging report of a chest. Doing so would allow for automated analysis of radiological information, such as medical images and related text statements for discrepancy analysis.

Allowable Subject Matter

Claims 4, 5 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 4 would be allowable because the prior art does not teach
and wherein the entries for synonyms of core findings point to corresponding core findings in the core finding lexicon data structure; 
for matching with text such that a portion of text containing a prefix associated with a core finding is determined to be specifying the core finding in the portion of text; 
and processing the medical report natural language content based on the index data structure and core findings prefix data structure to identify instances of core findings, instances of synonyms of core findings, and instances of prefixes of core findings in the at least one medical imaging report data structure, to thereby generate the set of core finding instances.

Claim 5 is dependent on claim 4 and would therefore be allowable.

Claim 14 would be allowable because the prior art does not teach
and wherein generating the fine-grained finding descriptor data structure for the core finding instance based on the association of one or more modifiers of the core finding with the core finding instance further comprises 
combining the one or more modifiers and core finding instance with the core finding type corresponding to the core finding instance from the core finding lexicon data structure, to thereby generate a fine-grained finding descriptor data structure.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J. MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657