Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The following is a Final Office action. In response to communications received 5/23/2022, Applicant, on 10/21/2022, amended claims 1, 8 and 15-20. Claim 21 is new. Claims 1-21 remain pending in this application and have been rejected below.
Response to Amendment
Applicant’s amendments and arguments have been considered. Accordingly, the 101 rejection is hereby removed.
Applicant’s amendments and arguments have been considered. However, the 103 rejection remains and is updated below.
Response to Argument
With respect to the 101 arguments, Applicant argues that the amendments of the “claimed subject matter involves specific technologies and training methods, and the use of machine learning models that result in the claimed subject matter performing a specific function, which is to ultimate generate a revised data file labeling each of the plurality of fields of demographic information based on the identified type of demographic information and to insert missing demographics information based on identified demographic information” (See Remarks at pg. 11). For this reason, Applicant further asserts that the claimed subject matter is integrated into a practical application (See Remarks at pg. 11). Examiner agrees. Examiner notes that although the machine learning model and training method are generically claimed and can be identified as a generic mathematical calculation, the specific function of “inserting, by the processor and in the revised data file, missing fields of demographic information based on the identified type of demographic information,” resulting from the training and execution of the machine learning model, is rooted in computer technology and applies the judicial exception in some other meaningful way beyond generally linking the exception to a particular technological environment. Therefore, the 101 rejection is hereby removed.

With respect to the 103 arguments, Applicant argues that Wu et al. (US Patent Application Publication, 2019/0287685, hereinafter referred to as Wu) in view of Sorensen (US Patent Application Publication, 2020/0004765) fails to disclose the amended claimed subject matter. However, Examiner notes that this argument is now moot, as the claims are now rejected under 35 U.S.C. 103 over Wu in view of Sorensen in further view of Akbulut et al. (US Patent Application Publication, 2016/0283350, hereinafter referred to as Akbulut). See the updated rejection below.
Continuation

This application is a continuation application of U.S. application no. 16/668,565 filed on 10/30/2019 (“Parent Application”).  See MPEP §201.07.  In accordance with MPEP §609.02 A. 2 and MPEP §2001.06(b) (last paragraph), the Examiner has reviewed and considered the prior art cited in the Parent Application.  Also in accordance with MPEP §2001.06(b) (last paragraph), all documents cited or considered ‘of record’ in the Parent Application are now considered cited or ‘of record’ in this application.  Additionally, Applicant(s) are reminded that a listing of the information cited or ‘of record’ in the Parent Application need not be resubmitted in this application unless Applicants desire the information to be printed on a patent issuing from this application.  See MPEP §609.02 A. 2.  Finally, Applicants are reminded that the prosecution history of the Parent Application is relevant in this application.  See e.g., Microsoft Corp. v. Multi-Tech Sys., Inc., 357 F.3d 1340, 1350, 69 USPQ2d 1815, 1823 (Fed. Cir. 2004) (holding that statements made in prosecution of one patent are relevant to the scope of all sibling patents).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US Patent Application Publication, 2019/0287685, hereinafter referred to as Wu) in view of Sorensen (US Patent Application Publication, 2020/0004765) in further view of Akbulut et al. (US Patent Application Publication, 2016/0283350, hereinafter referred to as Akbulut).

As per Claim 1, Wu discloses a computer-implemented method of identifying demographic information in a data file, comprising: 
a)	training a machine learning model according to labeled… based at least in part on structure and content of a data file from a data source with information describing medical providers, the machine learning model being based on a plurality of machine learning algorithms to identify different types of demographic information (Wu: ¶0003, 0024, 0031, 0038-0040, 0044-0045: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.). A tensor generator vectorizes each token and inputs the resulting vectors into the machine learning model (e.g., the neural network). The neural network outputs a vector including values corresponding to the presence of each HPI classification/label element in an input. The neural network is trained using the preprocessed HPI(s) and HPI classification(s) (e.g., collectively referred to as the samples). See ¶0045 where the neural network can be a part of a larger neural network [model based on a plurality of machine learning algorithms]. See ¶0082 for the model trained with a training set. See also ¶0003 where the demographic information can describe the healthcare provider.).
b)	receiving, by the processor, the data file containing a plurality of fields of demographic information from… the data file having inconsistent or mislabeled nomenclatures with one another for one or more fields of the plurality of fields or spurious demographic information (Wu: ¶0003, 0024, 0038-0040 and 0049: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are collected from a third-party source (see ¶0049) and analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases (i.e. metadata) using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.).  The tokenizer can tokenize names and titles together (e.g., "Dr. Smith") and/or certain medical abbreviations/nomenclatures (e.g., "obstructive sleep apnea," "cardiac arrest," and "Type 2 diabetes.") and even nomenclatures that have different meanings depending on context (i.e. “pt” can refer to “patient” or “physical therapy”).).
c)	analyzing, by the processor… a plurality of strings representing a field of demographic information in the data files using the machine learning model (Wu: ¶0003, 0024, 0031, 0038-0040, 0044-0045: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token by breaking down the HPI input strings. The tokenizer can analyze the shape of words or phrases using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.). A tensor generator vectorizes each token and inputs them into the machine learning model (e.g., the neural network). The neural network outputs a vector including values corresponding to the presence of each HPI classification/label element in an input. The neural network is trained using the preprocessed HPI(s) and HPI classification(s) (e.g., collectively referred to as the samples). See ¶0045 where the neural network can be a part of a larger neural network [model based on a plurality of machine learning algorithms]. See ¶0082 for the model trained with a training set.).
e)	generating, by the processor, a revised data file labeling each of the plurality of fields of demographic information based on the identified type (Wu: ¶0003, 0031, 0049, 0062-0068: A machine learning model can be trained to validate its correct operation of processing and generating a classification/label of medical records (i.e. demographic information or medical history). A model evaluator can monitor and evaluate third-party medical records (i.e. demographic information or medical history) for misclassification. Accordingly, a medical system can transmit a corrective action (i.e. generate, update, or delete medical record) as a feedback notification to the data source (i.e. third party).);  

f)	[identifying], by the processor and in the revised data file, missing fields of demographic information based on the identified type of demographic information (Wu: ¶0024: History of present illness information may contain missing or incomplete words and sentences. See ¶0074 where HPI may include incomplete or null demographic information, such as names, dates, etc.).

Wu does not explicitly disclose; however, Sorensen discloses:
a)	sampled training sets to identify a heading (Sorensen: ¶0044-0045 and 0050: Training of the machine learning model samples large sets of training data, and the trained model then processes new unstructured data as it is applied. See the specific example in ¶0050, where a portion of the unstructured text is extracted and a heading or signature is identified.);
b)	a plurality of third-party sources (Sorensen: ¶0011 and 0026-0029: Multiple data sources including external data sources may be used to retrieve metadata regarding demographic information of an identifying person.);

c)	analyzing, at the processor, a heading identifying a field of demographic information in the data file using a machine learning model trained (Sorensen: ¶0045-0046, 0049-0050 and 0079-0080: A trained machine learning model may be applied to unstructured data to identify one or more pieces of contact information, where contact information includes demographic information such as a complete person name, a postal address, a job title, an email address, etc. See the example in ¶0050 where unstructured text is extracted from a heading and is analyzed for identified demographic information. See ¶0030 where each string in the unstructured data is analyzed. Also, see Fig. 1B for the heading identifying field.);
d)	generating, at the processor, a score indicating a probability that each of the plurality of fields of demographic information was identified correctly (Sorensen: Fig. 1B and ¶0012: The text portions may be processed to identify potential segments of text that may be converted to structured text. For example, the text portions include person details such as names, contact information, and the like. Each segment of text may be assigned classification metadata and a confidence score indicating the likelihood (represented as a probability), based on previous training, that the classification is correct.);  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Wu with Sorensen’s machine learning model for classifying an individual’s data because the references are analogous/compatible, since each is directed toward features of predicting erroneous data in demographic records, and because incorporating Sorensen’s machine learning model for classifying an individual’s data in Wu would have served Wu’s pursuit of monitoring data for misclassifications and determining a probability of misclassification (See Wu, ¶0049, 0064 and 0093); and further obvious since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Wu does not explicitly disclose; however, Akbulut discloses:
f)	inserting, by the processor and in the revised data file, missing fields of …information (Akbulut: ¶0014: Data analysis program analyzes data to identify and provide data for missing and inaccurate structured content.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Wu with Akbulut’s analysis of unstructured data to identify structured content because the references are analogous/compatible, since each is directed toward features of predicting erroneous unstructured data in demographic records, and because incorporating Akbulut’s analysis of unstructured data to identify structured content in Wu would have served Wu’s pursuit of effectively updating or generating a medical record with correct classification (See Wu, ¶0071 and 0076); and further obvious since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Claims 8 and 15 recite limitations already addressed by the rejection of Claim 1; therefore, the same rejection applies. Also, Wu discloses, in at least ¶0054, implementing the method on a system with machine readable instructions embodied in software stored on a non-transitory computer readable storage medium and executed on a processor. See also, ¶0038-0039, where demographic information, such as names, are identified in an HPI file.
As per Claim 2, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1, wherein analyzing the data file comprises analyzing semantic content of each of the plurality of fields of demographic information to identify the different types of demographic information (Wu: ¶0003, 0024 and 0038-0040: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. Because each HPI is recorded by a different healthcare professional, the writing style (e.g., punctuation, abbreviations, word choice, sentence structure, etc.) of each narrative can vary. The HPI’s often contain high occurrences of medical terms, abbreviations and named entities, which often have different meanings depending on the context. For example, "pt" can refer to either "patient" or "physical therapy" depending on the context. Also, HPIs often contain extensive use of numbers with different semantic meanings. For example, the phrases "last colonoscopy was 2009," "the pain lasts 5 minutes," and "Type 2 Diabetes" all contain numbers with different semantic meanings (e.g., a date, a duration and a classification of disease, respectively). To properly classify the medical records and HPI information, an HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.).).
Claims 9 and 16 recite limitations already addressed by the rejection of Claim 2; therefore, the same rejection applies. 
As per Claim 3, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1, wherein analyzing the data file comprises analyzing a shape of each of the plurality of fields of demographic information to identify the different types of demographic information (Wu: ¶0003, 0024 and 0038-0040: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.). The named entity recognizer scans the tokenized records for numbers, dates, named entities, medical terms, abbreviations, and/or misspelling and replaces these elements with standardized tokens. As an example, “Mar. 12, 2018” is replaced with three tokens representing month, day and year, namely "DATE," "DATE," and "DATE," respectively.). 
Claims 10 and 17 recite limitations already addressed by the rejection of Claim 3; therefore, the same rejection applies.
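For illustration only, the shape analysis and date standardization that the Claim 3 rejection attributes to Wu can be sketched as below. This is not Wu's implementation: the names `word_shape` and `standardize_dates`, the month list, and the three simple rules are hypothetical assumptions chosen to reproduce the cited example, in which "Mar. 12, 2018" becomes three "DATE" tokens.

```python
# Hypothetical sketch of shape-based tokenization; not taken from Wu.
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]
# Accept full month names and their three-letter abbreviations (assumption).
MONTHS = set(MONTH_NAMES) | {name[:3] for name in MONTH_NAMES}


def word_shape(token: str) -> str:
    """Map each character to a class: X = upper, x = lower, d = digit."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation is kept as-is
    return "".join(shape)


def standardize_dates(text: str) -> list[str]:
    """Replace date-like tokens with the standardized token "DATE"."""
    out: list[str] = []
    for raw in text.split():
        tok = raw.strip(",")
        if not tok:
            continue
        shape = word_shape(tok)
        is_month = shape.startswith("X") and tok.rstrip(".").lower() in MONTHS
        is_year = shape == "dddd"
        # A one- or two-digit number counts as a day only right after a date token.
        is_day = shape in ("d", "dd") and bool(out) and out[-1] == "DATE"
        out.append("DATE" if (is_month or is_year or is_day) else tok)
    return out


print(standardize_dates("Seen on Mar. 12, 2018 by Dr. Smith"))
# ['Seen', 'on', 'DATE', 'DATE', 'DATE', 'by', 'Dr.', 'Smith']
```

Under these assumed rules, the "5" in "the pain lasts 5 minutes" is left unchanged because it does not follow a date token, which mirrors the distinction between a date and a duration noted in the rejection of Claim 2.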
As per Claim 4, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1, wherein analyzing the data file comprises analyzing metadata of each of the plurality of fields of demographic information to identify the different types of demographic information (Wu: ¶0003, 0024 and 0038-0040: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases (i.e. metadata) using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.). The named entity recognizer scans the tokenized records for numbers, dates, named entities, medical terms, abbreviations, and/or misspelling and replaces these elements with standardized tokens.).  
Claims 11 and 18 recite limitations already addressed by the rejection of Claim 4; therefore, the same rejection applies.

As per Claim 5, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 4, wherein the metadata includes each nomenclature of each of the plurality of fields of demographic information (Wu: ¶0003, 0024 and 0038-0040: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases (i.e. metadata) using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.).  The tokenizer can tokenize names and titles together (e.g., "Dr. Smith") and/or certain medical abbreviations/nomenclatures (e.g., "obstructive sleep apnea," "cardiac arrest," and "Type 2 diabetes.") and even nomenclatures that have different meanings depending on context (i.e. “pt” can refer to “patient” or “physical therapy”)).  
Claims 12 and 19 recite limitations already addressed by the rejection of Claim 5; therefore, the same rejection applies.
As per Claim 6, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1, wherein, in response to identifying different ones of the plurality of fields of demographic information, the method further comprises cross-checking at least one of the plurality of fields of demographic information against known demographic information (Wu: ¶0003, 0024, 0038-0040, 0044-0045, 0062: Electronic medical records containing demographic information and a patient’s history of present illness (HPI’s) are analyzed. An HPI classifier receives unprocessed medical records for preprocessing by a natural language processor. A tokenizer converts each word or group of words in the medical record into a token. The tokenizer can analyze the shape of words or phrases (i.e. metadata) using simple rules to develop a token (which represents a type of demographic information, such as name, location, date, etc.).  A tensor generator vectorizes each token and inputs the resulting vectors into the machine learning model (e.g., the neural network). The neural network is trained using the preprocessed HPI(s) and HPI classification(s) (e.g., collectively referred to as the samples). The samples are divided such that some of the samples are used for training and some are used for validation (e.g., cross-checking to confirm the model works after training). Known outcomes/results can be used to verify performance of the training model, which can also be validated with a test data set. In some examples, a set of known, "gold standard", or other reference data can be divided into a training data set to train the model and a test data set to test the trained network model to validate its correct operation.). 
Claims 13 and 20 recite limitations already addressed by the rejection of Claim 6; therefore, the same rejection applies.
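For illustration only, the division of known reference samples into training and validation sets, as cited from Wu in the Claim 6 rejection, can be sketched as below. This is a minimal sketch under stated assumptions, not Wu's implementation: the function names `split_samples` and `validation_accuracy`, the 80/20 split, and the fixed seed are all hypothetical.

```python
# Hypothetical sketch of a train/validation split against known labels; not taken from Wu.
import random
from typing import Callable, Sequence

Sample = tuple[str, str]  # (record text, known "gold standard" label)


def split_samples(
    samples: Sequence[Sample], train_fraction: float = 0.8, seed: int = 0
) -> tuple[list[Sample], list[Sample]]:
    """Shuffle the known samples and divide them into training and held-out sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def validation_accuracy(
    predict: Callable[[str], str], held_out: Sequence[Sample]
) -> float:
    """Cross-check predictions against the known labels of the held-out set."""
    if not held_out:
        raise ValueError("held-out set is empty")
    correct = sum(1 for text, label in held_out if predict(text) == label)
    return correct / len(held_out)


# Usage with ten made-up samples: eight are used for training, two for validation.
samples = [(f"record {i}", "NAME" if i % 2 else "DATE") for i in range(10)]
train, held_out = split_samples(samples)
print(len(train), len(held_out))
```

Any trained classifier exposed as a function from text to label could be passed as `predict`; the held-out samples are never used for training, so the resulting accuracy reflects a check against known information.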

As per Claim 7, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1, further comprising transmitting the revised data file to the third-party (Wu: ¶0003 and 0049, 0062-0068: A machine learning model can be trained to validate its correct operation. A model evaluator can monitor and evaluate third-party medical records (i.e. demographic information or medical history) for misclassification. Accordingly, a medical system can transmit a corrective action (i.e. generate, update, or delete medical record) as a feedback notification to the data source (i.e. third party).).

As per Claim 21, Wu in view of Sorensen in further view of Akbulut discloses the method of claim 1.
Wu does not explicitly disclose; however, Sorensen discloses wherein the analyzing comprises distinguishing between each of the plurality of fields of demographic information based on a position of the heading in the data files with respect to another position of another heading in the data files (Sorensen: ¶0050-0081: The machine learning model may analyze unstructured text extracted from a heading (See example in ¶0050-0078). Tokens in a trained machine learning system may be used to disambiguate text portions to which multiple metadata labels have been assigned by analyzing their positions. As an example, the machine learning system may determine that "Louis" is properly considered as part of an address due to its proximity to a state abbreviation and a known ZIP code rather than as part of the name that is significantly "farther away" in the unstructured text.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Wu with Sorensen’s machine learning model for classifying an individual’s data because the references are analogous/compatible, since each is directed toward features of predicting erroneous data in demographic records, and because incorporating Sorensen’s machine learning model for classifying an individual’s data in Wu would have served Wu’s pursuit of monitoring data for misclassifications and determining a probability of misclassification (See Wu, ¶0049, 0064 and 0093); and further obvious since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sengupta et al. (US 2012/0053959): Operations, such as data processing operations, can be improved by applying clustering and statistical techniques to observed behaviors in the data processing operations.

Gauthier et al. (US 10,963,378): In a computer-implemented method, an artificial neural network is trained to identify conversation segments, and/or segment portions, within electronic communication documents (e.g., emails). An input layer of the neural network includes input parameters corresponding to different characteristics of text-based content. A first electronic communication document is received, and its text-based content is processed using the trained neural network to generate one or more position indicators for the document. The position indicators include one or more segment indicators denoting positions of one or more conversation segments within the document, and/or one or more segment portion indicators denoting positions of one or more portions of one or more conversation segments within the document. An ordered relationship between the first electronic communication document and one or more other electronic communication documents is determined using the position indicators.

Corr (US 2019/0113905): One variation of a method for normalizing manufacturing data includes: identifying a type of the digital component description document, including a set of part descriptor entries describing a set of physical parts, based on categories of physical parts described in part descriptor entries within the digital component description document; accessing a set of part descriptor rules corresponding to the type of the digital component description document; detecting deviation of a part descriptor entry, in the set of part descriptor entries, from a part descriptor rule in the set of part descriptor rules; at the user portal, serving a prompt to manually correct the part descriptor entry; and, in response to alignment between the digital component description document and the set of part descriptor rules, importing the set of part descriptor entries into the component library.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLISON MICHELLE NEAL whose telephone number is (571)272-9334. The examiner can normally be reached 9-5pm ET, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Epstein, can be reached at 571-270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ALLISON MICHELLE NEAL
Examiner
Art Unit 3683



/TIMOTHY PADOT/             Primary Examiner, Art Unit 3683