DETAILED ACTION
This action is in response to the reply received 12/21/2021. After consideration of applicant's amendments and/or remarks:
Examiner withdraws rejections under 35 USC § 112.
Claims 1-7 and 9-15 are rejected under 35 USC § 103.


Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over McMurry, Andrew et al., "Improved De-Identification of Physician Notes Through Integrative Modeling of Both Public and Private Medical Text," BMC Medical Informatics and Decision Making, BioMed Central, London, GB, Vol. 12, No. 1, October 2, 2013 (cited in IDS dated 2/8/21, hereinafter McMurry), in view of Sun, U.S. PG-Publication No. 2011/0213777 A1.

Claim 1
	McMurry discloses a method of word filtering implemented on a system having one or more processors and memories. Specifically, McMurry describes a "Scrubber" algorithm for filtering Protected Health Information (PHI) words from physician notes. McMurry, 2. The algorithm is implemented "on a single computer using commodity hardware." Id. at 9.
	McMurry discloses receiving a plurality of documents. The algorithm receives/stores a corpus comprising discharge summaries. Id. at 4.
	McMurry discloses receiving a domain dictionary. The algorithm receives/stores ten vocabularies in the Unified Medical Language System comprising demographic terms, diagnoses, lab tests, medication names, and procedures. Id. at 7.
	McMurry discloses generating, by the one or more processors, a set of tokens for each of the plurality of documents, each token representing a segment in the document. McMurry discloses that each "document instance is split into fragments" and "each fragment is matched against dictionaries of controlled medical vocabularies." Id. at 3. Specifically, the algorithm's UMLS features "refer to the number of matches for each token or phrase in ten medical dictionaries." Id. at 5.
	McMurry discloses separating, by the one or more processors, the set of tokens into a subset of dictionary tokens and a subset of non-dictionary tokens, wherein each of the subset of dictionary tokens is in the domain dictionary, and wherein each of the subset of non-dictionary tokens is not in the domain dictionary; and generating, by the one or more processors, a set of … tokens, wherein the set of … tokens comprises the subset of dictionary tokens and the subset of … non-dictionary tokens. McMurry discloses that the medical dictionary features of the algorithm are implemented using "a decision tree" trained to distinguish PHI (i.e., the subset of non-dictionary tokens) from medical concepts (i.e., the subset of dictionary tokens). Id. at 7; See Also
	McMurry discloses generating, by the one or more processors, a language model using the set of filtered tokens. McMurry discloses using a training set "compiled using all of the PHI words and an equally sized random selection of non-PHI words" to "train a model to recognize non-PHI words and phrases that appear in public medical texts." McMurry, 4, Abstract; see also id. at 9 (describing the model as a "single token model").
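	For illustration only, the balanced training set described in McMurry (all PHI words plus an equally sized random selection of non-PHI words) can be sketched as follows. This is a minimal sketch under stated assumptions, not McMurry's implementation: the function name balanced_training_set, the label convention (1 = PHI, 0 = non-PHI), and the fixed random seed are hypothetical.

```python
import random


def balanced_training_set(phi_words, non_phi_words, seed=0):
    # Hypothetical sketch: keep every PHI word (label 1) and draw an equally
    # sized random sample of non-PHI words (label 0). If fewer non-PHI words
    # exist than PHI words, all of them are used.
    rng = random.Random(seed)
    k = min(len(phi_words), len(non_phi_words))
    sampled = rng.sample(list(non_phi_words), k)
    return [(w, 1) for w in phi_words] + [(w, 0) for w in sampled]


examples = balanced_training_set(["Smith", "Boston"], ["aspirin", "fever", "biopsy", "insulin"])
print(len(examples), sum(label for _, label in examples))  # 4 2
```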
	McMurry suggests that "[w]ords in physician notes that frequently appear in medical journal publications and concept dictionaries are highly unlikely to contain PHI." However, McMurry does not expressly disclose: filtering, by the one or more processors, the subset of non-dictionary tokens to produce a subset of filtered non-dictionary tokens, wherein each of the filtered non-dictionary tokens has an occurrence frequency greater than a predefined threshold; and generating, by the one or more processors, a set of filtered tokens, wherein the set of filtered tokens comprises the subset of dictionary tokens and the subset of filtered non-dictionary tokens.
	Sun discloses filtering, by the one or more processors, the subset of non-dictionary tokens to produce a subset of filtered non-dictionary tokens, wherein each of the filtered non-dictionary tokens has an occurrence frequency greater than a predefined threshold; and generating, by the one or more processors, a set of filtered tokens, wherein the set of filtered tokens comprises the subset of dictionary tokens and the subset of filtered non-dictionary tokens.
	Sun discloses methods of text categorization and classification, used to "filter … one or more words to provide one or more words that meet filtering conditions." Sun, ¶¶ 6-7. A filter module 106 "filters … various parsed words obtained from parsing to obtain words that meet the filtering conditions." Id. at ¶ 39. In one embodiment, the filter module "can filter words based on Id. at ¶ 63.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the text classification method for classifying PHI and non-PHI tokens of McMurry to incorporate text classification using a text frequency threshold as taught by Sun. One of ordinary skill in the art would be motivated to integrate the text frequency threshold into McMurry, with a reasonable expectation of success, in order to increase the accuracy of an automatic text classification system. See Sun, ¶ 3.
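	For illustration only, the word-filtering steps of claim 1 as mapped above can be sketched as follows. This is a hypothetical sketch, not the implementation of McMurry, Sun, or the applicant: it assumes lowercase whitespace tokenization and assumes that the occurrence frequency is the total count of a non-dictionary token across all documents. The function name filter_tokens is illustrative.

```python
from collections import Counter


def filter_tokens(documents, domain_dictionary, threshold):
    # Assumption: tokens are produced by lowercasing and splitting on whitespace.
    token_lists = [doc.lower().split() for doc in documents]
    all_tokens = [t for tokens in token_lists for t in tokens]

    # Separate dictionary tokens from non-dictionary tokens.
    dictionary_tokens = {t for t in all_tokens if t in domain_dictionary}
    non_dictionary_tokens = [t for t in all_tokens if t not in domain_dictionary]

    # Keep only non-dictionary tokens whose occurrence frequency
    # (assumed here: total count) is greater than the predefined threshold.
    counts = Counter(non_dictionary_tokens)
    filtered_non_dictionary = {t for t, c in counts.items() if c > threshold}

    # The set of filtered tokens: dictionary tokens plus filtered non-dictionary tokens.
    return dictionary_tokens | filtered_non_dictionary


docs = ["Patient given aspirin", "aspirin and zorbex", "zorbex again zorbex"]
print(sorted(filter_tokens(docs, {"patient", "aspirin"}, 2)))  # ['aspirin', 'patient', 'zorbex']
```

	In this example, "zorbex" is not in the dictionary but occurs three times, which exceeds the threshold of 2, so it is retained; "given", "and", and "again" each occur once and are discarded.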

Claim 9
	Claim 9 recites a system configured to perform the steps of the method recited in claim 1. Accordingly, claim 9 is rejected as indicated in the rejection of claim 1.


Claims 2-7 and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over McMurry, in view of Sun, further in view of Benton, Adrian, et al. "A system for de-identifying medical message board text." BMC Bioinformatics 12, no. 3 (2011): 1-10 (hereinafter Benton).

Claim 2
	Benton discloses identifying, by the one or more processors, a source of each of the plurality of documents. Benton discloses a method for de-identification of documents on medical Id. at 3-4. Accordingly, an MMB corpus comprises a plurality of messages (i.e. a plurality of documents), and Benton identifies a source (e.g. message board and/or person) for each of the plurality of messages.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of filtering tokens using term frequency of McMurry-Sun to incorporate the term frequency metric taught by Benton. One of ordinary skill in the art would be motivated to integrate the term frequency metric into McMurry-Sun, with a reasonable expectation of success, in order to increase the accuracy of identifying personal information (e.g. names) in a corpus of noisy, unstructured text. See Benton, 1 (describing previous methods as ineffective at capturing identifiers in unstructured, noisy text).

Claim 3
	Benton discloses wherein the identifying step comprises employing a matching algorithm to identify the source of each document. Benton discloses identifying person names using algorithms that "take advantage of the fact that MMB posts often begin by addressing another user and end with the author's name," as well as generating a username word list "by extracting text in the author field from each message post." Benton, 3.

Claim 4
	Benton discloses wherein the matching algorithm comprises a person matching algorithm. Benton discloses identifying person names using algorithms that "take advantage of the fact that MMB posts often begin by addressing another user and end with the author's name," as well as generating a username word list "by extracting text in the author field from each message post." Benton, 3.
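	For illustration only, a person matching approach of the kind quoted from Benton (a username word list built from the author field, combined with the observation that posts often end with the author's name) can be sketched as follows. This is a hypothetical sketch, not Benton's algorithm: the post structure (dicts with "author" and "text" keys), the punctuation stripping, and the function names build_username_list and match_source are assumptions.

```python
import string


def build_username_list(posts):
    # Username word list taken from the author field of each post
    # (assumption: posts are dicts with "author" and "text" keys).
    return {post["author"].lower() for post in posts}


def match_source(text, usernames):
    # Hypothetical person-matching heuristic: if a post ends with a known
    # username (a sign-off), treat that user as the post's source.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    if words and words[-1] in usernames:
        return words[-1]
    return None


posts = [
    {"author": "Jdoe", "text": "Thanks for the advice. -jdoe"},
    {"author": "mary", "text": "Hi jdoe, try resting first."},
]
usernames = build_username_list(posts)
for post in posts:
    print(match_source(post["text"], usernames))  # jdoe, then None
```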

Claim 5
	Benton discloses wherein the occurrence frequency is determined based on source-distinct documents. An MMB corpus is segmented such that either "all threads within a particular message board" or "all messages posted by a particular user" form a separate document. Benton calculates term frequency "by treating all messages that belong to a particular message board as a document" and "by treating all messages that a particular user posts as a document." Benton discloses that N is the total number of documents within the corpus and n_i is the total number of documents that contain at least one instance of token i. Benton, 3-4. Accordingly, the messages are segmented such that each source (e.g., board or authoring user) is treated as a single document.

Claim 6
	Benton discloses wherein two source-distinct documents have different sources from each other. An MMB corpus is segmented such that either "all threads within a particular message board" or "all messages posted by a particular user" form a separate document. Benton calculates term frequency "by treating all messages that belong to a particular message board as a document" and "by treating all messages that a particular user posts as a document." Benton, 3-4.

Claim 7
	Benton discloses wherein the occurrence frequency of a token is determined to be based on a number of source-distinct documents having the token. Benton calculates term frequency "by treating all messages that belong to a particular message board as a document" and "by treating all messages that a particular user posts as a document." Benton discloses that N is the total number of documents within the corpus and n_i is the total number of documents that contain at least one instance of token i. Benton, 3-4. Because all messages sourced from a particular board or authoring user are treated as a single document, the term frequency is based on the number of source-distinct documents having the term.
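	For illustration only, counting N and n_i over source-distinct documents, as characterized above, can be sketched as follows. This is a hypothetical sketch: it does not reproduce any weighting formula from Benton, and the (source, text) input format, whitespace tokenization, and the function name source_distinct_document_frequency are assumptions.

```python
from collections import Counter, defaultdict


def source_distinct_document_frequency(messages):
    # messages: iterable of (source, text) pairs, where the source is a
    # message board or an authoring user (assumed input format).
    # Merge every message from the same source into one "document".
    documents = defaultdict(set)
    for source, text in messages:
        documents[source].update(text.lower().split())

    n_documents = len(documents)  # N: number of source-distinct documents

    # n_i: number of source-distinct documents containing token i
    # (a token repeated within one source is counted once).
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(tokens)
    return n_documents, doc_freq


messages = [("alice", "hi bob"), ("alice", "bob again"), ("carol", "hi there")]
N, n = source_distinct_document_frequency(messages)
print(N, n["hi"], n["bob"])  # 2 2 1
```

	Here "bob" appears in two of alice's messages but is counted once, because both messages share a single source and therefore form a single document.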

Claims 10-15
	Claims 10-15 recite a system configured to perform the steps of the method recited in claims 2-7. Accordingly, claims 10-15 are rejected as indicated in the rejection of claims 2-7.


Response to Arguments
Applicant's arguments filed 12/21/2021 have been fully considered but they are not persuasive.
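In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., a language model trained only using filtered tokens) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).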
Applicant argues that McMurry does not teach or suggest the limitation "generating … a language model using the set of filtered tokens" because McMurry trains a language model using Protected Health Information (PHI) tokens (i.e. filtered tokens) and a random selection of non-PHI words. Rem. 5-6.
Applicant's argument presumes that the claimed language model is trained using only filtered tokens, but the claim does not recite "generating a language model using only the set of filtered tokens." The claim merely recites generating a language model using the set of filtered tokens. McMurry trains a language model using filtered tokens (PHI tokens) in addition to a random selection of non-filtered tokens (non-PHI words), which falls within the plain language of the claim.
Accordingly, the rejection under 35 USC § 103 is maintained.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN can be reached on (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANK D MILLS/ Primary Examiner, Art Unit 2176
March 21, 2022