Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant's Response
In Applicant's Response dated 1/31/2022, Applicant amended the Claims and argued against all objections and rejections set forth in the previous Office Action.
All objections and rejections not reproduced below are withdrawn. 
The prior art rejections of the Claims under 35 U.S.C. 103 previously set forth are withdrawn. 
The examiner appreciates the applicant noting where the support for the amendments is located in the specification. 
The Application was filed on 4/28/2020.
Claim(s) 1-4, 7-13, 16-20 are pending for examination. Claim(s) 1, 10, 19 is/are independent claim(s).

Examiner's Interpretation of Claim(s) 19-20: 
Claim(s) 19-20 is/are interpreted as being statutory. For purposes of USC 101, the examiner has regarded the “computer readable storage media” as not including transmission media, as described in the specification (see ¶ [00116]). 

Claim Rejections - 35 USC § 112
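The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL. The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112: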
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-4, 7-13, 16-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Claims 1, 10, 19 recite: 
“removing each infrequent term from the processed text”
There is insufficient support for this limitation in the specification. The specification discloses “filtering out the set of infrequent terms” (published specification: [0002-04, 14-16]), and “For each infrequent term/token selected for pruning, the term filter unit 450 is configured to prune (i.e., filter out) the infrequent term/token from the processed text.” (Published specification: [0070-72]). 
The specification does disclose “removing PII from text documents” and “concealing (e.g., masking, removing, or suppressing)” (published specification: [0019, 
The specification discloses “filtering out” or pruning infrequent terms from processed text, but not “removing” infrequent terms from processed text. 
In some instances “filtering out” and “removing” may mean the same thing, but not in all instances, and the specification does not provide support for interchanging the terms when describing “infrequent terms”. 
The claims should be amended back to the original “filtering” language. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 10, 11, 19, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mane; Shishir US Pub. No. 2020/0320167 (Mane) in view of Poblenz; Eric C. et al. US Pub. No. 2020/0160969 (Poblenz) in view of Bitan; Elan et al. US Pub. No. 2011/0055185 (Bitan).

Claim 1: 
	Mane teaches: 
A method for utility-preserving text de-identification [abstract] (advanced document redaction, redaction is “de-identification”), comprising:
for each text document of a set of text documents, generating corresponding processed text by:
applying at least one natural language processor (NLP) annotator to the text document to recognize and tag privacy-sensitive personal information corresponding to an individual [¶ 0033, 60-61] (NLP to tag parts of speech); and
replacing at least one word in the text document with at least one replacement value, wherein the at least one word includes a portion of the privacy-sensitive personal information [¶ 0047-48] (replace name with random character and age with a different number); and 
for each processed text corresponding to each text document of the set of text documents: 
determining a union of terms appearing in the processed text, wherein the union of terms excludes privacy-sensitive personal information tagged in the processed text [¶ 0004] (private information) [¶ 0019, 26, 30-33, 51-61] (candidate generator) [¶ 0055-56] (identify all unique words, “Name” is not redacted because it is part of the ignore keywords list);
determining which terms of the union of terms are infrequent [¶ 0058] (frequency of terms across documents and not frequency of terms within the document) [¶ 0032, 52-56] (unique words and less frequent words);
removing each infrequent term from the processed text [¶ 0032, 52-56] (unique words are redacted which is “filtering out”, or are added to the candidate list);
for each infrequent term removed from the processed text:
determining whether the infrequent term is innocuous based on a whitelist [¶ 0045-46] (ignore keywords database is a “whitelist”) [¶ 0055-56] (identify all unique words, “Name” is not redacted because it is part of the ignore keywords list); and
reinstating the infrequent term to the processed text in response to determining the infrequent term is innocuous [¶ 0045-46, 54-55] (keywords database of words to ignore, whitelist) [¶ 0051-52] (candidate generator considers both unique and less frequent words, when a word is considered but not redacted it is “reinstating” that word, which may be an infrequent word); and
generating a corresponding de-identified text document by anonymizing the privacy-sensitive personal information tagged in the processed text to an extent that preserves data utility of the processed text and conceals a personal identity of an individual that the privacy-sensitive personal information tagged in the processed text corresponds to [¶ 0038-39, 41-42] (replacement engine for replacing text, words, phrases) [¶ 0047-48] (replace name with random character and age with a different number).
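
The sequence of steps recited in Claim 1 (union of terms, identification of infrequent terms, removal, and whitelist-based reinstatement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the applicant or of Mane: the function names, the `<NAME>` tag marker, and the `min_docs` threshold of 2 documents are hypothetical and do not appear in the application or in the cited art.

```python
from collections import Counter


def union_of_terms(processed_docs, tagged_pii):
    """Union of terms over all processed texts, excluding tagged PII markers."""
    terms = set()
    for doc in processed_docs:
        terms.update(t for t in doc if t not in tagged_pii)
    return terms


def document_frequency(processed_docs, terms):
    """For each term, count the number of documents that contain it."""
    df = Counter()
    for doc in processed_docs:
        for t in set(doc) & terms:
            df[t] += 1
    return df


def prune_and_reinstate(processed_docs, tagged_pii, whitelist, min_docs=2):
    """Remove infrequent terms, then reinstate those found on the whitelist.

    min_docs is a hypothetical threshold: a term contained in fewer than
    min_docs documents is treated as infrequent.
    """
    terms = union_of_terms(processed_docs, tagged_pii)
    df = document_frequency(processed_docs, terms)
    infrequent = {t for t in terms if df[t] < min_docs}

    result = []
    for doc in processed_docs:
        # Step 1: take every infrequent term out of the processed text,
        # keeping its position so that it can be put back later.
        kept = [(i, t) for i, t in enumerate(doc) if t not in infrequent]
        removed = [(i, t) for i, t in enumerate(doc) if t in infrequent]
        # Step 2: reinstate only the removed terms deemed innocuous.
        reinstated = [(i, t) for i, t in removed if t in whitelist]
        result.append([t for _, t in sorted(kept + reinstated)])
    return result
```

In this sketch, removal and reinstatement are two separate passes over each processed text; a single pass that never removes whitelisted terms would produce the same output, which is the kind of equivalence at issue in the mapping above.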

	Mane teaches all the elements of the claim as shown above. However these elements appear to be taught in different embodiments of Mane. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the different embodiments in Mane, with a reasonable expectation of success. 


	Mane teaches, but Poblenz teaches a gray list in more depth:
for each processed text corresponding to each text document of the set of text documents: 
determining a union of terms appearing in the processed text, wherein the union of terms excludes privacy-sensitive personal information recognized and tagged in the text document [¶ 0021] (global gray list are terms that can exclude PPI, includes ambiguous terms that could be patient identifying terms in some contexts, gray list term set can be updated based on blacklist and/or whitelist term sets for a particular medical scan and corresponding medical report);

	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of document redaction in Mane and the method of protecting information in Poblenz, with a reasonable expectation of success. 
	The motivation for this combination would have been to “protect the confidentiality of the patient” [Poblenz: ¶ 0145].

Mane teaches, but Bitan also teaches:
for each infrequent term removed from the processed text:
determining whether the infrequent term is innocuous based on a whitelist [¶ 0045-46] (ignore keywords database is a “whitelist”) [¶ 0336-351] (method for processing multiple words, unique word calculation, then a few steps later ignore words list); 

	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of document redaction in Mane and the method of protecting information in Poblenz and the method of information retrieval in Bitan, with a reasonable expectation of success. 
	The motivation for this combination would have been for “improvement to the search process for information” [Bitan: ¶ 0002].

Claim 2: 
	Mane teaches: 
The method of claim 1, wherein each privacy-sensitive personal information corresponding to each individual comprises at least one of: a direct identifier, or a quasi-identifier [¶ 0047-48] (replace name with random character and age with a different number, name and age are both a “direct identifier”).

Claims 10, 11, 19, 20: 
Claim(s) 10, 19 is/are substantially similar to Claim 1 and are rejected using the same art and the same rationale as Claim 1. 

Claim(s) 11, 20 is/are substantially similar to Claim 2 and are rejected using the same art and the same rationale as Claim 2. 

Claim(s) 3, 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mane; Shishir US Pub. No. 2020/0320167 (Mane) in view of Poblenz; Eric C. et al. US Pub. No. 2020/0160969 (Poblenz) in view of Bitan; Elan et al. US Pub. No. 2011/0055185 (Bitan) in view of Aravamudan; Murali et al. US Pub. No. 20200402625 (Aravamudan).
Claim 3: 
	Mane teaches all the elements as shown above, including tagging words for part of speech (POS) [¶ 0033, 60-61] (NLP to tag parts of speech): 
	Mane, Poblenz, Bitan fail to teach, but Aravamudan teaches: 
The method of claim 2, wherein generating corresponding processed text further comprises:
tagging each identifier recognized in the text document with a set of tags indicative of a type of the identifier [¶ 0187, 198-205] (entity tagging using natural language processing);
masking an original value of each direct identifier recognized in the text document with a masked value [¶ 0180-182] (information masking); and
tagging each quasi-identifier recognized in the text document with another set of tags indicative of a type of the quasi-identifier [¶ 0316] (date of birth and disease is a quasi-identifier).

	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of document redaction in Mane and the method of protecting information in Poblenz and the method of information retrieval in Bitan and the method of computing with private data in Aravamudan, with a reasonable expectation of success. 
	The motivation for this combination would have been to “be more efficient and achieve an improved speed of processing”, “improve efficiency”, and improve performance [Aravamudan: ¶ 0107, 187].

Claim 12: 
Claim(s) 12 is/are substantially similar to Claim 3 and are rejected using the same art and the same rationale as Claim 3.

Claim(s) 4-8, 13-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mane; Shishir US Pub. No. 2020/0320167 (Mane) in view of Poblenz; Eric C. et al. US Pub. No. 2020/0160969 (Poblenz) in view of Bitan; Elan et al. US Pub. No. 2011/0055185 (Bitan) in view of Aravamudan; Murali et al. US Pub. No. 20200402625 (Aravamudan) in view of Norgeot; Beau et al. US Pub. No. 2020/0394333 (Norgeot).
Claim 4: 

The method of claim 3, further comprising:
for each processed text corresponding to each text document of the set of text documents:
determining, for each term of the union of terms appearing in the processed text, a corresponding term frequency indicative of a number of text documents from the set of text documents that contain the term [¶ 0058] (frequency of terms across documents and not frequency of terms within the document) [¶ 0032, 52-56] (unique words and less frequent words together are a “union of terms”); and
… , wherein each infrequent term has a corresponding term frequency that is less than the frequency threshold [¶ 0032, 52-56] (ratio between the number of documents in which the term occurs and the total number of documents, 0.1 equals 10%, less than 10% is a “threshold”).

	Mane, Poblenz, Bitan, Aravamudan fail to teach, but Norgeot teaches: 
	… determining which terms of the union of terms are infrequent based on each term frequency corresponding to each term of the union of terms and a frequency threshold derived from a blacklist [¶ 0058, 73, 86] (blacklist, threshold quantity of recall errors and/or precision errors) 
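
The term frequency recited in Claim 4 counts documents that contain a term, not occurrences within a document, and the cited passages of Mane describe a ratio of containing documents to total documents with 0.1 (10%) as an example value. A minimal sketch of that comparison follows. The function name `infrequent_terms` and the parameter `ratio_threshold` are hypothetical, and the sketch uses a fixed ratio; it does not model how a threshold would be derived from a blacklist.

```python
def infrequent_terms(docs, ratio_threshold=0.1):
    """Return terms whose document-frequency ratio is below the threshold.

    docs is a list of token lists. ratio_threshold is an assumed fixed
    value (0.1 means 10% of the documents), used here for illustration only.
    """
    total = len(docs)
    if total == 0:
        return set()

    counts = {}
    for doc in docs:
        # set(doc) ensures a term is counted once per document,
        # regardless of how often it repeats inside that document.
        for term in set(doc):
            counts[term] = counts.get(term, 0) + 1

    return {t for t, n in counts.items() if n / total < ratio_threshold}
```

Under these assumptions, a term that appears many times in a single document out of twenty is still infrequent, because only the number of containing documents enters the ratio.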

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of document redaction in Mane and the method of protecting information in Poblenz and the method 
	The motivation for this combination would have been to improve security [Norgeot: ¶ 0046].

Claim 6: 
	Mane teaches: 
The method of claim 4, further comprising: 
for each processed text corresponding to each text document of the set of text documents:
for each infrequent term filtered out of the processed text [¶ 0055-56] (identify all unique words, “Name” is not redacted because it is part of the ignore keywords list),
determining whether the infrequent term is innocuous based on a whitelist [¶ 0045-46] (ignore keywords database is a “whitelist”), and
reinstating the infrequent term to the processed text in response to determining the infrequent term is innocuous [¶ 0051-52] (candidate generator considers both unique and less frequent words, when a word is considered but not redacted it is “selectively reinstating” that word).
Norgeot also teaches: [¶ 0057, 74-75] (apply a common words whitelist to identify safe tokens; whitelist may include tokens including common medical terms, common medical abbreviations, common English verbs with conjugations) [¶ 0056-57, 64, 77-78, 
Aravamudan teaches: [¶ 0197] (whitelist tagger, identify common or repeated phrases that are commonly identified as not containing PHI, whitelist sentences can be removed from the data, prevalence count for each unique sentence may be computed and identified based on the highest count). 

Claim 7: 
	Norgeot teaches: 
The method of claim 6, further comprising:
for each processed text, generating one or more corresponding records of structured data, wherein each corresponding record maintains one or more original values [¶ 0065] (filter for protected health information, tokens for date of birth and postal code)
for one or more quasi-identifiers tagged in the processed text, and the one or more quasi-identifiers are included in a list of known quasi-identifiers [¶ 0065] (filter for protected health information, tokens for date of birth and postal code in the library, this would be a “list”).

Claim 8: 
	Mane teaches: 
The method of claim 7, wherein anonymizing privacy-sensitive personal information included in the processed text comprises:
anonymizing one or more original values maintained in one or more corresponding records, and replacing the one or more original values in the processed text with one or more resulting anonymized values [¶ 0038-39, 41-42] (replacement engine for replacing text, words, phrases) [¶ 0047-48] (replace name with random character and age with a different number).

Claims 13, 15-17: 
Claim(s) 13 is/are substantially similar to Claim 4 and are rejected using the same art and the same rationale as Claim 4.
Claim(s) 15 is/are substantially similar to Claim 6 and are rejected using the same art and the same rationale as Claim 6.
Claim(s) 16 is/are substantially similar to Claim 7 and are rejected using the same art and the same rationale as Claim 7.
Claim(s) 17 is/are substantially similar to Claim 8 and are rejected using the same art and the same rationale as Claim 8.

Claim(s) 9, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mane; Shishir US Pub. No. 2020/0320167 (Mane) in view of Poblenz; Eric C. et al. US Pub. No. 2020/0160969 (Poblenz) in view of Bitan; Elan et al. US Pub. No. 2011/0055185 (Bitan) in view of Aravamudan; Murali et al. US Pub. No. 20200402625 (Aravamudan) in view of Norgeot; Beau et al. US Pub. No. 2020/0394333 (Norgeot) in view of Redlich; Ron M. et al. US Pub. No. 2009/0254572 (Redlich).
Claim 9: 

Mane, Poblenz, Bitan, Aravamudan, Norgeot fail to teach, but Redlich teaches: 
The method of claim 8, wherein anonymizing privacy-sensitive personal information included in the processed text further comprises:
for each identifier tagged in the processed text, removing each tag that the identifier is tagged with from the processed text [¶ 1162, 2984, 3070] (tag removal).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of document redaction in Mane and the method of protecting information in Poblenz and the method of information retrieval in Bitan and the method of computing with private data in Aravamudan and the method of de-identification in Norgeot and the method of providing information infrastructure in Redlich, with a reasonable expectation of success. 
	The motivation for this combination would have been to improve “reliability, replicability, survivability, integrity, validity, security, and preclude forced inferential reconstitution activities” [Redlich: ¶ 0978].

Claim 18: 
Claim(s) 18 is/are substantially similar to Claim 9 and are rejected using the same art and the same rationale as Claim 9.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Please See PTO-892: Notice of References Cited.

Evidence of the level of skill of an ordinary person in the art for Claim 1: 
Austin; Joseph et al. US 20190236310 teaches: [¶ 0019, 21, 29, 32, 34] (customizable whitelist of terms that may provide false positives for PHI or PII). 

Evidence of the level of skill of an ordinary person in the art for Claim 3: 
Sweeney; Latanya US 7269578 teaches: de-identifying, frequency list, quasi-identifier. 

Evidence of the level of skill of an ordinary person in the art for Claim 4: 
Broudou; Julian et al. US 20160321582 teaches: replace personal information contained therein with generic or randomized personal information; [¶ 0089] blacklist, for a rule to be triggered, a predetermined threshold must be exceeded or otherwise not be satisfied. 
Jakobsson; Bjorn Markus US 20210182111 teaches: anonymize data, blacklist of words never to include, threshold used.
Scott; Alan et al. US 9846716 teaches: De-identification of production data; whitelist, blacklist, frequency. 

Stockdale; Jack et al. US 20190260784 teaches: anonymized data; black list can identify data fields to be rendered anonymous or a white list can identify data fields that can be open to the public. 

Evidence of the level of skill of an ordinary person in the art for original Claim 6, now part of Claim 1: 
Thakurta; Abhradeep Guha et al. US Pub. No. 2017/0359364 (Thakurta) teaches: [¶ 0033, 38] (blacklist for terms, frequency server, terms not sent to frequency server) [¶ 0096, 110] (frequency threshold, count minimum), [¶ 0093] (de-identification). 
HACHEY; Benjamin Clayton US 20210256160 teaches: [¶ 0074] (A whitelist generating module is provided that is configured to generate whitelists 212 based on frequency of words in a large dataset).

Response to Arguments
Applicant's arguments filed 1/31/2022 have been fully considered but they are not persuasive.

35 USC 103 Rejection: 
The applicant argues that: 
Mane, however, does not disclose removing a word from the documents and reinstating the word removed from the documents to the documents in response 
The examiner respectfully disagrees. 

First, as stated above, the specification does not support “removing each infrequent term from the processed text”. The specification discloses “filtering out” or pruning infrequent terms from processed text, but not “removing” infrequent terms from processed text. 
Second, Mane teaches removing and reinstating words based on frequency and a whitelist. 
Mane teaches: 
[0055] FIG. 6A illustrates an exemplary replacement process 600 with a dictionary approach, according to one embodiment. Candidates generator 615 identifies all unique words from document 610 as candidates for redaction 620 (e.g., Account, Name, Age, Ramesh Suresh, Yrs.) The term ‘Name:’ isn't redacted and replaced because it is part of the model training exercise and hence part of the ignore keywords list 412. Replacement engine 625 generates redacted document 630 with replacement words from the dictionary of replacements 411. Also we can note that the replacements are consistent across occurrences (e.g. the replacement for Account is same on both occasions.)

[0056] FIG. 6B illustrates an exemplary replacement process 650 with random character replacement, according to one embodiment. Candidates generator 675 identifies all unique words from document 670 as candidates for redaction 680 (e.g., Account, Name, Age, Ramesh Suresh, Yrs.) Replacement engine 685 generates redacted document 690 with random character replacements for the redaction candidates 680. The term ‘Name:’ isn't redacted and replaced because it is part of the model training exercise and hence part of the ignore keywords list. The replacements are consistent across occurrences (e.g. the replacement for Account is same on both occasions.)

 and
The candidate generator considers both unique and less frequent words, when a word is considered but not redacted it is “reinstating” that word, which may be an infrequent word [¶ 0051-52]. 
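
The behavior described in the quoted paragraphs of Mane (an ignore keywords list that exempts a term from redaction, and replacements that are consistent across occurrences) can be sketched as follows. This is an illustration under stated assumptions only and is not code from Mane: the function name `redact`, the `seed` parameter, and the treatment of every word not on the ignore list as a redaction candidate are hypothetical simplifications.

```python
import random
import string


def redact(words, ignore_keywords, seed=0):
    """Replace each candidate word with random characters of equal length.

    Words on ignore_keywords are left untouched. A word that occurs more
    than once receives the same replacement at every occurrence.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    replacements = {}
    redacted = []
    for word in words:
        if word in ignore_keywords:
            # The term was considered but is not redacted.
            redacted.append(word)
            continue
        if word not in replacements:
            replacements[word] = "".join(
                rng.choice(string.ascii_letters) for _ in word
            )
        redacted.append(replacements[word])
    return redacted
```

For an input such as `["Account", "Name:", "Ramesh", "Account"]` with `{"Name:"}` as the ignore list, the term `Name:` passes through unchanged and both occurrences of `Account` map to the same replacement string.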
Third, this may be taught in Hachey, cited above in the prior art section, as well as by Norgeot and Aravamudan (see below). 
Norgeot also teaches: [¶ 0057, 74-75] (apply a common words whitelist to identify safe tokens; whitelist may include tokens including common medical terms, common medical abbreviations, common English verbs with conjugations) [¶ 0056-57, 64, 77-78, 89-90] (safe filter, set token status to “safe”) [0074] (unclassified token, or tokens not classified as protected personal information, no decision has been made).
Aravamudan teaches: [¶ 0197] (whitelist tagger, identify common or repeated phrases that are commonly identified as not containing PHI, whitelist sentences can be removed from the data, prevalence count for each unique sentence may be computed and identified based on the highest count). 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN J SMITH whose telephone number is (571)270-3825.  The examiner can normally be reached on Monday - Friday 11:00 - 7:30 EST.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman can be reached on (571)272-3644.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/Benjamin Smith/
Examiner, Art Unit 2144
Direct Phone: 571-270-3825
Direct Fax: 571-270-4825
Email: benjamin.smith@uspto.gov