DETAILED ACTION
IDS filed 2/13/2020 contains references not considered.
Claim 18 objected to for minor informalities.
Claims 1-20 rejected under 35 USC § 103.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
	The information disclosure statement filed February 13, 2020 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because: Citation 6- no date, no place of publication; Citation 8- no date; Citation 9- no date, no place of publication; Citation 10- no date; Citation 11- no place of publication; Citation 12- no date. It has been placed in the application file, but the information referred to therein has not been considered as to the merits. Applicant is advised that the date of any re-submission of any item of information contained in this information disclosure statement or the submission of any missing element(s) will be the date of submission for purposes of determining compliance with the requirements based on the time of filing the statement, including all certification requirements for statements under 37 CFR 1.97(e). See MPEP § 609.05(a).


Claim Objections
	Claim 18 is objected to because of the following informalities: Claim 18 should be dependent on claim 17, because there is no antecedent basis for "the second statistics information" and "the second sub-sequence of words." Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7-9, 13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Medalion et al., U.S. PG-Publication No. 2021/0125615 A1, in view of Rose et al., U.S. PG-Publication No. 2021/0226953 A1.

Claim 1
	Medalion discloses a system, comprising: a non-transitory memory; and one or more hardware processors coupled with the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9.
Medalion discloses the operations comprising: receiving a request to remove sensitive data from text data comprising a sequence of words. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.
Medalion discloses accessing a word corpus comprising … data determining whether a first sub-sequence of words within the sequence of words includes sensitive data. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
Medalion does not expressly disclose accessing a word corpus comprising public data determining whether a first sub-sequence of words within the sequence of words includes sensitive data based on statistics associated with the first sub-sequence of words within the word corpus.
Rose discloses accessing a word corpus comprising public data determining whether a first sub-sequence of words within the sequence of words includes sensitive data based on statistics associated with the first sub-sequence of words within the word corpus. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.
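For illustration only, the term-frequency and document-frequency statistics attributed to Rose's TF-IDF algorithm can be sketched as below. This is a minimal sketch and is not taken from Rose, Medalion, or the application under examination: the function name `tf_idf`, the smoothed inverse-document-frequency variant, and the sample corpus are all assumptions chosen for clarity.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Return a TF-IDF weight for `term` in `document` relative to `corpus`.

    `document` is a list of word tokens; `corpus` is a list of such documents.
    All names here are hypothetical and used for illustration only.
    """
    # Term frequency: share of the document's tokens equal to the term.
    tf = Counter(document)[term] / len(document) if document else 0.0
    # Document frequency: number of corpus documents containing the term.
    df = sum(1 for doc in corpus if term in doc)
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


# Hypothetical sample corpus of tokenized communications.
corpus = [
    ["please", "reset", "my", "password"],
    ["my", "account", "number", "is", "12345"],
    ["please", "update", "my", "address"],
]

common_weight = tf_idf("my", corpus[1], corpus)
rare_weight = tf_idf("12345", corpus[1], corpus)
```

Under these assumptions, a term that appears in every document ("my") receives a lower weight than a term that appears in only one document ("12345"), which is the sense in which frequency serves as a statistic associated with a word or phrase.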

Claim 2
	Medalion discloses wherein the operations further comprise storing the text data in a data storage after removing the first sub-sequence of words from the text data. At 410, the redacted text strings are provided to a data repository. Medalion, ¶ 104. Specifically, the method generates redacted text strings, "which are sent back to production data collection 122 to replace the un-redacted text strings that include PII." Id. at ¶ 91.

Claim 7
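	Rose discloses wherein the statistics comprises a number of times for which the first sub-sequence of words appears in the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).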

Claim 8
Medalion discloses a method, comprising: receiving, by one or more hardware processors, a request to remove sensitive data from text data comprising a sequence of words. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.
Medalion discloses identifying, by the one or more hardware processors, a first sub-sequence of words from the sequence of words. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
Medalion discloses determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
	Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing, by the one or more hardware processors, the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
Medalion does not expressly disclose determining, by the one or more hardware processors, whether first statistical information associated with the first sub-sequence of words within a word corpus exceeds a threshold, wherein the word corpus comprises words and word sequences associated with non-sensitive data; and determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data based at least in part on the first statistical information not exceeding the threshold.
Rose discloses determining, by the one or more hardware processors, whether first statistical information associated with the first sub-sequence of words within a word corpus exceeds a threshold, wherein the word corpus comprises words and word sequences associated with non-sensitive data. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).
Rose discloses determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data based at least in part on the first statistical information not exceeding the threshold. The classification module implements a "configurable threshold based on, for example a number of percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. One of ordinary skill in the art recognizes that the inverse of the confidence score could be used to block communications scoring below (i.e. not exceeding) the specified threshold (i.e. obvious mathematical variation). This would allow communications scoring above the specified threshold (i.e. communications not including sensitive data).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.
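For illustration only, the claimed determination (a sub-sequence is treated as sensitive when its statistic within a corpus of non-sensitive text does not exceed a threshold) can be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce Rose, Medalion, or the application: the helper names `count_occurrences` and `is_sensitive`, the use of a raw occurrence count as the statistic, and the sample corpus and threshold are all hypothetical.

```python
def count_occurrences(phrase, corpus_docs):
    """Count how many times the word sequence `phrase` occurs in `corpus_docs`.

    `phrase` is a list of words; `corpus_docs` is a list of tokenized documents.
    """
    n = len(phrase)
    if n == 0:
        return 0
    total = 0
    for doc in corpus_docs:
        # Compare the phrase against every run of n consecutive words.
        for i in range(len(doc) - n + 1):
            if doc[i:i + n] == phrase:
                total += 1
    return total


def is_sensitive(phrase, corpus_docs, threshold):
    """Flag `phrase` as sensitive when its count does NOT exceed `threshold`."""
    return count_occurrences(phrase, corpus_docs) <= threshold


# Hypothetical corpus of non-sensitive text.
corpus_docs = [
    ["thank", "you", "for", "calling"],
    ["thank", "you", "very", "much"],
]

common_flag = is_sensitive(["thank", "you"], corpus_docs, threshold=1)
rare_flag = is_sensitive(["jane", "doe"], corpus_docs, threshold=1)
```

In this sketch a phrase that is common in the non-sensitive corpus exceeds the threshold and is retained, while a phrase absent from the corpus falls at or below the threshold and is flagged.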

Claim 9
	Rose discloses wherein the statistics information comprises a number of different documents in which the first sub-sequence of words appears in the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).

Claim 13
	Medalion discloses wherein the text data comprises a transcript of a communication with a user. Medalion discloses that the method removes PII from text transcriptions of "support phone calls" or a "text-based support session, such as a live chat." Medalion, ¶¶ 82-85.

Claim 15
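	Medalion discloses a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9.
	Medalion discloses operations comprising: receiving a request to remove sensitive data from text data comprising a sequence of words. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.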
Medalion discloses accessing a word index comprising non-sensitive words and word sequences; and determining whether a first sub-sequence of words within the sequence of words includes sensitive data. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word index).
	Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
	Medalion does not expressly disclose determining whether a first sub-sequence of words within the sequence of words includes sensitive data based on statistics information associated with the first sub-sequence of words within the word index.
Rose discloses determining whether a first sub-sequence of words within the sequence of words includes sensitive data based on statistics information associated with the first sub-sequence of words within the word index. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).
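It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.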

Claim 16
	Medalion discloses wherein the operations further comprise training a machine learning model based on the text data after removing the first sub-sequence of words. Medalion discloses that "the redacted text strings may be provided for other uses 308, such as training, generation of other models, further analysis, and the like." Medalion, ¶ 91.


Claims 3-6, 10-12, 14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Medalion et al., U.S. PG-Publication No. 2021/0125615 A1, in view of Rose et al., U.S. PG-Publication No. 2021/0226953 A1, further in view of Muthusrinivasan et al., U.S. Patent No. 8,561,185 B1.

Claim 3
	Muthusrinivasan discloses wherein the operations further comprise: providing a sliding window to select the first sub-sequence of words; in response to determining that the first sub-sequence of words includes sensitive data, moving the sliding window past the first sub-sequence of words to select a second sub-sequence of words subsequent to the first sub-sequence of words in the sequence of words. Muthusrinivasan discloses "systems and methods for detecting certain types of personally identifiable information included in resource content." Muthusrinivasan, 1:65-67; see also 12:10-35 (method implemented using computer hardware and memory). The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Id. at 7:5-28.
	Muthusrinivasan discloses determining whether the second sub-sequence of words includes sensitive data based on second statistics associated with the second sub-sequence of words within the word corpus. PII system 120 "adjusts the parsed window and processes the parsed data for detection" and "determines if a numerical term length derived from the parsed data is less than a minimum threshold term length," wherein the length "is determined from the sub-portion of the content of the resource." Id. at 8:32-62. Accordingly, the parsed window (i.e. second sub-sequence of words) uses numerical term length (i.e. a statistic) to determine whether the parsed window contains PII.
	Muthusrinivasan discloses in response to determining that the second sub-sequence of words includes sensitive data, reducing a size of the sliding window to select a third sub-sequence of words consisting of a portion of the second sub-sequence of words. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). Id. at 8:32-9:15; FIG. 5. Accordingly, the numeric term is a smaller sub-sequence of the parsed window (i.e. third sub-sequence of words) that is subject to further PII determination.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that corresponds to PII type information. See Muthusrinivasan, 6:51-63.
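For illustration only, a word-level sliding window of the kind described in the Muthusrinivasan passage (select the first m words, then drop one word and add the next) can be sketched as below. This is a minimal sketch under stated assumptions, not Muthusrinivasan's implementation: the function name `sliding_windows`, the one-word step, and the sample sentence are hypothetical.

```python
def sliding_windows(words, m):
    """Yield (start_index, window) pairs over `words` using windows of up to m words.

    Each step drops the first word of the window and adds the next word
    in the sequence, so consecutive windows overlap by m - 1 words.
    """
    if m <= 0:
        raise ValueError("window size m must be positive")
    if len(words) <= m:
        # The whole sequence fits in a single window.
        yield 0, list(words)
        return
    for start in range(len(words) - m + 1):
        yield start, words[start:start + m]


# Hypothetical sample text, tokenized on whitespace.
words = "my card number is 4111".split()
windows = list(sliding_windows(words, 3))
```

Each window produced this way could then be passed to whatever detection step is in use (a pattern match, a corpus statistic, or a model), which is outside the scope of this sketch.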

Claim 4
	Muthusrinivasan discloses determining whether the third sub-sequence of words within the sequence of words includes sensitive data based on statistics associated with the third sub-sequence of words within the word corpus. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). Muthusrinivasan, 8:32-9:15; FIG. 5.
	Rose discloses determining whether the third sub-sequence of words within the sequence of words includes sensitive data based on statistics associated with the third sub-sequence of words within the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information; wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25; 36 (wherein term frequency is a statistic associated with words or phrase patterns).
	Medalion discloses in response to determining that the third sub-sequence of words includes sensitive data, removing the third sub-sequence of words from the text data. At 410, the redacted text strings are provided to a data repository. Medalion, ¶ 104. Specifically, the method generates redacted text strings, "which are sent back to production data collection 122 to replace the un-redacted text strings that include PII." Id. at ¶ 91.

Claim 5
	Muthusrinivasan discloses wherein the size of the sliding window is reduced by excluding a last word from the second sub-sequence of words. The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Muthusrinivasan, 7:5-28.

Claim 6
	Muthusrinivasan discloses determining whether the third sub-sequence of words within the sequence of words includes sensitive data based on statistics associated with the third sub-sequence of words within the word corpus; and in response to determining that the third sub-sequence of words does not include sensitive data, retaining the third sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5.

Claim 10
	Muthusrinivasan discloses wherein the first sub-sequence of words comprises a number of words, wherein the method further comprises: identifying a plurality of word sequences within the word corpus, wherein each word sequence in the plurality of word sequences includes the number of words. Muthusrinivasan discloses "systems and methods for detecting certain types of personally identifiable information included in resource content." Muthusrinivasan, 1:65-67; see also 12:10-35 (method implemented using computer hardware and memory). The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Id. at 7:5-28.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that corresponds to PII type information. See Muthusrinivasan, 6:51-63.
	Rose discloses determining statistical data associated with the plurality of word sequences within the word corpus; and determining the threshold based on the statistical data. The classification module implements a "configurable threshold based on, for example a number of percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Rose, ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information).

Claim 11
	Muthusrinivasan discloses wherein the first sub-sequence of words includes three or more words. The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Muthusrinivasan, 7:5-28. A sliding window of up to m words encompasses sub-sequences of three or more words.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that corresponds to PII type information. See Muthusrinivasan, 6:51-63.

Claim 12
	Muthusrinivasan discloses wherein the first sub-sequence of words includes at least one of an address, a funding account number, a gender, a name, or an age. Muthusrinivasan discloses that the patterns used to identify PII in sub-sequences include: social security numbers, credit card numbers, "passports, government records, bank accounts, and the like." Muthusrinivasan, 6:34-58.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate the sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.

Claim 14
	Rose discloses determining whether a second sub-sequence of words within the sequence of words includes sensitive data based on second statistics information associated with the second sub-sequence of words within the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm determines "the frequency of occurrence of terms" within a single document "and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns).
	Muthusrinivasan discloses in response to determining that the second sub-sequence of words does not include sensitive data, retaining the second sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5.
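	For illustration only, the decision flow summarized above (a numerical term longer than a threshold, compared against known PII patterns, excluding test data, with non-matching terms retained) can be sketched as follows. This is a minimal sketch under stated assumptions: the length threshold, the two regular expressions, and the test-data value are hypothetical and are not taken from Muthusrinivasan.

```python
import re
from typing import List

# Hypothetical values chosen for illustration only.
MIN_DIGITS = 8                                   # assumed length threshold
KNOWN_PII_PATTERNS = [re.compile(r"\d{9}"),      # assumed 9-digit identifier
                      re.compile(r"\d{16}")]     # assumed 16-digit account number
KNOWN_TEST_DATA = {"123456789"}                  # assumed placeholder value


def is_pii_number(term: str) -> bool:
    """Return True only if the term is predicted PII and matches a known pattern."""
    # The term must consist of consecutive digits and exceed the length threshold.
    if not (term.isascii() and term.isdigit()) or len(term) <= MIN_DIGITS:
        return False
    # Known test data is not treated as PII type information.
    if term in KNOWN_TEST_DATA:
        return False
    # Compare the predicted PII against the known PII patterns.
    return any(p.fullmatch(term) for p in KNOWN_PII_PATTERNS)


def retain_non_pii(words: List[str]) -> List[str]:
    """Keep every word that is not determined to be PII type information."""
    return [w for w in words if not is_pii_number(w)]


kept = retain_non_pii(["call", "078051120", "today"])
```

	Words that fail the pattern match remain in the text, which corresponds to retaining a sub-sequence that is determined not to include sensitive data.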


Claim 17
	Rose discloses determining whether a second sub-sequence of words within the sequence of words includes sensitive data based on second statistics information associated with the second sub-sequence of words within the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm determines "the frequency of occurrence of terms" within a single document "and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns).
	Muthusrinivasan discloses in response to determining that the second sub-sequence of words does not include sensitive data, retaining the second sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate the sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.
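	For illustration only, a TF-IDF statistic of the kind referenced in the discussion of Rose above can be sketched as follows. This is a minimal sketch under stated assumptions: Rose is not cited here as specifying a formula, so the sketch uses one common formulation (term frequency as count divided by document length, inverse document frequency as the natural log of corpus size over document frequency), and the function name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from typing import List


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """Score a term by its frequency in one document and its rarity across the corpus.

    One common formulation, assumed for illustration:
    tf = count(term in doc) / len(doc); idf = ln(N / df).
    """
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    # Number of documents in the corpus that contain the term.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical corpus of tokenized documents.
corpus = [
    ["account", "number", "is", "secret"],
    ["the", "weather", "is", "nice"],
    ["the", "meeting", "is", "today"],
]
common_score = tf_idf("is", corpus[0], corpus)       # term appears in every document
rare_score = tf_idf("account", corpus[0], corpus)    # term appears in one document
```

	A term that occurs in every document scores zero, while a term concentrated in a single document scores higher, which is the sense in which the statistic is associated with words or phrase patterns.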

Claim 18
	Rose discloses wherein the operations further comprise: determining that the second statistics information associated with the second sub-sequence of words exceeds a threshold; and wherein the second sub-sequence of words is determined to not include sensitive data based on the determining that the second statistics information exceeds the threshold. The classification module implements a "configurable threshold based on, for example a number of percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has been identified as including sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. One of ordinary skill in the art recognizes that the inverse of the confidence score could be used to block communications scoring below (i.e., not exceeding) the specified threshold (i.e., an obvious mathematical variation). This would allow communications scoring above the specified threshold (i.e., communications not including sensitive data).
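	For illustration only, the mathematical variation noted above can be sketched as follows. This is a minimal sketch under stated assumptions: the confidence score is assumed to lie between 0 and 1, "inverse" is interpreted as the complement (1 minus the score), and the function names and sample values are hypothetical and not taken from Rose.

```python
def blocks_by_confidence(confidence: float, threshold: float) -> bool:
    """Block when the confidence of sensitive content exceeds the threshold."""
    return confidence > threshold


def allows_by_inverse(confidence: float, threshold: float) -> bool:
    """Allow when the inverse (complement) score exceeds the inverse threshold.

    Assumption: scores lie in [0, 1] and the inverse is taken as 1 - score.
    """
    inverse_score = 1.0 - confidence
    inverse_threshold = 1.0 - threshold
    return inverse_score > inverse_threshold


# Hypothetical threshold and scores for illustration.
threshold = 0.7
results = [
    (c, blocks_by_confidence(c, threshold), allows_by_inverse(c, threshold))
    for c in (0.1, 0.5, 0.9)
]
```

	For each sample score, a communication is allowed under the inverse test exactly when it is not blocked under the original test, so exceeding the inverse threshold corresponds to a determination of no sensitive data.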

Claim 19
	Muthusrinivasan discloses wherein the first sub-sequence of words comprises a particular number of words, wherein the operations further comprise: identifying a plurality of word sequences within the word index, wherein each word sequence in the plurality of word sequences includes the particular number of words. Muthusrinivasan discloses "systems and methods for detecting certain types of personally identifiable information included in resource content." Muthusrinivasan, 1:65-67; see also 12:10-35 (method implemented using computer hardware and memory). The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource," wherein the parser can "use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the [...]." Id. at 7:5-28.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate the sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.
	Rose discloses determining statistical data associated with the plurality of word sequences within the word index; and determining the threshold based on the statistical data. The classification module implements a "configurable threshold based on, for example a number of percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has been identified as including sensitive information).

Claim 20
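	Muthusrinivasan discloses wherein the first sub-sequence of words includes at least one of an address, a funding account number, a gender, a name, or an age. Muthusrinivasan discloses that the patterns used to identify PII in sub-sequences include: social security numbers, credit card numbers, "passports, government records, bank accounts, and the like." Muthusrinivasan, 6:34-58.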
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose to incorporate the sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANK D MILLS/
Primary Examiner, Art Unit 2176
March 17, 2022