DETAILED ACTION
This action is in response to the reply received 6/21/2022. After consideration of applicant's amendments and/or remarks:
Claims 6 and 8 are objected to for minor informalities.
Claims 1-3 and 5-20 are rejected under 35 USC § 103.
Claim 4 is objected to as an allowable dependent claim.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Objections
Claims 6 and 8 are objected to because of the following informalities:
Claim 6 was amended to depend on claim 1. Accordingly, there is no antecedent basis for "the sliding window." Also, it is unclear why there is a "third sub-sequence of words," but not a "second sub-sequence of words." Because of these informalities and for purposes of examination, Examiner assumes Claim 6 should actually be dependent on Claim 4. 
In claim 8, there is no antecedent basis for "the threshold."
Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7-10, 13, 15-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Medalion et al., U.S. PG-Publication No. 2021/0125615 A1, in view of Rose et al., U.S. PG-Publication No. 2021/0226953 A1, further in view of Duffy et al., U.S. PG-Publication No. 2018/0198602 A1.

Claim 1
	Medalion discloses a system, comprising: a non-transitory memory; and one or more hardware processors coupled with the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9.
Medalion discloses operations comprising: receiving a request to remove sensitive data from text data comprising a sequence of words. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.
Medalion discloses matching a first sub-sequence of words within the sequence of words with a first n-gram. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing at least a portion of the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
Medalion does not expressly disclose generating a negative word index based on a word corpus comprising non-sensitive data … wherein each word sequence is associated with a frequency value representing a number of times that the word sequence appears in the word corpus; matching a first sub-sequence of words within the sequence of words with a first n-gram in the negative word index; and determining whether the first sub-sequence of words includes sensitive data based on comparing a first frequency value associated with the first n-gram against a threshold.
Rose discloses generating a negative word index based on a word corpus comprising non-sensitive data … wherein each word sequence is associated with a frequency value representing a number of times that the word sequence appears in the word corpus. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22 (wherein the emails and communications are a word corpus comprising non-sensitive data). The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25, 36 (wherein term frequency is a frequency value associated with words or phrase patterns in a corpus).
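For illustration only, the TF-IDF statistic referenced above can be sketched as follows. This is a minimal textbook formulation (term frequency multiplied by the logarithm of the inverse document frequency), not a characterization of Rose's actual implementation; the function name `tf_idf`, the sample corpus, and the choice of the natural logarithm are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(term, document, corpus):
    """Score `term` within `document` relative to `corpus`.

    `document` is a list of tokens and `corpus` is a list of such lists.
    Textbook variant (an assumption, not taken from the cited reference):
    tf = occurrences of term / tokens in document, idf = ln(N / df).
    """
    if not document:
        return 0.0
    tf = Counter(document)[term] / len(document)
    # Number of documents in the corpus that contain the term at least once.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Hypothetical sample corpus of tokenized messages.
corpus = [
    ["please", "reset", "my", "password"],
    ["please", "call", "me", "back"],
    ["my", "account", "number", "is", "12345"],
]
score_common = tf_idf("please", corpus[0], corpus)   # appears in two documents
score_rare = tf_idf("password", corpus[0], corpus)   # appears in one document
```

Under this formulation a term that appears in fewer documents of the corpus receives a higher score than an equally frequent term that appears in many documents.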
Rose discloses determining whether the first sub-sequence of words includes sensitive data based on comparing a first frequency value associated with the first n-gram against a threshold. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. One of ordinary skill in the art would recognize that the inverse of the confidence score could be used to block communications scoring below (i.e., not exceeding) the specified threshold (i.e., an obvious mathematical variation). This would allow communications scoring above the specified threshold (i.e., communications not including sensitive data).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.
Medalion-Rose does not expressly disclose wherein the negative word index comprises a plurality of groups of n-grams, wherein each group in the plurality of groups of n-grams comprises word sequences having a particular word length.
Kothuvatiparambil discloses wherein the negative word index comprises a plurality of groups of n-grams, wherein each group in the plurality of groups of n-grams comprises word sequences having a particular word length. Kothuvatiparambil discloses a system "for extracting verifiable entities" from text (e.g., a user-utterance), wherein valid entities are "system-defined entities." Kothuvatiparambil, ¶ 30. The system determines "the number of tokens" in the input text and generates n-gram sequences of different lengths (e.g., unigram, bigram, trigram, quadgram) from the input. Id. at ¶¶ 32-33; see also FIG. 3B. The system uses "a sliding-window protocol," wherein each "n-gram sequence may include a window-size equal to a value of n in the n-gram sequence." Id. at ¶ 35. Determining a valid entity is performed by determining if an "n-gram is a valid match to the stored entity-verifier that may be determined to be associated with the n-gram," from a "list of stored entity-verifiers in [a] database" (i.e., word index). Id. at ¶ 42. Figure 1 illustrates a flowchart of a method implemented using the disclosed system. At 104, the system generates n-gram sequences of various lengths. At 106-110, the system runs parallel threads for "each n-gram sequence generated" that review the n-gram for eligibility. Id. at ¶ 50. Kothuvatiparambil discloses a system wherein n-grams comprised of word sequences having particular word lengths (e.g., unigram, bigram, trigram, quadgram, etc.) are matched with entity-verifiers associated with n-grams. Accordingly, the stored entity-verifiers used to match n-grams of various lengths comprise groups of n-grams having particular word lengths.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of matching word sequences to determine sensitive information of Medalion-Rose to incorporate matching n-gram word sequences of various lengths as taught by Kothuvatiparambil. One of ordinary skill in the art would be motivated to integrate matching n-gram word sequences of various lengths into Medalion-Rose, with a reasonable expectation of success, in order to increase the accuracy of extracting particular entities from text. See Kothuvatiparambil, ¶ 7.
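For illustration only, the combination discussed above (a word index built from a non-sensitive corpus, holding a frequency value for each word sequence and grouped by n-gram length, populated with a sliding window) can be sketched as follows. The names `ngrams` and `build_index`, the sample corpus, and the whitespace tokenization are hypothetical and are not drawn from the cited references or the application.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Slide a window of size n across tokens; return each n-gram as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_index(corpus, max_n=3):
    """Map each length n (1..max_n) to a Counter of n-gram -> occurrence count.

    Assumes `corpus` is a list of strings and that lower-cased whitespace
    splitting is an acceptable tokenizer for the example.
    """
    index = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            index[n].update(ngrams(tokens, n))
    return index

# Hypothetical corpus of non-sensitive sentences.
corpus = ["my account number is", "my account is locked", "what is my balance"]
index = build_index(corpus)

# Each group holds only word sequences of one particular length.
unigram_count = index[1][("my",)]
bigram_count = index[2][("my", "account")]
trigram_count = index[3][("my", "account", "number")]
```

Grouping by length means a sub-sequence of n words only needs to be compared against the group of n-grams of the same length.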

Claim 2
	Medalion discloses wherein the operations further comprise storing the text data in a data storage after removing at least a portion of the first sub-sequence of words from the text data. At 410, the redacted text strings are provided to a data repository. Medalion, ¶ 104. Specifically, the method generates redacted text strings, "which are sent back to production data collection 122 to replace the un-redacted text strings that include PII." Id. at ¶ 91.

Claim 7
	Rose discloses determining that the first frequency value is below the threshold based on the comparing; and determining that the first sub-sequence of words includes sensitive data based on the determining that the first frequency value is below the threshold. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns). In one embodiment, the program "may identify a tag indicating that the electronic communication may contain sensitive information, but the confidence score may be below a specified threshold." Id. at ¶ 44.
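For illustration only, the below-threshold determination recited in this claim can be sketched as follows. The sketch is limited to single words for brevity; the function name `redact_rare_words`, the mask string, the sample counts, and the threshold value are hypothetical and are not taken from the cited references.

```python
def redact_rare_words(text, word_counts, threshold, mask="[REDACTED]"):
    """Replace each word whose corpus frequency is below `threshold`.

    `word_counts` maps lower-cased words to the number of times they appear
    in a corpus assumed to be non-sensitive. A word absent from the mapping
    is treated as having a frequency of zero.
    """
    out = []
    for word in text.split():
        count = word_counts.get(word.lower(), 0)
        # A frequency below the threshold is treated as indicating sensitive data.
        out.append(mask if count < threshold else word)
    return " ".join(out)

# Hypothetical frequency values from a non-sensitive corpus.
word_counts = {"my": 950, "account": 400, "number": 380, "is": 990}
result = redact_rare_words("My account number is 4421", word_counts, threshold=5)
```

In this example the token "4421" does not appear in the frequency table, so its frequency of zero falls below the threshold and it is masked, while the common words are retained.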

Claim 8
Medalion discloses a method, comprising: receiving, by one or more hardware processors, a request to remove sensitive data from text data comprising a sequence of words. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.
Medalion discloses identifying, by the one or more hardware processors, a first sub-sequence of words from the sequence of words. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
Medalion discloses determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
	Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing, by the one or more hardware processors, at least a portion of the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
	Medalion does not expressly disclose accessing a negative word index generated based on a word corpus comprising non-sensitive data … wherein each word sequence in the negative word index is associated with a value representing a statistical characteristic associated with the word sequence appearing in the word corpus; determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data based at least in part on a failure to match the first sub-sequence of words with any word sequence in the negative word index or a first value associated with a first word sequence in the negative word index that corresponds to the first sub-sequence of words.
	Rose discloses accessing a negative word index generated based on a word corpus comprising non-sensitive data … wherein each word sequence in the negative word index is associated with a value representing a statistical characteristic associated with the word sequence appearing in the word corpus. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22 (wherein the emails and communications are a word corpus comprising non-sensitive data). The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25, 36 (wherein term frequency is a frequency value associated with words or phrase patterns in a corpus).
Rose discloses determining, by the one or more hardware processors, that the first sub-sequence of words includes sensitive data based at least in part on a failure to match the first sub-sequence of words with any word sequence in the negative word index or a first value associated with a first word sequence in the negative word index that corresponds to the first sub-sequence of words. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. Further, the classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25, 36 (wherein term frequency is a "first value" associated with words or phrase patterns).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.
Medalion-Rose does not expressly disclose wherein the negative word index comprises a plurality of word sequences having different word lengths.
Kothuvatiparambil discloses wherein the negative word index comprises a plurality of word sequences having different word lengths. Kothuvatiparambil discloses a system "for extracting verifiable entities" from text (e.g., a user-utterance), wherein valid entities are "system-defined entities." Kothuvatiparambil, ¶ 30. The system determines "the number of tokens" in the input text and generates n-gram sequences of different lengths (e.g., unigram, bigram, trigram, quadgram) from the input. Id. at ¶¶ 32-33; see also FIG. 3B. The system uses "a sliding-window protocol," wherein each "n-gram sequence may include a window-size equal to a value of n in the n-gram sequence." Id. at ¶ 35. Determining a valid entity is performed by determining if an "n-gram is a valid match to the stored entity-verifier that may be determined to be associated with the n-gram," from a "list of stored entity-verifiers in [a] database" (i.e., word index). Id. at ¶ 42. Figure 1 illustrates a flowchart of a method implemented using the disclosed system. At 104, the system generates n-gram sequences of various lengths. At 106-110, the system runs parallel threads for "each n-gram sequence generated" that review the n-gram for eligibility. Id. at ¶ 50. Kothuvatiparambil discloses a system wherein n-grams comprised of word sequences having particular word lengths (e.g., unigram, bigram, trigram, quadgram, etc.) are matched with entity-verifiers associated with n-grams. Accordingly, the stored entity-verifiers used to match n-grams of various lengths comprise groups of n-grams having particular word lengths.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of matching word sequences to determine sensitive information of Medalion-Rose to incorporate matching n-gram word sequences of various lengths as taught by Kothuvatiparambil. One of ordinary skill in the art would be motivated to integrate matching n-gram word sequences of various lengths into Medalion-Rose, with a reasonable expectation of success, in order to increase the accuracy of extracting particular entities from text. See Kothuvatiparambil, ¶ 7.

Claim 9
	Rose discloses wherein the first value indicates a number of different documents in which the first word sequence appears in the word corpus. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns).
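For illustration only, a value indicating the number of different documents in which a word sequence appears (a document frequency) can be sketched as follows. The function name `document_frequency`, the sample documents, and the whitespace tokenization are assumptions made for the example and are not drawn from Rose.

```python
def document_frequency(phrase, documents):
    """Count the documents that contain `phrase` as a contiguous run of tokens.

    A document is counted once no matter how many times the phrase repeats
    within it, which distinguishes this value from a raw occurrence count.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    count = 0
    for doc in documents:
        tokens = doc.lower().split()
        windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        if any(window == target for window in windows):
            count += 1
    return count

# Hypothetical corpus: the second document repeats the phrase twice.
documents = ["reset my password please", "my password my password", "call me back"]
df = document_frequency("my password", documents)
```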

Claim 10
Kothuvatiparambil discloses wherein the first sub-sequence of words comprises a number of words has a particular word length, wherein the method further comprises: identifying a set of word sequences in the negative word index based on the particular word length, wherein each word sequence in the set of word sequences has the particular word length. Kothuvatiparambil discloses a system "for extracting verifiable entities" from text (e.g., a user-utterance), wherein valid entities are "system-defined entities." Kothuvatiparambil, ¶ 30. The system determines "the number of tokens" in the input text and generates n-gram sequences of different lengths (e.g., unigram, bigram, trigram, quadgram) from the input. Id. at ¶¶ 32-33; see also FIG. 3B. The system uses "a sliding-window protocol," wherein each "n-gram sequence may include a window-size equal to a value of n in the n-gram sequence." Id. at ¶ 35. Determining a valid entity is performed by determining if an "n-gram is a valid match to the stored entity-verifier that may be determined to be associated with the n-gram," from a "list of stored entity-verifiers in [a] database" (i.e., word index). Id. at ¶ 42. Figure 1 illustrates a flowchart of a method implemented using the disclosed system. At 104, the system generates n-gram sequences of various lengths. At 106-110, the system runs parallel threads for "each n-gram sequence generated" that review the n-gram for eligibility. Id. at ¶ 50. Kothuvatiparambil discloses a system wherein n-grams comprised of word sequences having particular word lengths (e.g., unigram, bigram, trigram, quadgram, etc.) are matched with entity-verifiers associated with n-grams. Accordingly, the stored entity-verifiers used to match n-grams of various lengths comprise groups of n-grams having particular word lengths.
	Rose discloses determining statistical data associated with the set of word sequences appearing in the word corpus; and determining the threshold based on the statistical data. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Rose, ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information).
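For illustration only, determining a threshold from statistical data for word sequences of one particular length can be sketched as follows. Neither the claim language quoted above nor the cited passages specify a formula; the rule used here (a fixed fraction of the mean occurrence count), the function name `threshold_for_length`, and the sample index are assumptions made for the example.

```python
import statistics

def threshold_for_length(index, n, fraction=0.1):
    """Derive a threshold from the counts of all n-grams of length n.

    `index` maps n-gram tuples to occurrence counts in a corpus.
    Hypothetical rule: threshold = fraction * mean count for that length.
    """
    counts = [count for gram, count in index.items() if len(gram) == n]
    if not counts:
        return 0.0
    return fraction * statistics.mean(counts)

# Hypothetical index mixing unigrams and bigrams.
index = {
    ("my",): 30,
    ("is",): 50,
    ("account",): 10,
    ("my", "account"): 8,
    ("account", "is"): 2,
}
unigram_threshold = threshold_for_length(index, 1)
bigram_threshold = threshold_for_length(index, 2)
```

Because longer word sequences generally occur less often than single words, computing the statistic separately for each length yields a lower threshold for bigrams than for unigrams in this example.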

Claim 13
	Medalion discloses wherein the text data comprises a transcript of a communication with a user. Medalion discloses that the method removes PII from text transcriptions of "support phone calls" or a "text-based support session, such as a live chat." Medalion, ¶¶ 82-85.

Claim 15
	Medalion discloses a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations. Medalion discloses a "method for detecting personally identifiable information" (PII) from a plurality of text strings; the method is implemented using computer hardware and memory. Medalion, ¶¶ 7-9.
	Medalion discloses operations comprising: receiving a request to remove sensitive data from text data comprising a sequence of words. Medalion illustrates "example method 400 for detecting personally identifiable information." At 402, the method receives "a plurality of text strings." At 404, the text strings are provided "to a bidirectional long short-term memory (BiLSTM) neural network model." At 406-408, the BiLSTM model predicts text data elements (in the plurality of text strings) that comprise PII; and redacts the predicted PII elements "from the plurality of text strings to form redacted text strings." Id. at ¶¶ 96-111. Accordingly, sending the text strings (step 404) to a neural network for redacting PII is a request to remove PII from a plurality of text strings.
Medalion discloses accessing a word index generated based on a word corpus comprising non-sensitive words; and determining whether [a] first sub-sequence of words within the sequence of words includes sensitive data. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word index).
Medalion discloses matching a first sub-sequence of words from the sequence of words with a first word sequence in the negative word index. Medalion discloses determining whether a text data element comprises PII "based on a forward context and a backward context associated with the respective text data elements." Id. at ¶¶ 100-101. The BiLSTM is "configured to predict (1) whether a string of text … contains PII by looking at the full sentence; (2) whether a specific term (or token) contains PII based on the context of the term; and (3) whether the specific term contains PII based on the term itself." Id. at ¶ 36 (sentence, term context, and term itself are all examples of sub-sequence of words). Further, the BiLSTM is trained "on labeled datasets including PII to identify the PII both directly and by context." Id. at ¶ 34. Training data collection 126 "comprises data for training machine learning model(s) 108" and "may include text-based transcripts of support sessions with users … which have been labeled based on the presence of PII." Id. at ¶ 65 (training data 126 is analogous to claimed word corpus).
	Medalion discloses in response to determining that the first sub-sequence of words includes sensitive data, removing at least a portion of the first sub-sequence of words from the text data. At 408, the method "redact[s] one or more text data elements comprising the predicted [PII] from the plurality of text strings to form redacted text strings." Id. at ¶ 103. Redaction involves "deleting the predicted PII text." Id. at ¶ 61.
	Medalion does not expressly disclose determining whether a first sub-sequence of words within the sequence of words includes sensitive data based on statistics information associated with the first sub-sequence of words within the word index.
Rose discloses wherein each word sequence in the negative word index is associated with a value representing a statistical characteristic of the word sequence appearing in the word corpus. Rose discloses a system for "identifying sensitive information in communications." Rose, ¶ 10. In one embodiment, artificial intelligence is "used to identify specified or sensitive event-related words and phrases in … electronic communications and publicly accessible data that can subsequently be used to classify and tag current electronic communications." Id. at ¶ 12. The system comprises "a classification module 130 … configured to identify sensitive event-related words and phrases in an electronic communication." Id. at ¶ 18. The machine-learning algorithm is trained with "public data containing emails and communications." Id. at ¶ 22 (wherein the emails and communications are a word corpus comprising non-sensitive data). The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm "determined the frequency of occurrence of terms occur within a single document and also the frequency of occurrence of the terms within all the other documents." Id. at ¶¶ 25, 36 (wherein term frequency is a frequency value associated with words or phrase patterns in a corpus).
Rose discloses determining whether the first sub-sequence of words includes sensitive data based on comparing a first value associated with the first word sequence against a threshold. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Id. at ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. One of ordinary skill in the art recognizes that the inverse of the confidence score could be used to block communications scoring below (i.e. not exceeding) the specified threshold (i.e. obvious mathematical variation). This would allow communications scoring above the specified threshold (i.e. communications not including sensitive data).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of redacting sensitive words or phrases in communications using machine learning of Medalion to incorporate an AI to identify sensitive words or phrases trained using public data and implementing term frequency as taught by Rose. One of ordinary skill in the art would be motivated to integrate public data training and term frequency into Medalion, with a reasonable expectation of success, in order to "dynamically update and apply controls to sensitive information that may not have been sensitive historically but is sensitive in a current time." Rose, ¶ 10.
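For illustration only, the statistic-and-threshold comparison discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Rose, Medalion, or the claims: the names `build_index` and `is_flagged`, the sample corpus, and the threshold value are hypothetical, and a plain relative frequency stands in for the full TF-IDF weighting.

```python
from collections import Counter

def build_index(corpus, n):
    """Map each n-word sequence in a (hypothetical) non-sensitive corpus
    to its relative frequency among all n-word sequences."""
    counts = Counter()
    for doc in corpus:
        words = doc.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()} if total else {}

def is_flagged(sub_sequence, index, threshold):
    """Flag a sub-sequence whose index value does not exceed the threshold.
    Sequences absent from the index are treated as having a value of 0."""
    value = index.get(tuple(w.lower() for w in sub_sequence), 0.0)
    return value <= threshold

# Hypothetical sample data, not taken from any cited reference.
corpus = [
    "please send the report",
    "please send the invoice",
    "send the report today",
]
index = build_index(corpus, 2)
common = is_flagged(["send", "the"], index, 0.2)   # frequent in the corpus
unseen = is_flagged(["jane", "doe"], index, 0.2)   # absent from the corpus
```

Under these assumptions a word sequence that is common in the non-sensitive corpus exceeds the threshold and is retained, while an unseen sequence does not exceed it and is flagged.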
Medalion-Rose does not expressly disclose wherein the negative word index comprises a plurality of word sequences having different word lengths.
Kothuvatiparambil discloses wherein the negative word index comprises a plurality of word sequences having different word lengths. Kothuvatiparambil discloses a system "for extracting verifiable entities" from text (e.g. a user-utterance), wherein valid entities are "system-defined entities." Kothuvatiparambil, ¶ 30. The system determines "the number of tokens" in the input text and generates n-gram sequences of different lengths (e.g. unigram, bigram, trigram, quadgram) from the input. Id. at ¶¶ 32-33; See Also FIG. 3B. The system uses "a sliding-window protocol," wherein each "n-gram sequence may include a window-size equal to a value of n in the n-gram sequence." Id. at ¶ 35. Determining a valid entity is performed by determining if an "n-gram is a valid match to the stored entity-verifier that may be determined to be associated with the n-gram," from a "list of stored entity-verifiers in [a] database" (i.e. word index). Id. at ¶ 42. Figure 1 illustrates a flowchart of a method implemented using the disclosed system. At 104, the system generates n-gram sequences of various lengths. At 106-110, the system runs parallel threads for "each n-gram sequence generated" that review the n-gram for eligibility. Id. at ¶ 50. Kothuvatiparambil discloses a system wherein n-grams comprised of word sequences having particular word lengths (e.g. unigram, bigram, trigram, quadgram, etc.) are matched with entity-verifiers associated with n-grams. Accordingly, the stored entity-verifiers used to match n-grams of various lengths comprise groups of n-grams having particular word lengths.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of matching word sequences to determine sensitive information of Medalion-Rose to incorporate matching n-gram word sequences of various lengths as taught by Kothuvatiparambil. One of ordinary skill in the art would be motivated to integrate matching n-gram word sequences of various lengths into Medalion-Rose, with a reasonable expectation of success, in order to increase the accuracy of extracting particular entities from text. See Kothuvatiparambil, ¶ 7.
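For illustration only, generating n-gram sequences of several lengths with a window size equal to n, and matching them against stored entries grouped by length, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Kothuvatiparambil's implementation: the names `ngrams_by_length` and `match_entities`, the dictionary layout of the stored entries, and the sample utterance are hypothetical.

```python
def ngrams_by_length(text, max_n=4):
    """Return {n: [n-grams]} for n = 1..max_n, where each n-gram is taken
    with a sliding window whose size equals n."""
    tokens = text.split()
    result = {}
    for n in range(1, min(max_n, len(tokens)) + 1):
        result[n] = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return result

def match_entities(text, entity_index, max_n=4):
    """Return the n-grams that match a stored entry of the same word length.
    entity_index is a hypothetical {length: set of word tuples} mapping."""
    matches = []
    for n, grams in ngrams_by_length(text, max_n).items():
        stored = entity_index.get(n, set())
        matches.extend(g for g in grams if g in stored)
    return matches

# Hypothetical stored entries grouped by word length.
entity_index = {1: {("flight",)}, 2: {("new", "york")}}
found = match_entities("book a flight to new york", entity_index)
```

Grouping the stored entries by word length means each n-gram is compared only against entries having the same particular word length.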

Claim 16
	Medalion discloses wherein the operations further comprise training a machine learning model based on the text data after removing at least a portion of the first sub-sequence of words. Medalion discloses that "the redacted text strings may be provided for other uses 308, such as training, generation of other models, further analysis, and the like." Medalion, ¶ 91.

Claim 19
Kothuvatiparambil discloses wherein the first sub-sequence of words has a particular word length, wherein the operations further comprise: identifying, from the plurality of word sequences within the negative word index, a set of word sequences based on the particular word length, wherein each word sequence in the set of word sequences has the particular word length. Kothuvatiparambil discloses a system "for extracting verifiable entities" from text (e.g. a user-utterance), wherein valid entities are "system-defined entities." Kothuvatiparambil, ¶ 30. The system determines "the number of tokens" in the input text and generates n-gram sequences of different lengths (e.g. unigram, bigram, trigram, quadgram) from the input. Id. at ¶¶ 32-33; See Also FIG. 3B. The system uses "a sliding-window protocol," wherein each "n-gram sequence may include a window-size equal to a value of n in the n-gram sequence." Id. at ¶ 35. Determining a valid entity is performed by determining if an "n-gram is a valid match to the stored entity-verifier that may be determined to be associated with the n-gram," from a "list of stored entity-verifiers in [a] database" (i.e. word index). Id. at ¶ 42. Figure 1 illustrates a flowchart of a method implemented using the disclosed system. At 104, the system generates n-gram sequences of various lengths. At 106-110, the system runs parallel threads for "each n-gram sequence generated" that review the n-gram for eligibility. Id. at ¶ 50. Kothuvatiparambil discloses a system wherein n-grams comprised of word sequences having particular word lengths (e.g. unigram, bigram, trigram, quadgram, etc.) are matched with entity-verifiers associated with n-grams. Accordingly, the stored entity-verifiers used to match n-grams of various lengths comprise groups of n-grams having particular word lengths.
	Rose discloses determining statistical data associated with the set of word sequences within the negative word index; and determining the threshold based on the statistical data. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Rose, ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information).


Claims 3, 5-6, 11-12, 14, 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Medalion, in view of Rose, further in view of Kothuvatiparambil, further in view of Muthusrinivasan et al., U.S. Patent No. 8,561,185 B1.

Claim 3
	Muthusrinivasan discloses wherein the operations further comprise: providing a sliding window on the text data, wherein the first sub-sequence of words is identified from the sequence of words based on the sliding window. Muthusrinivasan discloses "systems and methods for detecting certain types of personally identifiable information included in resource content." Muthusrinivasan, 1:65-67; See Also 12:10-35 (method implemented using computer hardware and memory). The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Id. at 7:5-28.
	Muthusrinivasan discloses in response to determining that the first sub-sequence of words includes sensitive data, reducing a size of the sliding window to select a second sub-sequence of words consisting of the portion of the first sub-sequence of words. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Id. at 8:32-9:15; FIG. 5. Accordingly, the numeric term is a smaller sub-sequence of the parsed window (i.e. second sub-sequence of words) that is subject to further PII determination.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.
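For illustration only, a word-based sliding window of the kind described above (select the first m words, then drop one selected word and add the next word on each adjustment) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Muthusrinivasan's parser: the name `sliding_windows` and the handling of inputs shorter than m words are hypothetical choices.

```python
from collections import deque

def sliding_windows(words, m):
    """Yield successive windows of up to m words. Each adjustment drops the
    oldest selected word and adds the next word in the sequence."""
    if m < 1:
        raise ValueError("window size must be at least 1")
    window = deque(maxlen=m)  # appending to a full deque discards the oldest word
    for word in words:
        window.append(word)
        if len(window) == m:
            yield list(window)
    # Assumption: content shorter than m words is returned as a single window.
    if 0 < len(window) < m:
        yield list(window)

windows = list(sliding_windows("call me at noon".split(), 2))
```

Each yielded window would then be passed to whatever detector inspects it for PII type information.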

Claim 5
	Muthusrinivasan discloses wherein the size of the sliding window is reduced by excluding a last word from the first sub-sequence of words. The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Muthusrinivasan, 7:5-28.

Claim 6
	Muthusrinivasan discloses wherein the operations further comprise: in response to determining that the first sub-sequence of words includes sensitive data, moving the sliding window to select a third sub-sequence of words in the sequence of words; determining whether the third sub-sequence of words within the sequence of words includes sensitive data based on the negative word index; and in response to determining that the third sub-sequence of words does not include sensitive data, retaining the third sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5. Accordingly, the numeric term is a smaller sub-sequence of the parsed window (i.e. third sub-sequence of words) that is subject to further PII determination.
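For illustration only, the numeric-term screening attributed to Muthusrinivasan above (length check, comparison to known PII patterns, exclusion of test data) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Muthusrinivasan's implementation: the name `is_pii_number`, the single SSN-style pattern, the test-data value, and the digit threshold are all hypothetical.

```python
import re

# Hypothetical known pattern: nine digits with optional hyphens (SSN-style).
KNOWN_PATTERNS = [re.compile(r"\d{3}-?\d{2}-?\d{4}")]
# Hypothetical set of values treated as test data rather than real PII.
TEST_VALUES = {"123-45-6789"}

def is_pii_number(term, min_digits=8):
    """Return True when a numeric term is longer than the threshold, matches
    a known pattern, and is not recognised test data."""
    digits = re.sub(r"\D", "", term)
    if len(digits) <= min_digits:
        return False  # too short to be screened further
    if not any(p.fullmatch(term) for p in KNOWN_PATTERNS):
        return False  # no pattern match, so not PII type information
    return term not in TEST_VALUES

results = {t: is_pii_number(t) for t in ["078-05-1120", "123-45-6789", "12345"]}
```

A term that fails any check is treated as not PII type information and, in the combination discussed above, would be retained in the text.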

Claim 11
	Muthusrinivasan discloses wherein the first sub-sequence of words includes three or more words. The system implements a parser 302 to access and parse a resource "for inspection by the PII and secondary content detector 304." In one embodiment, "the parser uses a sliding window to parse the content of the resource;" wherein the parser "can use a sliding window of up to n characters" or "use a sliding window of up to m words." Parser 302 "uses the sliding window by initially selecting the first n characters (or m words) and then processing the data to detect PII type information. Each adjustment of the sliding window moves the sliding window by deleting one of the selected characters (or words) and adding a next character (or word) in the sequence of characters (or words) of the content." Muthusrinivasan, 7:5-28. A sliding window of up to m words encompasses a sub-sequence of three or more words.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.

Claim 12
	Muthusrinivasan discloses wherein the first sub-sequence of words includes at least one of an address, a funding account number, a gender, a name, or an age. Muthusrinivasan discloses that the patterns used to identify PII in sub-sequences include: social security numbers, credit card numbers, "passports, government records, bank accounts, and the like." Muthusrinivasan, 6:34-58.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.

Claim 14
	Rose discloses determining whether a second sub-sequence of words within the sequence of words includes sensitive data based on the negative word index. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm determines "the frequency of occurrence of terms … within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns).
	Muthusrinivasan discloses in response to determining that the second sub-sequence of words does not include sensitive data, retaining the second sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.

Claim 17
	Rose discloses determining whether a second sub-sequence of words within the sequence of words includes sensitive data based on the negative word index. The classification module "may utilize a term frequency-inverse document frequency (TF-IDF) algorithm to identify words or phrase patterns indicating sensitive information," wherein the TF-IDF algorithm determines "the frequency of occurrence of terms … within a single document and also the frequency of occurrence of the terms within all the other documents." Rose, ¶¶ 25, 36 (wherein term frequency is a statistic associated with words or phrase patterns).
	Muthusrinivasan discloses in response to determining that the second sub-sequence of words does not include sensitive data, retaining the second sub-sequence of words in the text data. The system 120 determines whether a sub-portion of the parsed data is a numerical term of length greater than a threshold and comprises consecutively occurring numbers (504-508). Numerical terms meeting these requirements are predicted PII and compared to known PII patterns (510). If the PII system determines that the number matches a known PII pattern (and is not test data) then "the PII system 120 determines that the information is PII type information" (514). Conversely, if "the parsed information does not constitute a pattern match, the PII system determines that the information is not PII type information" (506). Muthusrinivasan, 8:32-9:15; FIG. 5.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.

Claim 18
	Rose discloses matching the second sub-sequence of words with a second word sequence in the negative word index; and comparing a second value associated with the second word sequence against the threshold, wherein the second sub-sequence of words is determined to not include sensitive data based on the second value exceeding the threshold. The classification module implements a "configurable threshold based on, for example a number or percentage of matching words or phrases or both." The classification module may "assign a confidence score" indicating "a degree of confidence that the tagged communication includes the type of sensitive information indicated by the tag." Rose, ¶¶ 28-31. The method will block or permit sending of tagged communications "based on … either the threshold of matching words and phrases or the confidence score or both." Id. at ¶ 33 (wherein the sending is blocked because the communication has identified sensitive information). Rose discloses that the method "may block the electronic transmission from being transmitted" based on a "confidence score" exceeding "a specified threshold." Id. at ¶ 45. One of ordinary skill in the art recognizes that the inverse of the confidence score could be used to block communications scoring below (i.e. not exceeding) the specified threshold (i.e. obvious mathematical variation). This would allow communications scoring above the specified threshold (i.e. communications not including sensitive data).

Claim 20
	Muthusrinivasan discloses wherein the first sub-sequence of words includes at least one of an address, a funding account number, a gender, a name, or an age. Muthusrinivasan discloses that the patterns used to identify PII in sub-sequences include: social security numbers, credit card numbers, "passports, government records, bank accounts, and the like." Muthusrinivasan, 6:34-58.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of identifying and removing PII from text of Medalion-Rose-Kothuvatiparambil to incorporate sliding window techniques taught by Muthusrinivasan. One of ordinary skill in the art would be motivated to integrate sliding window techniques into Medalion-Rose-Kothuvatiparambil, with a reasonable expectation of success, in order to increase accuracy of determining PII by using regular expressions to match strings of text, such as particular characters, words, or patterns of characters that correspond to PII type information. See Muthusrinivasan, 6:51-63.


Allowable Subject Matter
Claim 4 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN, can be reached on (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANK D MILLS/
Primary Examiner, Art Unit 2176
September 22, 2022