DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The following is a Non-Final Office Action in response to applicant’s filing on July 10, 2020.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 6-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over EBRAHIMI et al. (US 2010/0205189 A1), hereinafter Ebrahimi, in view of ARAVAMUDAN et al. (US 2020/0402625 A1), hereinafter Aravamudan.

In regards to claim 1, Ebrahimi discloses a system for data analysis and security via intelligent masking of unstructured data, the system comprising: 
at least one memory device with computer-readable program code stored thereon (Ebrahimi, Para. 0019, include one or more memory devices that may store tables of data); 
at least one communication device (Ebrahimi, Para. 0018, include a communication or computation device, such as a desktop computer, a laptop, a mobile communication device (e.g., a mobile phone or a personal digital assistant (PDA)), or another type of communication or computation device); 
at least one processing device operatively coupled to the at least one memory device and the at least one communication device, wherein executing the computer-readable program code is configured to cause the at least one processing device to (Ebrahimi, Para. 0033, processing component 310 may store the output file in local memory or in database 120. Alternatively, or additionally, processing component 310 may send the output file to source device 110 or another destination):
 receive, from one or more data channels and data sources, data files comprising unmasked data (Ebrahimi, Para. 0018, a user of source device 110 may push, or upload, an input file to server 130, via a secure connection through network 140, for data masking); 
compile the masked text data and reconstruct the data files by substituting the unmasked data with masked data (Ebrahimi, Para. 0073, the particular unmasking function that is performed may be preconfigured and determined based on a table lookup that depends, for example, on the type of sensitive data element involved. As a result of the data unmasking operations, the masked sensitive data elements, within the dynamic tables, may be unmasked (e.g., returned to the original)); and 
store the data file as a secure masked data file (Ebrahimi, Para. 0020, server 130 may access (or receive data from) database 120 to perform a data masking operation on certain data in database 120 and to generate masked data that server 130 may store and/or send to database 120 or another destination).  
Ebrahimi fails to disclose extract text data from the unmasked data; 
parse the text data and analyze syntax of the text data via a machine learning engine; 
identify and categorize sensitive text data via the machine learning engine; wherein the sensitive text data is a subset of the text data;
replace the sensitive text data with generic mask data to generate masked text data; 
However, Aravamudan teaches extract text data from the unmasked data (Aravamudan, Para. 0239, text data may be extracted from the documents; note that the documents can be interpreted as the unmasked data); 
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317, the rule-based template may be created by mapping each of the one or more portions of the text sequence to a corresponding syntax template, identifying a candidate syntax template based on a machine learning model that infers one or more candidate syntax templates based on the one or more portions of the text sequence); 
identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316, one of the plurality of entity tagging models is trained to tag entities of an entity type, the entity type including at least one of a personal name, an organization name, an age, a date, a time, a phone number, a pager number, a clinical identification number, an email address, an IP address, a web URL, a vehicle number, a physical address, a zip code, a social security number, or a date of birth), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233, a portion of the corpus containing sensitive information); 
replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180, a text input 1101 that is fed to the system 1102 will result in an output 1103 where subset of entities of interest, that are either single words or multi word phrases will be selectively masked (replaced with a generic placeholder token)); 
Ebrahimi and Aravamudan are both considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include extract text data from the unmasked data (Aravamudan, Para. 0239); 
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317); 
identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233); 
replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005).

In regards to claim 2, the combination of Ebrahimi and Aravamudan teaches the system of claim 1, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233, for example, non-text data (e.g., image data) and/or metadata may be removed from the documents, text data may be extracted from the documents (e.g., by optical character recognition), or the like. The format of the documents may be converted to a uniform format, or data from the documents may be used to populate a database (e.g., database 1570)).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include the system of claim 1, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005).

In regards to claim 3, the combination of Ebrahimi and Aravamudan teaches the system of claim 1, wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186, applying statistical named entity recognition (NER) models to individual sentences in the corpus. In large corpuses with often-repeated patterns of text, statistical methods may not capture all instances of a pattern. For example, in the sentence “Electronically signed by: SMITH, JOHN C on 01/02/1980 at 12:12 PM CST”, ‘SMITH, JOHN C’ might be detected as a person entity but in the very similar sentence “Electronically signed by: DEWEY, JONES K on 01/02/1980 at 12:12 PM CST”, ‘DEWEY, JONES K’ may not be fully be detected as a person. In such situations, pattern based methods perform better. A regular expression syntax like “Electronically signed by: [A-Za-z]+, [A-Za-z]+[A-Za-z]+ on \d+\/\d+\/\d+ at \d+:\d+PM CST” would capture all such cases). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).
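
For illustration only, and not as part of the record of either reference, the pattern-based detection quoted above from Aravamudan Para. 0186 can be sketched in Python as follows. The whitespace between the name parts and before "PM", the capture group, and the names SIGNATURE and find_signers are assumptions added so that the sketch runs against the two example sentences; they do not appear in the reference.

```python
import re

# Adapted from the expression quoted from Aravamudan Para. 0186.
# Assumption: spaces between the name parts and before "PM" are added here
# so the pattern matches the two example sentences as they are printed.
SIGNATURE = re.compile(
    r"Electronically signed by: ([A-Za-z]+, [A-Za-z]+ [A-Za-z]+) "
    r"on \d+/\d+/\d+ at \d+:\d+ PM CST"
)


def find_signers(text):
    """Return every signer name captured by the signature-line pattern."""
    return SIGNATURE.findall(text)


sentences = [
    "Electronically signed by: SMITH, JOHN C on 01/02/1980 at 12:12 PM CST",
    "Electronically signed by: DEWEY, JONES K on 01/02/1980 at 12:12 PM CST",
]
signers = [find_signers(sentence) for sentence in sentences]
```

Under these assumptions, both signature lines are captured by the same pattern, which is the point the quoted passage makes about pattern-based methods relative to statistical NER models.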

In regards to claim 4, the combination of Ebrahimi and Aravamudan teaches the system of claim 1, wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185, When using BERT, for each entity type a pre-trained model that is best suited for the entity type is chosen. For instance, when tagging entities like a person, location, etc., a model trained unsupervised on a generic corpus like Wikipedia, may suffice. In some embodiments, the pre-trained model may be based on other existing publicly available databases to augment model training such as health science journals, professional publications, peer-reviewed journal publications, or an operator-compiled database, among others). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).

In regards to claim 6, the combination of Ebrahimi and Aravamudan teaches the system of claim 1, wherein reconstructing the data files further comprises converting plain text data containing masked data to an originating file type of the data files (Ebrahimi, Para. 0038, the unmasking engines may then simultaneously perform data unmasking operations on the masked sensitive data elements in the dynamic tables to unmask the sensitive data elements (e.g., return the sensitive data elements to their original form)).  

In regards to claim 7, the combination of Ebrahimi and Aravamudan teaches the system of claim 1, wherein the machine learning engine further comprises an unsupervised machine learning model trained to identify one or more rules for contextual analysis of the text data without human supervision (Aravamudan, Para. 0182, For example, a rule-based algorithm may be based solely on the sequence of information in a standard format such as dates presented in the format of “Day/Month/Year” (e.g., XX/XX/XX or XX/XX/XXXX) or telephone numbers presented in ten-digit format (e.g., (XXX) XXX-XXXX). Based on these standard formats, the rule-based algorithm can identify the pattern and replace potentially identifying information with a generic placeholder to mask the information). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model trained to identify one or more rules for contextual analysis of the text data without human supervision (Aravamudan, Para. 0182).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).
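
For illustration only, the rule-based masking described in the passage cited from Aravamudan Para. 0182 might be sketched in Python as below. The placeholder tokens "[DATE]" and "[PHONE]", the RULES table, and the function name mask are hypothetical and are not taken from either reference; the two patterns cover only the formats named in the cited passage (XX/XX/XX, XX/XX/XXXX, and (XXX) XXX-XXXX).

```python
import re

# Hypothetical rule table: each standard format from the cited passage is
# paired with a generic placeholder token (token names are assumptions).
RULES = [
    # Dates such as 01/02/80 or 01/02/1980
    (re.compile(r"\b\d{2}/\d{2}/(?:\d{4}|\d{2})\b"), "[DATE]"),
    # Ten-digit telephone numbers such as (555) 123-4567
    (re.compile(r"\(\d{3}\) \d{3}-\d{4}"), "[PHONE]"),
]


def mask(text):
    """Replace every match of each rule with its generic placeholder."""
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text


masked = mask("Seen on 01/02/1980; call (555) 123-4567.")
```

In this sketch the text surrounding each match is left untouched, so only the potentially identifying values are replaced by placeholders.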

In regards to claim 8, Ebrahimi discloses a computer program product for data analysis and security via intelligent masking of unstructured data, the computer program product comprising a non-transitory computer-readable storage medium having computer-executable instructions to (Ebrahimi, Para. 0023):
 receive, from one or more data channels and data sources, data files comprising unmasked data (Ebrahimi, Para. 0018, a user of source device 110 may push, or upload, an input file to server 130, via a secure connection through network 140, for data masking); 
compile the masked text data and reconstruct the data files by substituting the unmasked data with masked data (Ebrahimi, Para. 0073, the particular unmasking function that is performed may be preconfigured and determined based on a table lookup that depends, for example, on the type of sensitive data element involved. As a result of the data unmasking operations, the masked sensitive data elements, within the dynamic tables, may be unmasked (e.g., returned to the original)); and 
store the data file as a secure masked data file (Ebrahimi, Para. 0020, server 130 may access (or receive data from) database 120 to perform a data masking operation on certain data in database 120 and to generate masked data that server 130 may store and/or send to database 120 or another destination). 
Ebrahimi fails to disclose extract text data from the unmasked data;
 parse the text data and analyze syntax of the text data via a machine learning engine; 
identify and categorize sensitive text data via the machine learning engine, wherein the sensitive text data is a subset of the text data;
 replace the sensitive text data with generic mask data to generate masked text data; 
However, Aravamudan teaches extract text data from the unmasked data (Aravamudan, Para. 0239, text data may be extracted from the documents; note that the documents can be interpreted as the unmasked data);
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317, the rule-based template may be created by mapping each of the one or more portions of the text sequence to a corresponding syntax template, identifying a candidate syntax template based on a machine learning model that infers one or more candidate syntax templates based on the one or more portions of the text sequence); 
identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316, one of the plurality of entity tagging models is trained to tag entities of an entity type, the entity type including at least one of a personal name, an organization name, an age, a date, a time, a phone number, a pager number, a clinical identification number, an email address, an IP address, a web URL, a vehicle number, a physical address, a zip code, a social security number, or a date of birth), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233, a portion of the corpus containing sensitive information);
 replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180, a text input 1101 that is fed to the system 1102 will result in an output 1103 where subset of entities of interest, that are either single words or multi word phrases will be selectively masked (replaced with a generic placeholder token)); 
Ebrahimi and Aravamudan are both considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include extract text data from the unmasked data (Aravamudan, Para. 0239);
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317); 
identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233);
 replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005).

In regards to claim 9, the combination of Ebrahimi and Aravamudan teaches the computer program product of claim 8, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233, for example, non-text data (e.g., image data) and/or metadata may be removed from the documents, text data may be extracted from the documents (e.g., by optical character recognition), or the like. The format of the documents may be converted to a uniform format, or data from the documents may be used to populate a database (e.g., database 1570)).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include the computer program product of claim 8, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005).  

In regards to claim 10, the combination of Ebrahimi and Aravamudan teaches the computer program product of claim 8, wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186, applying statistical named entity recognition (NER) models to individual sentences in the corpus. In large corpuses with often-repeated patterns of text, statistical methods may not capture all instances of a pattern. For example, in the sentence “Electronically signed by: SMITH, JOHN C on 01/02/1980 at 12:12 PM CST”, ‘SMITH, JOHN C’ might be detected as a person entity but in the very similar sentence “Electronically signed by: DEWEY, JONES K on 01/02/1980 at 12:12 PM CST”, ‘DEWEY, JONES K’ may not be fully be detected as a person. In such situations, pattern based methods perform better. A regular expression syntax like “Electronically signed by: [A-Za-z]+, [A-Za-z]+[A-Za-z]+ on \d+\/\d+\/\d+ at \d+:\d+PM CST” would capture all such cases). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).  

In regards to claim 11, the combination of Ebrahimi and Aravamudan teaches the computer program product of claim 8, wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185, When using BERT, for each entity type a pre-trained model that is best suited for the entity type is chosen. For instance, when tagging entities like a person, location, etc., a model trained unsupervised on a generic corpus like Wikipedia, may suffice. In some embodiments, the pre-trained model may be based on other existing publicly available databases to augment model training such as health science journals, professional publications, peer-reviewed journal publications, or an operator-compiled database, among others). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).
  
In regards to claim 13, the combination of Ebrahimi and Aravamudan teaches the computer program product of claim 8, wherein reconstructing the data files further comprises converting plain text data containing masked data to an originating file type of the data files (Ebrahimi, Para. 0038, the unmasking engines may then simultaneously perform data unmasking operations on the masked sensitive data elements in the dynamic tables to unmask the sensitive data elements (e.g., return the sensitive data elements to their original form)).  

In regards to claim 14, the combination of Ebrahimi and Aravamudan teaches the computer program product of claim 8, wherein the machine learning engine further comprises an unsupervised machine learning model trained to identify one or more rules for contextual analysis of the text data without human supervision (Aravamudan, Para. 0182, For example, a rule-based algorithm may be based solely on the sequence of information in a standard format such as dates presented in the format of “Day/Month/Year” (e.g., XX/XX/XX or XX/XX/XXXX) or telephone numbers presented in ten-digit format (e.g., (XXX) XXX-XXXX). Based on these standard formats, the rule-based algorithm can identify the pattern and replace potentially identifying information with a generic placeholder to mask the information). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model trained to identify one or more rules for contextual analysis of the text data without human supervision (Aravamudan, Para. 0182).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, para. 0232).

In regards to claim 15, Ebrahimi discloses a computer implemented method for data analysis and security via intelligent masking of unstructured data, the computer implemented method comprising: 
providing a computing system comprising a computer processing device and a non-transitory computer readable medium (Ebrahimi, Para. 0025, Server 130 may perform these operations in response to processor 220 executing software instructions contained in a computer-readable medium, such as main memory 230.), where the non-transitory computer readable medium comprises configured computer program instruction code, such that when said instruction code is operated by said computer processing device, said computer processing device performs the following operations (Ebrahimi, Para. 0023):
receive, from one or more data channels and data sources, data files comprising unmasked data (Ebrahimi, Para. 0018, a user of source device 110 may push, or upload, an input file to server 130, via a secure connection through network 140, for data masking); 
compile the masked text data and reconstruct the data files by substituting the unmasked data with masked data (Ebrahimi, Para. 0073, the particular unmasking function that is performed may be preconfigured and determined based on a table lookup that depends, for example, on the type of sensitive data element involved. As a result of the data unmasking operations, the masked sensitive data elements, within the dynamic tables, may be unmasked (e.g., returned to the original)); and 
store the data file as a secure masked data file (Ebrahimi, Para. 0020, server 130 may access (or receive data from) database 120 to perform a data masking operation on certain data in database 120 and to generate masked data that server 130 may store and/or send to database 120 or another destination).  
Ebrahimi fails to disclose extract text data from the unmasked data; 
parse the text data and analyze syntax of the text data via a machine learning engine;
 identify and categorize sensitive text data via the machine learning engine, wherein the sensitive text data is a subset of the text data; 
replace the sensitive text data with generic mask data to generate masked text data; 
However, Aravamudan teaches extract text data from the unmasked data (Aravamudan, Para. 0239, text data may be extracted from the documents; note that the documents can be interpreted as the unmasked data); 
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317, the rule-based template may be created by mapping each of the one or more portions of the text sequence to a corresponding syntax template, identifying a candidate syntax template based on a machine learning model that infers one or more candidate syntax templates based on the one or more portions of the text sequence);
identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316, one of the plurality of entity tagging models is trained to tag entities of an entity type, the entity type including at least one of a personal name, an organization name, an age, a date, a time, a phone number, a pager number, a clinical identification number, an email address, an IP address, a web URL, a vehicle number, a physical address, a zip code, a social security number, or a date of birth), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233, a portion of the corpus containing sensitive information); 
replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180, a text input 1101 that is fed to the system 1102 will result in an output 1103 where subset of entities of interest, that are either single words or multi word phrases will be selectively masked (replaced with a generic placeholder token)); 
Ebrahimi and Aravamudan are both considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include extract text data from the unmasked data (Aravamudan, Para. 0239); 
parse the text data and analyze syntax of the text data via a machine learning engine (Aravamudan, Paras. 0188 and 0317);
 identify and categorize sensitive text data via the machine learning engine (Aravamudan, Para. 0316), wherein the sensitive text data is a subset of the text data (Aravamudan, Para. 0233); 
replace the sensitive text data with generic mask data to generate masked text data (Aravamudan, Para. 0180).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005).

In regards to claim 16, the combination of Ebrahimi and Aravamudan teaches the computer implemented method of claim 15, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233, for example, non-text data (e.g., image data) and/or metadata may be removed from the documents, text data may be extracted from the documents (e.g., by optical character recognition), or the like. The format of the documents may be converted to a uniform format, or data from the documents may be used to populate a database (e.g., database 1570)).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include the computer implemented method of claim 15, wherein extracting text data from the unmasked data further comprises converting the data files from an originating file type to plain text data (Aravamudan, Para. 0233).  Doing so would help to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society (Aravamudan, Para. 0005). 

In regards to claim 17, the combination of Ebrahimi and Aravamudan teaches the computer implemented method of claim 15, wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186, applying statistical named entity recognition (NER) models to individual sentences in the corpus. In large corpuses with often-repeated patterns of text, statistical methods may not capture all instances of a pattern. For example, in the sentence “Electronically signed by: SMITH, JOHN C on 01/02/1980 at 12:12 PM CST”, ‘SMITH, JOHN C’ might be detected as a person entity but in the very similar sentence “Electronically signed by: DEWEY, JONES K on 01/02/1980 at 12:12 PM CST”, ‘DEWEY, JONES K’ may not be fully be detected as a person. In such situations, pattern based methods perform better. A regular expression syntax like “Electronically signed by: [A-Za-z]+, [A-Za-z]+[A-Za-z]+ on \d+/\d+/\d+ at \d+:\d+PM CST” would capture all such cases). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model, wherein the unsupervised machine learning model detects sensitive text data based on contextual syntax of the text data (Aravamudan, Para. 0186).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, Para. 0232).
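
For illustration only, the pattern-based detection quoted from Aravamudan, Para. 0186 can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function name mask_signers, the "[PERSON]" mask string, and the whitespace placed between the pattern's tokens are assumptions introduced here so that the pattern matches the two quoted example sentences.

```python
import re

# Assumed reading of the quoted pattern. The spaces between tokens are an
# assumption made so the two example sentences match; they are not taken
# verbatim from the reference.
SIGNATURE = re.compile(
    r"(Electronically signed by: )"
    r"[A-Za-z]+, [A-Za-z]+ [A-Za-z]+"
    r"( on \d+/\d+/\d+ at \d+:\d+ PM CST)"
)


def mask_signers(text, mask="[PERSON]"):
    """Replace the signer name in each matching signature line with a generic mask."""
    # Keep the fixed prefix and the date/time suffix; swap only the name.
    return SIGNATURE.sub(lambda m: m.group(1) + mask + m.group(2), text)


if __name__ == "__main__":
    for line in (
        "Electronically signed by: SMITH, JOHN C on 01/02/1980 at 12:12 PM CST",
        "Electronically signed by: DEWEY, JONES K on 01/02/1980 at 12:12 PM CST",
    ):
        print(mask_signers(line))
```

Because the pattern keys on the fixed text surrounding the name rather than on the name itself, both example sentences are handled identically, which is the point the quoted passage makes about pattern-based methods.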

In regards to claim 18, the combination of Ebrahimi and Aravamudan teaches the computer implemented method of claim 15, wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185, When using BERT, for each entity type a pre-trained model that is best suited for the entity type is chosen. For instance, when tagging entities like a person, location, etc., a model trained unsupervised on a generic corpus like Wikipedia, may suffice. In some embodiments, the pre-trained model may be based on other existing publicly available databases to augment model training such as health science journals, professional publications, peer-reviewed journal publications, or an operator-compiled database, among others). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi to incorporate the teachings of Aravamudan to include wherein the machine learning engine further comprises an unsupervised machine learning model operatively connected to a knowledge database of exemplary sensitive text data types (Aravamudan, Para. 0185).  Doing so would help to improve reliability and, in embodiments where at least a subset of the corpus is labeled to provide training (or testing) data for a machine learning model, may reduce the amount of data that is tagged manually. This may facilitate the rapid and accurate development and training of machine learning models based on the corpus, such as sentiment classifiers (Aravamudan, Para. 0232).

In regards to claim 20, the combination of Ebrahimi and Aravamudan teaches the computer implemented method of claim 15, wherein reconstructing the data files further comprises converting plain text data containing masked data to an originating file type of the data files (Ebrahimi, Para. 0038, the unmasking engines may then simultaneously perform data unmasking operations on the masked sensitive data elements in the dynamic tables to unmask the sensitive data elements (e.g., return the sensitive data elements to their original form)).

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over EBRAHIMI et al. (US 2010/0205189 A1), hereinafter Ebrahimi, in view of ARAVAMUDAN et al. (US 2020/0402625 A1), hereinafter Aravamudan, and further in view of Salgado et al. (US 2008/0239365 A1), hereinafter Salgado.

In regards to claim 5, Ebrahimi in view of Aravamudan fails to teach the system of claim 1, wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character.  
However, Salgado teaches wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022, the mask may appear as a rectangular opaque strip 50 of color which contrasts with the background color of the document, e.g., a red or black strip).
Ebrahimi, Aravamudan, and Salgado are all considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi and Aravamudan to incorporate the teachings of Salgado to include wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022).  Doing so would help to provide the capability of leaving access to sensitive but otherwise interesting documents (whether printed or scanned) to readers by removing the sensitive information on the document itself. In some instances, it can also improve in the readability of certain documents by masking irrelevant information, in order not to distract the user (Salgado, Para. 0061).
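
For illustration only, the claimed limitation of a black rectangular shape character in place of an alphanumeric character can be sketched in plain text as follows. This is a hedged sketch rather than anything disclosed by Salgado, which describes an opaque strip overlaid on a document: the function name block_mask, the use of U+2588 (FULL BLOCK) as the mask character, and the representation of sensitive regions as (start, end) offsets are all assumptions introduced here.

```python
def block_mask(text, spans, block="\u2588"):
    """Replace each alphanumeric character inside the given spans with a block character.

    spans is an iterable of (start, end) character offsets, end exclusive.
    Punctuation and spaces are left in place so the layout is preserved.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i].isalnum():
                chars[i] = block
    return "".join(chars)


if __name__ == "__main__":
    # Mask the digits of a made-up identifier occupying offsets 4..14.
    print(block_mask("SSN 123-45-6789", [(4, 15)]))
```

Masking character by character keeps the length and punctuation of the original text, so a reader can still see that a value was present and roughly what shape it had.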

In regards to claim 12, Ebrahimi in view of Aravamudan fails to teach the computer program product of claim 8, wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character. 
However, Salgado teaches wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022, the mask may appear as a rectangular opaque strip 50 of color which contrasts with the background color of the document, e.g., a red or black strip).
Ebrahimi, Aravamudan, and Salgado are all considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi and Aravamudan to incorporate the teachings of Salgado to include wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022).  Doing so would help to provide the capability of leaving access to sensitive but otherwise interesting documents (whether printed or scanned) to readers by removing the sensitive information on the document itself. In some instances, it can also improve in the readability of certain documents by masking irrelevant information, in order not to distract the user (Salgado, Para. 0061).

In regards to claim 19, Ebrahimi in view of Aravamudan fails to teach the computer implemented method of claim 15, wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character.
However, Salgado teaches wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022, the mask may appear as a rectangular opaque strip 50 of color which contrasts with the background color of the document, e.g., a red or black strip).
Ebrahimi, Aravamudan, and Salgado are all considered to be analogous to the claimed invention because they are in the same field of using an unsupervised learning model to mask the unstructured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ebrahimi and Aravamudan to incorporate the teachings of Salgado to include wherein the generic mask data further comprises a black rectangular shape character in place of an alphanumeric character (Salgado, Paras. 0021-0022).  Doing so would help to provide the capability of leaving access to sensitive but otherwise interesting documents (whether printed or scanned) to readers by removing the sensitive information on the document itself. In some instances, it can also improve in the readability of certain documents by masking irrelevant information, in order not to distract the user (Salgado, Para. 0061).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Boukobza (US 9,418,237 B2) teaches an efficient and effective system and method that is application agnostic and provides transparent privacy controls to existing RDBMS implementations.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GITA FARAMARZI whose telephone number is (571) 272-0248. The examiner can normally be reached 9:30 AM- 6:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jorge L. Ortiz-Criado, can be reached at (571) 272-7624. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/G.F./
Examiner, Art Unit 2496

/JORGE L ORTIZ CRIADO/
Supervisory Patent Examiner, Art Unit 2496