DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The amendment filed on 04/05/2022 has been entered.  Claims 1-20 are pending.  Claims 1 and 11 have been amended.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. Claims 1 and 11 contain subject matter, namely “creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes” and “identifying corresponding metadata associated with the one or more attributes by implementing the machine learning models” (emphasis added), which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.  Applicant points to paragraphs [0014]-[0016] of the originally filed specification as support for these amended limitations.  Paragraph [0014] describes some examples of the kinds of data included in attributes and in metadata about attributes, and “various models where (i) most of the details about the attribute are known and (ii) only table and attribute physical names are known”.  However, the Examiner finds no support in the Specification for “creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes” and “identifying corresponding metadata associated with the one or more attributes by implementing the machine learning models”.  Please verify.  Claims 2-10 depend on claim 1 and claims 12-20 depend on claim 11.  Therefore, claims 2-10 and 12-20 are rejected under the same rationale.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rule et al. (Rule), US Patent Application Publication No. US 2021/0092224A1, in view of Antonatos et al. (Antonatos), US Patent Application Publication No. US 2020/0320406 A1, and further in view of Enuka et al. (Enuka), US Patent Application Publication No. US 2020/0050966 A1.

As to independent claim 1, Rule discloses a system that performs data classification for personally identifiable information (PII) data, the system comprising: 
a database system that stores attribute data and corresponding metadata (paragraph [0006]: a plurality of records (attribute data) stored at a database, wherein each record is associated with a phone call (metadata) and includes at least one request generated based on a transcript of the phone call using natural language processing); 
an interactive user interface configured to receive user input via a communication network (paragraph [0011]: a service provider can receive a plurality of calls (user input) from a plurality of callers); and 
a computer processor, coupled to the database system and the communication network (Figure 2 and paragraph [0053]: processor 206 is connected with a secondary storage device (database) and a network connection device), configured to perform the steps of: 
receiving data relating to one or more attributes (paragraph [0011]: record and analyze the call, wherein the record can be a file, folder, media file, document, etc., and include information such as a time for an incoming call, a phone number, an account (metadata)); 
identifying corresponding metadata associated with the one or more attributes by implementing the machine learning models (paragraph [0011]: record and analyze the call, wherein the record can be a file, folder, media file, document, etc., and include information such as a time for an incoming call, a phone number, an account (metadata); paragraph [0015]: using an intent recognition module to analyze the transcript for a call, wherein the intent recognition module can use intent classification techniques, which can be natural language understanding (NLU); the intent recognition module can also determine the intent by training a supervised machine learning classification model on labeled data); and 
classifying the one or more attributes (paragraphs [0011]-[0012]: generating a transcript for each call, dividing the transcript into small segments and matching these segments to known phonemes through a complex statistical model to determine what the caller was saying and outputting it as text); 
wherein the classifying is based on statistical techniques, machine learning models, and natural language processing (paragraph [0012]: classifying a transcript using a complex statistical model; paragraph [0015]: using an intent recognition module to analyze the transcript for a call, wherein the intent recognition module can use intent classification techniques, which can be natural language understanding (NLU); the intent recognition module can also determine the intent by training a supervised machine learning classification model on labeled data).
Rule, however, does not explicitly disclose classifying the one or more attributes into non-PII data and PII data based on the identified corresponding metadata and further classifying the PII data into one of a plurality of protection groups, each protection group identifying access permissions. 
In the same field of endeavor, Antonatos discloses applying one or more data security rules, policies, and/or requirements on data read to prevent unauthorized users from accessing selected data/raw data (e.g., classified/private data) by transforming sensitive data according to one or more data security rules, policies and/or requirements (paragraph [0017]).  Antonatos further discloses that classified/private data is detected using a machine learning operation such as natural language processing and/or an artificial intelligence operation to learn data that may be determined to be classified as private, personal, sensitive, and/or proprietary, and that the selected portion of data that is determined to be classified/private data may be filtered and/or anonymized (paragraphs [0017], [0063], [0064] and [0077]).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Rule to include classifying the one or more attributes into non-PII data and PII data based on the identified corresponding metadata and further classifying the PII data into one of a plurality of protection groups, each protection group identifying access permissions, as taught by Antonatos, for the purpose of preventing unauthorized users from accessing the classified/private data.
Rule and Antonatos, however, do not disclose creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes.
In the same field of endeavor, Enuka discloses a system for providing privacy management platforms, which may correlate personal information findings to specific data subjects and may employ machine learning models to classify findings as corresponding to a particular personal information attribute to provide an indexed inventory across multiple data sources (Abstract).  Enuka further discloses that the system may connect to one or more identity data sources and conduct a search for personal information contained therein, based on the stored personal information rules, and that, as potential personal information is found in an identity data source, the system may create a personal information findings (models) list of such information, including the value of each finding and/or metadata associated therewith, such as an associated attribute, the data source in which the personal information was found, and the location where the personal information is located within the data source (e.g., collection, table, field, row, etc.) (paragraph [0040]).  Enuka further discloses in Figure 4 and paragraph [0018] a method that employs a machine learning model to classify fields in a scanned data source (receiving data) according to personal information attributes.  Enuka further discloses that all of the personal information findings associated with a scan of a data source may be stored in a personal information findings file or collection, wherein each of the findings may comprise metadata associated with the found potential personal information, including one or more of: an attribute type, a value, a scan ID, data source information corresponding to the data source where the personal information is stored (e.g., name, type, location, access credentials, etc.) and/or location information corresponding to a location within the data source where the personal information is stored (e.g., collection, table, field, row, etc.) (paragraphs [0067], [0091] and [0098]). 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the systems of Rule and Antonatos to include creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes, as taught by Enuka, for the purpose of identifying and classifying sensitive data stored throughout an organization’s various data systems to facilitate the management of data risk and customer privacy.
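
For illustration only, the two-stage classification recited in claim 1 (non-PII versus PII, then assignment of PII to a protection group that identifies access permissions) may be sketched as follows. The sketch is not taken from Rule, Antonatos, Enuka, or the Specification. The group names, roles, keyword table, and the function classify are hypothetical, and a simple keyword lookup on table and attribute physical names stands in for the claimed machine learning models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionGroup:
    name: str
    allowed_roles: frozenset  # roles permitted to read attributes in this group


# Hypothetical groups and keyword hints; none of these names come from the record.
HIGH = ProtectionGroup("high", frozenset({"privacy_officer"}))
MEDIUM = ProtectionGroup("medium", frozenset({"privacy_officer", "analyst"}))
KEYWORDS = {"ssn": HIGH, "passport": HIGH, "phone": MEDIUM, "email": MEDIUM}


def classify(table_name, attribute_name):
    """Return (is_pii, protection_group) using physical names alone.

    A keyword lookup is used here in place of a trained model.
    """
    text = f"{table_name}_{attribute_name}".lower()
    for keyword, group in KEYWORDS.items():
        if keyword in text:
            return True, group
    return False, None
```

Under these assumptions, an attribute named HOME_PHONE_NBR in a table named CUSTOMER would fall in the hypothetical "medium" group, while ORDER_QTY in ORDERS would be treated as non-PII.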

As to dependent claim 2, Rule discloses wherein the computer processor is configured to perform the step of applying tokenization to the one or more attributes to identify a sequence of words (paragraph [0012]).

As to dependent claim 3, Rule discloses wherein the computer processor is configured to perform the step of applying a stemming process to the one or more attributes to reduce inflected words to a stem form (paragraph [0015]).

As to dependent claim 4, Rule discloses wherein the computer processor is configured to perform the step of applying a grouping of words as a single item (paragraph [0012]).

As to dependent claim 5, Rule discloses wherein the grouping comprises a lemmatization process (paragraph [0015]).
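
For illustration only, the preprocessing steps recited in claims 2-4 (tokenization, stemming, and grouping of words as a single item) may be sketched as follows. The sketch is not taken from Rule or from the Specification. The function names and the short suffix list are assumptions, the suffix stripping is a toy stand-in for a real stemmer, and the lemmatization of claim 5 is not shown because it would require a dictionary.

```python
import re

# Assumed toy suffix list, checked in order; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split an attribute name or description into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bigrams(tokens):
    """Group adjacent tokens so that each pair is treated as a single item."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


tokens = tokenize("CUST_Mailing_Addresses")
stems = [stem(t) for t in tokens]
pairs = bigrams(stems)
```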

As to dependent claim 6, Rule discloses wherein the computer processor is configured to perform the step of: applying a weighting technique that represents an importance associated with a word (paragraphs [0013]-[0015]).

As to dependent claim 7, Rule discloses wherein the weighting technique comprises term frequency-inverse document frequency (TF-IDF) weight (paragraph [0015]).

As to dependent claim 8, Rule discloses wherein the term frequency-inverse document frequency (TF-IDF) weight comprises a first term that measures how frequently a term appears in a document (paragraph [0013]). 

As to dependent claim 9, Rule discloses wherein the term frequency-inverse document frequency (TF-IDF) weight comprises a second term that measures how important a term is (paragraph [0013]). 
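
For illustration only, the TF-IDF weight recited in claims 7-9 may be sketched as the product of a first term (how frequently a term appears in a document) and a second term (how important the term is across the collection). The sketch uses the common textbook form tf = count / document length and idf = log(N / document frequency); neither Rule nor the Specification is asserted to use this exact variant, and the function name tf_idf is an assumption.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # First term: term frequency. Second term: inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["phone", "number"], ["account", "number"]]
w = tf_idf(docs)
```

In this example "number" occurs in every document, so its second term is log(1) = 0 and its weight is zero, whereas "phone" occurs in only one document and receives a positive weight.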

As to dependent claim 10, Rule discloses wherein the computer processor is configured to perform the step of: applying a Synthetic Minority Oversampling Technique to increase a dataset in a balanced manner (paragraph [0033]). 
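
For illustration only, the Synthetic Minority Oversampling Technique recited in claim 10 may be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The sketch is a simplified version under stated assumptions and is not taken from Rule or from the Specification. The function name smote, its parameters, and the sample points are hypothetical, and at least two minority samples are assumed.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_points, n_new=4)
```

Appending new_points to the minority class enlarges that class toward the size of the majority class, which is the balancing effect the claim refers to.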

As to independent claim 11, Rule discloses a method that performs data classification for personally identifiable information (PII) data, the method comprising the steps of: 
receiving data relating to one or more attributes (paragraph [0011]: record and analyze the call, wherein the record can be a file, folder, media file, document, etc., and include information such as a time for an incoming call, a phone number, an account (metadata)); 
identifying corresponding metadata associated with the one or more attributes by implementing the machine learning models (paragraph [0011]: record and analyze the call, wherein the record can be a file, folder, media file, document, etc., and include information such as a time for an incoming call, a phone number, an account (metadata); paragraph [0015]: using an intent recognition module to analyze the transcript for a call, wherein the intent recognition module can use intent classification techniques, which can be natural language understanding (NLU); the intent recognition module can also determine the intent by training a supervised machine learning classification model on labeled data); and 
classifying the one or more attributes (paragraphs [0011]-[0012]: generating a transcript for each call, dividing the transcript into small segments and matching these segments to known phonemes through a complex statistical model to determine what the caller was saying and outputting it as text); 
wherein the classifying is based on statistical techniques, machine learning models, and natural language processing (paragraph [0012]: classifying a transcript using a complex statistical model; paragraph [0015]: using an intent recognition module to analyze the transcript for a call, wherein the intent recognition module can use intent classification techniques, which can be natural language understanding (NLU); the intent recognition module can also determine the intent by training a supervised machine learning classification model on labeled data).
Rule, however, does not explicitly disclose classifying the one or more attributes into non-PII data and PII data based on the identified corresponding metadata and further classifying the PII data into one of a plurality of protection groups, each protection group identifying access permissions. 
In the same field of endeavor, Antonatos discloses applying one or more data security rules, policies, and/or requirements on data read to prevent unauthorized users from accessing selected data/raw data (e.g., classified/private data) by transforming sensitive data according to one or more data security rules, policies and/or requirements (paragraph [0017]).  Antonatos further discloses that classified/private data is detected using a machine learning operation such as natural language processing and/or an artificial intelligence operation to learn data that may be determined to be classified as private, personal, sensitive, and/or proprietary, and that the selected portion of data that is determined to be classified/private data may be filtered and/or anonymized (paragraphs [0017], [0063], [0064] and [0077]).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Rule to include classifying the one or more attributes into non-PII data and PII data based on the identified corresponding metadata and further classifying the PII data into one of a plurality of protection groups, each protection group identifying access permissions, as taught by Antonatos, for the purpose of preventing unauthorized users from accessing the classified/private data.
Rule and Antonatos, however, do not disclose creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes.
In the same field of endeavor, Enuka discloses a system for providing privacy management platforms, which may correlate personal information findings to specific data subjects and may employ machine learning models to classify findings as corresponding to a particular personal information attribute to provide an indexed inventory across multiple data sources (Abstract).  Enuka further discloses that the system may connect to one or more identity data sources and conduct a search for personal information contained therein, based on the stored personal information rules, and that, as potential personal information is found in an identity data source, the system may create a personal information findings (models) list of such information, including the value of each finding and/or metadata associated therewith, such as an associated attribute, the data source in which the personal information was found, and the location where the personal information is located within the data source (e.g., collection, table, field, row, etc.) (paragraph [0040]).  Enuka further discloses in Figure 4 and paragraph [0018] a method that employs a machine learning model to classify fields in a scanned data source (receiving data) according to personal information attributes.  Enuka further discloses that all of the personal information findings associated with a scan of a data source may be stored in a personal information findings file or collection, wherein each of the findings may comprise metadata associated with the found potential personal information, including one or more of: an attribute type, a value, a scan ID, data source information corresponding to the data source where the personal information is stored (e.g., name, type, location, access credentials, etc.) and/or location information corresponding to a location within the data source where the personal information is stored (e.g., collection, table, field, row, etc.) (paragraphs [0067], [0091] and [0098]). 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the systems of Rule and Antonatos to include creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes, as taught by Enuka, for the purpose of identifying and classifying sensitive data stored throughout an organization’s various data systems to facilitate the management of data risk and customer privacy.

As to dependent claim 12, Rule discloses applying tokenization to the one or more attributes to identify a sequence of words (paragraph [0012]).

As to dependent claim 13, Rule discloses applying a stemming process to the one or more attributes to reduce inflected words to a stem form (paragraph [0015]).

As to dependent claim 14, Rule discloses applying a grouping of words as a single item (paragraph [0012]).

As to dependent claim 15, Rule discloses wherein the grouping comprises a lemmatization process (paragraph [0015]).

As to dependent claim 16, Rule discloses applying a weighting technique that represents an importance associated with a word (paragraphs [0013]-[0015]).

As to dependent claim 17, Rule discloses wherein the weighting technique comprises term frequency-inverse document frequency (TF-IDF) weight (paragraph [0015]).

As to dependent claim 18, Rule discloses wherein the term frequency-inverse document frequency (TF-IDF) weight comprises a first term that measures how frequently a term appears in a document (paragraph [0013]).

As to dependent claim 19, Rule discloses wherein the term frequency-inverse document frequency (TF-IDF) weight comprises a second term that measures how important a term is (paragraph [0013]). 

As to dependent claim 20, Rule discloses applying a Synthetic Minority Oversampling Technique to increase a dataset in a balanced manner (paragraph [0033]).

Response to Arguments
Applicant’s arguments and amendments filed on 04/05/2022 have been fully considered but they are not deemed fully persuasive.  Applicant’s arguments with respect to claims 1-20 have been considered but are moot in view of the new ground(s) of rejection set forth above, necessitated by Applicant’s substantial amendment (i.e., creating machine learning models based on the received data that include details information data about the one or more attributes and only table and attribute physical names associated with the one or more attributes) to the claims, which significantly affected the scope thereof.  Please see the rejection above, which relies on additional newly cited prior art.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
	
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAU T NGUYEN whose telephone number is (571) 272-4092.  The examiner can normally be reached on 8:30 am – 5:30 pm Mon-Fri.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached on (571) 272-4128.  The fax phone number for the organization where this application or proceeding is assigned is 703-872-9306.  On July 15, 2005, the Central Facsimile (FAX) Number will change from 703-872-9306 to 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/CHAU T NGUYEN/