Detailed Action

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims
Claims 1-35 have been amended. Claims 1, 4, 14, 22, 23, 29, and 35 are pending and rejected in the application. This action is Final.

Allowable Subject Matter
Claims 2, 3, 5-13, 15-21, 24-28, and 30-34 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and all intervening claims.

Claim Objection
Claim 25 is objected to because of the following informalities:
The claim 25 limitation recites the duplicated word “identified identified…etc.” Please correct the limitation.


Response to Arguments

Applicant Argues 
Thus, because the Office Action failed to establish all of the limitations recited in claim 1 are found in Vachhani and/or Weber, the Office Action has failed to establish claim 1 is obvious in view of Vachhani and Weber. For at least this reason, withdrawal of the rejection of claim 1, and its dependent claims, as being obvious over Vachhani in view of Weber is respectfully requested. Claims 22 and 35 recite similar limitations. As such, withdrawal of the rejections of claims 22 and 35, and their dependent claims, is also requested.

Examiner Responds:
Applicant's 35 USC § 103 arguments with respect to claims 1, 4, 14, 22, 23, 29, and 35 have been considered but are moot in view of the new ground(s) of rejection.


Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 14, 23, 25, 29, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Vachhani et al., U.S. Patent Publication (2018/0053115; hereinafter: Vachhani), in view of Gunn et al., U.S. Patent (10,599,635; hereinafter: Gunn), and further in view of Weber et al., Non-Patent Publication (“Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research”, October 2019; hereinafter: Weber).

Claims 1, 22, and 35
As to claims 1, 22, and 35, Vachhani discloses a data processing and classification system comprising: 
at least one server configured to receive a data from an entity (paragraph[0009], “The system includes a server configured to receive spend data from an entity…etc.”); 
at least one data store having a plurality of databases including an operational database for storing the data received from the entity after cleansing and enrichment of the data received from the entity (paragraph[0049], “In step 238 an operation of fetching cleansed spend data is performed. In step 239 an operation of performing feature extraction to obtain distinct words from the spend data as variables is performed. The supplier information obtained from the web mining tool is provided to perform the feature extraction. In step 240 an operation of transforming the spend data into a classification data matrix of variables with enriched spend data is performed. The classification data matrix of zeros is created for the enriched spend data…etc.”), and at least one training model database for storing a historical data classification model (paragraph[0042], “If the entity is already registered, then an operation 204 of initiating refresh classification process is executed. The method further includes the operation 205 of fetching historical data of the identified registered entity from a historical database. The method further includes the operation 206 of cleansing the historical data and creating a supplier search index from the historical data 206a. The historical spend data is cleansed for obtaining normalized historical spend data. In this method an additional step 206b of saving the supplier search index may also be executed. The method further includes a step 207 of extracting taxonomy of classification based on a plurality of unique categories from the historical data…etc.”); 
a verification engine for checking if the data received from the entity is a new entity data (paragraph[0031], “The support mechanism 115 includes a verification engine 121 for verifying/identifying if the entity is a registered entity….etc.”); and 
a processor configured to select a data classification tool for generating classified data with a confidence score in response to determination of the cleansed enriched data as a new entity data (paragraph[0043], “In step 217 an operation of applying category code of the preferred model to obtain machine learning classification with confidence score in step 218. If the supplier information in the spend data is new, then an operation of applying web mining tool for obtaining unknown terms in the spend data to obtain an enriched spend data is performed. In step 219 fetching unknown terms/supplier information, i.e Name, description etc., from the received cleansed spend data is performed. In step 220 the terms are applied on a search engine API. After this step, an operation 221 of extracting a pre-determined number of URL's from the search engine results is performed. In step 222 an operation of crawling a page or pages of the extracted URL's is performed as the operation 223 of extracting pre-determined number of common words. In operation 224 the saved supplier search index from the historical database is loaded and in operation 225 a search is performed for exact match for supplier information obtained from the web mining tool and the saved supplier search index. In operation 226 a partial search is performed if exact match for supplier information is not successful. In operation 227 a classification result for the supplier is obtained by the matching engine. In operation 228 a final classification of the spend data is obtained with confidence score…etc.”), wherein the data classification tool is configured to:
generate a reference data from the at least one data subset by annotation through an Artificial Intelligence engine (paragraph[0009], “The server further includes a verification engine for checking if the entity is a registered entity and a processor configured to select a classification tool amongst a refresh classification tool or implementation classification tool for generating a classified data with a confidence score. The refresh classification tool is selected if the entity is a registered entity else the implementation classification tool is selected. The server includes an AI (artificial intelligence) engine for creating a training classification model from the selected classification tool and a training model database for storing the training classification model…etc.”); and 
train an entity specific data model by applying transfer learning to a historical data model using the reference data (paragraph[0009] and paragraph[0025]-paragraph[0028], “If implementation classification of step 104 is initiated, then a step 108 of accessing pre-classified historical data of other registered entities from a cube of cubes database to build training models for classification of the received spend data from the unregistered entity is executed. In step 109 a check to determine whether a supplier information in the received spend data is available in the pre-classified historical data of the other registered entities is executed. If the supplier information is not available, then the step 107 of initiating the web mining tool to determine the information for classification is executed. In step 110 an operation of classifying received spend data using training models…etc.”);
wherein the Artificial Intelligence engine is configured to create a data matrix from the cleansed enriched data, and the entity specific data model is applied to the data matrix for classifying the cleansed enriched data and provide the confidence score of the classified data (paragraph[0042], “This method further includes the step 208 of performing feature extraction to obtain distinct words from the spend data as variables. In step 209 spend data is transformed into a training data matrix of variables with historical data. The training classification model is created from the classification code vectors and the training data matrix by using the machine learning engine (MLE) and the AI engine. To obtain a training classification model from the historical data of the identified registered entity, reading the data matrix X and the classification code vector ‘y’ and the input matrix as [X y]. In step 209a a Naïve Bayes (NB) training model is obtained by applying a Naïve Bayes algorithm. In step 209b a Support Vector machine (SVM) training model is obtained by applying support vector algorithm. In step 209c a Logistic regression training model is obtained by applying Logistic vector algorithm (LR). In step 209d the NB model is saved, in step 209e the SV model is saved, and in step 209f the LR model is saved. These models are saved on computer program product to be used in classification of spend data…etc.”).
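
For illustration only, the data-matrix and classification steps quoted from Vachhani paragraph [0042] can be sketched as follows. This is a minimal sketch under stated assumptions, not Vachhani's implementation: only the Naïve Bayes model is shown (the SVM and logistic regression models named in the passage are omitted), Laplace smoothing is assumed, the normalized posterior probability is assumed to serve as the confidence score, and all function names and sample records are hypothetical.

```python
import math

def tokenize(text):
    # Split a spend description into lowercase words (the "variables").
    return text.lower().split()

def vectorize(text, vocab_index):
    # Start from a row of zeros and count each known word.
    row = [0] * len(vocab_index)
    for word in tokenize(text):
        if word in vocab_index:
            row[vocab_index[word]] += 1
    return row

def build_matrix(descriptions):
    # Distinct words become columns; each description becomes one row of X.
    vocab = sorted({w for d in descriptions for w in tokenize(d)})
    vocab_index = {w: j for j, w in enumerate(vocab)}
    return vocab_index, [vectorize(d, vocab_index) for d in descriptions]

def train_naive_bayes(X, y):
    # Multinomial Naive Bayes over [X y] with Laplace (+1) smoothing (assumed).
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        counts = [sum(col) + 1 for col in zip(*rows)]
        total = sum(counts)
        log_prior = math.log(len(rows) / len(X))
        model[label] = (log_prior, [math.log(c / total) for c in counts])
    return model

def classify(model, x):
    # Return the best category code and its normalized posterior as confidence.
    scores = {
        label: prior + sum(n * ll for n, ll in zip(x, likes))
        for label, (prior, likes) in model.items()
    }
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

# Hypothetical historical spend data and classification code vector y.
descriptions = ["laptop docking station", "laptop charger",
                "flight ticket", "hotel flight booking"]
codes = ["IT", "IT", "TRAVEL", "TRAVEL"]

vocab_index, X = build_matrix(descriptions)
model = train_naive_bayes(X, codes)
label, confidence = classify(model, vectorize("laptop charger", vocab_index))
```

Here `label` is the predicted category code and `confidence` is a value between 0 and 1; words absent from the training vocabulary are ignored when a new record is vectorized.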

Vachhani does not appear to explicitly disclose wherein an annotation script is configured to process the at least one data subset and generate the reference data based on a dynamic processing logic, or initiate a stratified sampling of the cleansed enriched data for obtaining at least one data subset from the cleansed enriched data.

However, Gunn discloses wherein an annotation script is configured to process the at least one data subset and generate the reference data based on a dynamic processing logic (column 5, lines 40-52, “Process 300 can include obtaining (310) data load scripts. In a variety of embodiments, a data load script includes one or more instructions that cause a database system to generate one or more staging tables including a subset of the raw data stored using the database system. The data load script can include a set of attributes, a set of values, a set of transformations to be applied to the retrieved data, and/or a set of filters to be applied to the attributes, values, and/or the transformed data. The data can be obtained from one or more tables within the database. For example, a data load script can obtain all records from a database filtered by an activity date between the first and last date of the month and an account balance below a threshold value…etc.” and column 6, lines 1-12). It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have modified the teachings of Vachhani with the teachings of Gunn to have data processing scripts, which would result in the claimed invention. The skilled artisan would have been motivated to improve the teachings of Vachhani with the teachings of Gunn to improve the quality, efficiency, and speed of data processing systems, offering improved performance and reduced computational overhead, by generating staging data independently from the execution of control scripts which process the staging data (Gunn: column 1, lines 32-50).
	
The combination of Vachhani and Gunn does not appear to explicitly disclose initiate a stratified sampling of the cleansed enriched data for obtaining at least one data subset from the cleansed enriched data.

However, Weber discloses initiate a stratified sampling of the cleansed enriched data for obtaining at least one data subset from the cleansed enriched data (page 5, “Stratified sampling is a necessary step in typical multi-label classification pipelines. In [25] an algorithm to realize a ‘relaxed interpretation of stratified sampling for multi-label data’ is proposed. The base idea is to distribute the data items over n subsets, starting with all data items labeled with the least common label…etc.”). It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have modified the teachings of Vachhani with the teachings of Gunn and Weber to stratify data, which would result in the claimed invention. The skilled artisan would have been motivated to improve the teachings of Vachhani with the teachings of Gunn and Weber to have a quantitative analysis of trends towards interdisciplinarity of digital scholarly output of the characterization of growth patterns of research data (Weber: Abstract).
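
For illustration only, the base idea quoted from Weber (distributing data items over n subsets, starting with the least common label) can be sketched as follows. This is a simplified single-label sketch, not the multi-label algorithm Weber cites as [25]; the function name, the tie-breaking rule, and the sample records are hypothetical.

```python
from collections import defaultdict

def stratified_subsets(records, label_of, n_subsets):
    """Distribute records over n_subsets so each label is spread evenly."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)

    subsets = [[] for _ in range(n_subsets)]
    # Process the least common label first, as in the quoted passage.
    for label in sorted(by_label, key=lambda lab: (len(by_label[lab]), str(lab))):
        label_counts = [0] * n_subsets
        for rec in by_label[label]:
            # Pick the subset holding the fewest items of this label;
            # break ties by the smallest overall subset (assumed rule).
            target = min(
                range(n_subsets),
                key=lambda i: (label_counts[i], len(subsets[i])),
            )
            subsets[target].append(rec)
            label_counts[target] += 1
    return subsets

# Hypothetical cleansed, enriched records: (record id, category label).
records = (
    [("office-%d" % i, "office") for i in range(6)]
    + [("travel-%d" % i, "travel") for i in range(3)]
    + [("legal-0", "legal")]
)
subsets = stratified_subsets(records, lambda rec: rec[1], 3)
```

With these ten sample records, each of the three subsets receives two "office" items and one "travel" item, and the single "legal" item lands in one subset.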

Claim 2
As to claim 2, the combination of Vachhani, Gunn, and Weber discloses all the elements in claim 1, as noted above, and Vachhani further discloses wherein the historical data model is trained by pooling a plurality of distinct entity data as training data and the historical data model is used as an initialization for training the entity specific data model on receiving the cleansed enriched data of the new entity (paragraph[0042], “The method further includes the operation 205 of fetching historical data of the identified registered entity from a historical database. The method further includes the operation 206 of cleansing the historical data and creating a supplier search index from the historical data 206a. The historical spend data is cleansed for obtaining normalized historical spend data. In this method an additional step 206b of saving the supplier search index may also be executed. The method further includes a step 207 of extracting taxonomy of classification based on a plurality of unique categories from the historical data. Also, in this non-limiting example embodiment, classification code vectors are obtained from training data corresponding to each of the categories of classification. This method further includes the step 208 of performing feature extraction to obtain distinct words from the spend data as variables. In step 209 spend data is transformed into a training data matrix of variables with historical data…etc.”).

Claim 4
As to claim 4, the combination of Vachhani, Gunn, and Weber discloses all the elements in claim 1, as noted above, and Vachhani further discloses wherein the dynamic processing logic integrates deep learning, predictive analysis, information extraction, optimization and bots for processing the at least one data subset (paragraph[0064], “Based on maximum score value, a matching spend data is selected from the reference data and the category code of the reference data is returned as classification for the received spend data wherein the score value is the confidence value for the classification. Vendor name has the most significance in spend value, so the first search logic ‘Full Search’ is invoked which filters those records where the vendor name was present in the reference data. The search is then executed only in the selected subset of reference data. If the vendor name of new spend data is not present in reference data, then the second search logic ‘Partial Search’ is invoked which searches the current record in the entire repository of reference data. The results after executing ‘Full Search’ and ‘Partial Search’ are returned along with the confidence score…etc.”).
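
For illustration only, the ‘Full Search’ and ‘Partial Search’ logic quoted from Vachhani paragraph [0064] can be sketched as follows. This is a minimal sketch, not Vachhani's implementation: the quoted passage does not specify how the score value is computed, so a word-overlap (Jaccard) score on the description field is assumed, and the field names, function names, and sample records are hypothetical.

```python
def token_score(a, b):
    # Assumed scoring function: Jaccard overlap of the two word sets.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def classify_by_search(record, reference):
    """Return (category code, confidence score, search mode) or None."""
    if not reference:
        return None
    # Full Search: keep only reference rows with the same vendor name.
    vendor = record["vendor"].lower()
    subset = [r for r in reference if r["vendor"].lower() == vendor]
    mode = "full" if subset else "partial"
    # Partial Search: fall back to the entire reference repository.
    candidates = subset or reference
    best = max(
        candidates,
        key=lambda r: token_score(record["description"], r["description"]),
    )
    score = token_score(record["description"], best["description"])
    # The maximum score doubles as the confidence value.
    return best["category"], score, mode

# Hypothetical pre-classified reference data.
reference = [
    {"vendor": "Acme", "description": "laptop charger", "category": "IT"},
    {"vendor": "Acme", "description": "office chairs", "category": "FURNITURE"},
    {"vendor": "Globex", "description": "flight ticket", "category": "TRAVEL"},
]

known_vendor = classify_by_search(
    {"vendor": "acme", "description": "laptop charger 65w"}, reference)
new_vendor = classify_by_search(
    {"vendor": "Initech", "description": "flight ticket"}, reference)
```

In this sketch a record from a known vendor is matched only against that vendor's reference rows, while a record from an unknown vendor is matched against every reference row.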

Claim 14
As to claim 14, the combination of Vachhani, Gunn, and Weber discloses all the elements in claim 9, as noted above, and Vachhani further discloses wherein the data received from the entity is an item data, supplier data or a data string extracted from at least one data source (paragraph[0050], “In an example embodiment of the invention, the classification of new received spend data is provided using Supplier Search Engine (SSE). The SSE uses past (pre-classified) data as a reference data and the new received spend data is used to search from the reference data for the category code…etc.”).

Claim 23
As to claim 23, the combination of Vachhani, Gunn, and Weber discloses all the elements in claim 22, as noted above, and Vachhani further discloses wherein the dynamic processing logic integrates deep learning, predictive analysis, information extraction, optimization and bots for processing the at least one data subset (paragraph[0064], “Based on maximum score value, a matching spend data is selected from the reference data and the category code of the reference data is returned as classification for the received spend data wherein the score value is the confidence value for the classification. Vendor name has the most significance in spend value, so the first search logic ‘Full Search’ is invoked which filters those records where the vendor name was present in the reference data. The search is then executed only in the selected subset of reference data. If the vendor name of new spend data is not present in reference data, then the second search logic ‘Partial Search’ is invoked which searches the current record in the entire repository of reference data. The results after executing ‘Full Search’ and ‘Partial Search’ are returned along with the confidence score…etc.”).



Claim 29
As to claim 29, the combination of Vachhani, Gunn, and Weber discloses all the elements in claim 25, as noted above, and Vachhani further discloses wherein the data received from the entity is a supplier data, item data or a data string extracted from at least one data source (paragraph[0050], “In an example embodiment of the invention, the classification of new received spend data is provided using Supplier Search Engine (SSE). The SSE uses past (pre-classified) data as a reference data and the new received spend data is used to search from the reference data for the category code…etc.”).

Final Rejection
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAWAUNE A CONYERS whose telephone number is (571)270-3552. The examiner can normally be reached on M-F 8:00am-4:30pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on (571) 270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/DAWAUNE A CONYERS/
Primary Examiner, Art Unit 2152
November 26, 2022