DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Remarks 

Receipt of Applicant’s Amendment filed on 12/28/2021 is acknowledged. In the amendment, claims 1-20 are amended.
Response to Arguments
Applicant's arguments filed 12/28/2021 have been fully considered but they are not persuasive. 
Regarding claim 1, Applicant argues that the cited references do not disclose “accessing...one or more current datasets used by a first live database at a production datacenter and a second live database at a non-production datacenter” and “retrieving...an experimental dataset from an experimental database at the non-production datacenter” (page 10, 2nd paragraph). Applicant further argues “None of the references cited in rejection of the independent claims (Cantwell, Simca and Chen) discloses the claimed datasets used by a live database at a non-production datacenter and the claimed experimental dataset from an experimental database at the non-production datacenter.” (page 10, 2nd paragraph). Respectfully, it is noted that Cantwell teaches that block servers are coupled to storage, which stores volume data of clients; the block server and slice server maintain a mapping between a block identifier and the location of the data block in a storage medium of the block server; metadata maps between the client addressing used by the client and the block identifier; and retrieving metadata includes a list of block identifiers associated with data blocks of the volume (paragraphs [0018], [0024], [0029], [0030]-[0031]), which reads on accessing, by a data monitoring system, one or more current datasets used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use. Cantwell further teaches that one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; and the backup server then compares the current metadata from the metadata server with a version of stored metadata on the backup server (paragraphs [0028], [0031]), which reads on a second live database at a non-production datacenter.
The newly cited reference Breck teaches a pipeline that trains on new training data arriving in batches from serving data, where the training data is interpreted as the current datasets used to perform analytics on the production version of the web service (page 1, right column, Example 1.1; page 2, left column, first paragraph), which reads on wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service. Breck further teaches a pipeline that trains on new training data arriving in batches from serving data, and using the system to validate trillions of training and serving examples per day, amounting to several petabytes of data per day (page 1, right column, Example 1.1; page 2, left column, first paragraph and right column, 3rd paragraph), which reads on retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter. Therefore, the combination of references teaches the limitations.
Claims 10 and 17 are rejected for similar reasons.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Cantwell et al. (U.S. Pub. No. 2015/0244795 A1) and Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019. Copyright 2019 by the author(s)).
Regarding claim 1, Cantwell teaches a method, comprising: 
accessing, by a data monitoring system, one or more current datasets used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use (paragraphs [0018], [0024], [0029], [0030]-[0031]: block servers are coupled to storage, which stores volume data of clients; the block server and slice server maintain a mapping between a block identifier and the location of the data block in a storage medium of the block server; metadata maps between the client addressing used by the client and the block identifier; retrieving metadata includes a list of block identifiers associated with data blocks of the volume), and a second live database at a non-production datacenter (paragraphs [0028], [0031]: one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; the backup server then compares the current metadata from the metadata server with a version of stored metadata on the backup server);
performing, by the data monitoring system, encoding operations on the one or more current datasets to generate encode values corresponding to the one or more current datasets (paragraph [0031]: one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; the block identifiers are hashed based on the content of a corresponding data block; hashing based on content is interpreted as an encoding operation).
Cantwell does not explicitly disclose: wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; 
retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter;
performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases.
Breck teaches: wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service (page 1, right column, Example 1.1; page 2, left column, first paragraph: a pipeline that trains on new training data arriving in batches from serving data; the training data is interpreted as the current datasets used to perform analytics on the production version of the web service); 
retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter (page 1, right column, Example 1.1; page 2, left column, first paragraph and right column, 3rd paragraph: a pipeline that trains on new training data arriving in batches from serving data; the system is used to validate trillions of training and serving examples per day, amounting to several petabytes of data per day);
performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset (Breck, page 2, right column; page 4, right column; page 5: the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data); and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases (page 5, right column: the data validator recommends updates to the schema as new data is ingested and analyzed; the system includes an interface and tools that aid users by directing their attention to important suggestions and providing a click button to apply the suggested changes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter; performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases into the data syncing in a distributed system of Cantwell.
The motivation to do so would be to include wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter; performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases, in order to address issues that may result in a small drop in model quality, one that can be easily missed if the data errors affect the model only on specific slices of the data but the aggregate model metrics still look okay (Breck, page 2, right column, 2nd paragraph, lines 8-11).
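For illustration only, the schema-based validation relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Breck, Cantwell, or the application; the function names infer_schema and validate_batch, and the use of Python type names as the only per-feature constraint, are hypothetical simplifications.

```python
def infer_schema(records):
    """Derive a simple schema from current data: feature name -> allowed type names.

    Hypothetical simplification: the only constraint tracked per feature is its type.
    """
    schema = {}
    for rec in records:
        for name, value in rec.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema


def validate_batch(schema, batch):
    """Compare a new batch against the schema.

    Returns a list of anomaly descriptions; an empty list means validation succeeded.
    """
    anomalies = []
    for i, rec in enumerate(batch):
        # Features the schema expects but the record lacks.
        for name in sorted(set(schema) - set(rec)):
            anomalies.append(f"record {i}: missing feature '{name}'")
        for name, value in rec.items():
            if name not in schema:
                anomalies.append(f"record {i}: unexpected feature '{name}'")
            elif type(value).__name__ not in schema[name]:
                anomalies.append(
                    f"record {i}: feature '{name}' has type {type(value).__name__}"
                )
    return anomalies
```

In this sketch, an empty anomaly list plays the role of a successful validation, and a non-empty list corresponds to anomalies that would be surfaced to a user before any schema update is applied.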
Regarding claim 4, Cantwell as modified by Breck teaches all claimed limitations as set forth in the rejection of claim 1, and further teaches wherein the performing the validation operations includes validating a schema associated with the experimental dataset (Breck, page 2, right column; page 4, right column; page 5: the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data).
Regarding claim 10, Cantwell teaches a non-transitory, computer-readable medium having instructions stored thereon that are executable by a data monitoring system to perform operations (paragraph [0045]) comprising: 
accessing, by a data monitoring system, one or more current datasets used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use (paragraphs [0018], [0024], [0029], [0030]-[0031]: block servers are coupled to storage, which stores volume data of clients; the block server and slice server maintain a mapping between a block identifier and the location of the data block in a storage medium of the block server; metadata maps between the client addressing used by the client and the block identifier; retrieving metadata includes a list of block identifiers associated with data blocks of the volume), and a second live database at a non-production datacenter (paragraphs [0028], [0031]: one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; the backup server then compares the current metadata from the metadata server with a version of stored metadata on the backup server);
performing, by the data monitoring system, encoding operations on the one or more current datasets to generate encode values corresponding to the one or more current datasets (paragraph [0031]: one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; the block identifiers are hashed based on the content of a corresponding data block; hashing based on content is interpreted as an encoding operation).
Cantwell does not explicitly disclose: wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; 
retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter;
performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases.
Breck teaches: wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service (page 1, right column, Example 1.1; page 2, left column, first paragraph: a pipeline that trains on new training data arriving in batches from serving data; the training data is interpreted as the current datasets used to perform analytics on the production version of the web service); 
retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter (page 1, right column, Example 1.1; page 2, left column, first paragraph and right column, 3rd paragraph: a pipeline that trains on new training data arriving in batches from serving data; the system is used to validate trillions of training and serving examples per day, amounting to several petabytes of data per day);
performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset (Breck, page 2, right column; page 4, right column; page 5: the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data); and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases (page 5, right column: the data validator recommends updates to the schema as new data is ingested and analyzed; the system includes an interface and tools that aid users by directing their attention to important suggestions and providing a click button to apply the suggested changes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter; performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases into the data syncing in a distributed system of Cantwell.
The motivation to do so would be to include wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service; retrieving, by the data monitoring system, an experimental dataset from an experimental database at the non-production datacenter; performing, by the data monitoring system, validation operations on the experimental dataset, wherein the validation operations include: retrieving the encode values corresponding to the one or more current datasets; and using the encode values to validate one or more characteristics of the experimental dataset; and in response to a determination of success of the validation operations, generating, by the data monitoring system, a validation output indicating that the experimental dataset should be published to the first and second live databases, in order to address issues that may result in a small drop in model quality, one that can be easily missed if the data errors affect the model only on specific slices of the data but the aggregate model metrics still look okay (Breck, page 2, right column, 2nd paragraph, lines 8-11).
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Cantwell et al. (U.S. Pub. No. 2015/0244795 A1) and Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019. Copyright 2019 by the author(s)), further in view of Simca et al. (U.S. Patent No. 10,642,715 B1).
Regarding claim 7, Cantwell as modified by Breck teaches all claimed limitations as set forth in the rejection of claim 1, but does not explicitly disclose: wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the experimental dataset. 
Simca teaches: wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the experimental dataset (Simca teaches that the testing data environment may maintain actual historical data, artificial testing data, or simulated testing data, which may be useful in learning about the processes and their attributes, col. 5, lines 56-64; obtaining process information from the testing data environment, col. 7, lines 41-42; such activity as modifying a file, downloading a file, etc., col. 17, lines 6-20; the testing data environment may include historical data associated with processes running in the live data environment, statistics parameters, and dynamic parameters; dynamic parameters may include creating, modifying, or deleting data or files, col. 6, lines 15-19, 31-32 and 60; each process may be stored with a reference or identifier (e.g., file name, keyword, numerical identifier, hash, category, pointer, etc.), col. 5, lines 59-63; looking for patterns of known, valid activity (or known, invalid activity) and defining steps in each process as part of each context profile, col. 8, lines 18-25).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the experimental dataset into the data syncing in a distributed system of Cantwell.
The motivation to do so would be to include wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the experimental dataset, in order to provide adaptive, customized, and flexible security in networks with dynamically changing software applications (Simca, col. 1, lines 63-64).
Regarding claim 8, Cantwell as modified by Breck and Simca teaches all claimed limitations as set forth in the rejection of claim 7, and further teaches wherein the updated dataset is an updated version of a first dataset, and wherein the plurality of datasets includes a historical version of the first dataset (Simca teaches obtaining process information from the testing data environment, col. 7, lines 41-42; such activity as modifying a file, downloading a file, etc., col. 17, lines 6-20; the testing data environment may include historical data associated with processes running in the live data environment, statistics parameters, and dynamic parameters; dynamic parameters may include creating, modifying, or deleting data or files, col. 6, lines 15-19, 31-32 and 60; each process may be stored with a reference or identifier (e.g., file name, keyword, numerical identifier, hash, category, pointer, etc.), col. 5, lines 59-63); and wherein the performing the encoding operations includes encoding the historical version of the first dataset to generate update pattern encode values associated with the first dataset (Cantwell, paragraph [0031]: one or more backup servers are coupled to storage, which stores backups of volume data for clients; the current metadata is retrieved from metadata server 110 for a client volume; the block identifiers are hashed based on the content of a corresponding data block; the current metadata from the metadata server is compared with a version of stored metadata on the backup server; hashing based on content is interpreted as an encoding operation, and the version on the backup server is interpreted as the historical version of the first dataset). 
Regarding claim 9, Cantwell as modified by Breck and Simca teaches all claimed limitations as set forth in the rejection of claim 8, and further teaches wherein the validating the update pattern includes comparing the one or more data records in the experimental dataset to the update pattern encode values associated with the first dataset (Breck, page 2, right column; page 4, right column; page 5: the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema). 
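For illustration only, the content-based hashing and metadata comparison attributed to Cantwell above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Cantwell; the names block_id and changed_blocks are hypothetical, and SHA-256 is an assumed hash function that the reference is not asserted to use.

```python
import hashlib


def block_id(block: bytes) -> str:
    """Content-based block identifier: identical content yields an identical identifier.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(block).hexdigest()


def changed_blocks(current_ids, stored_ids):
    """Return the positions at which the current metadata differs from the stored version.

    Both arguments are lists of block identifiers, one per block of the volume.
    """
    changed = [
        i for i, (cur, old) in enumerate(zip(current_ids, stored_ids)) if cur != old
    ]
    # Positions present in only one of the two lists also count as changed.
    shorter = min(len(current_ids), len(stored_ids))
    longer = max(len(current_ids), len(stored_ids))
    changed.extend(range(shorter, longer))
    return changed
```

Under these assumptions, the stored list of identifiers stands in for the historical version, and only the positions returned by changed_blocks would need to be examined further.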
Claims 2-3, 5-6 and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Cantwell et al. (U.S. Pub. No. 2015/0244795 A1) and Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019. Copyright 2019 by the author(s)), further in view of Naphade et al. (U.S. Pub. No. 2020/0410322 A1).
Regarding claim 2, Cantwell as modified by Breck teaches all claimed limitations as set forth in the rejection of claim 1, but does not explicitly disclose: wherein the encoding operations include: training an autoencoder machine learning model based on one or more current datasets to generate a trained autoencoder.
Naphade teaches: wherein the encoding operations include: training an autoencoder machine learning model based on one or more current datasets to generate a trained autoencoder (learning and applying data encoding in an unsupervised manner using input data; generating data points derived from a mixture of Gaussian distributions; the probabilistic model provides a notification or indication of an anomaly in the input data because, from being trained, the probabilistic model has learned which events are considered anomalous (e.g., different from normal data), paragraphs [0022]-[0024], [0030]; the normal data on which the model was previously trained is interpreted as the plurality of datasets).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the encoding operations include: training an autoencoder machine learning model based on one or more current datasets to generate a trained autoencoder into the data syncing in a distributed system of Cantwell.
The motivation to do so would be to include wherein the encoding operations include: training an autoencoder machine learning model based on one or more current datasets to generate a trained autoencoder, in order to provide the ability to learn normal event behavior all in one network (Naphade, paragraph [0013], line 6).
Regarding claim 3, Cantwell as modified by Breck and Naphade teaches all claimed limitations as set forth in the rejection of claim 2, and further teaches wherein the validation operations further include: applying the trained autoencoder to the experimental dataset to detect one or more anomalous data records in the experimental dataset (Naphade teaches learning and applying data encoding in an unsupervised manner using input data; generating data points derived from a mixture of Gaussian distributions; the probabilistic model provides a notification or indication of an anomaly in the input data because, from being trained, the probabilistic model has learned which events are considered anomalous (e.g., different from normal data), paragraphs [0022]-[0024], [0030]; the normal data is data on which the model was previously trained; in conjunction with the training data taught by Breck, this teaches wherein the validation operations further include: applying the trained autoencoder to the experimental dataset to detect one or more anomalous data records in the experimental dataset as claimed). 
Regarding claim 5, Cantwell as modified by Breck teaches all claimed limitations as set forth in the rejection of claim 4, but does not explicitly disclose: wherein the performing the encoding operations includes training an autoencoder machine learning model using one or more current datasets.
Naphade teaches: wherein the performing the encoding operations includes training an autoencoder machine learning model using one or more current datasets (learning and applying data encoding in an unsupervised manner using input data; generating data points derived from a mixture of Gaussian distributions; the probabilistic model provides a notification or indication of an anomaly in the input data because, from being trained, the probabilistic model has learned which events are considered anomalous (e.g., different from normal data), paragraphs [0022]-[0024], [0030]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the performing the encoding operations includes training an autoencoder machine learning model using one or more current datasets into the data syncing in a distributed system of Cantwell.
The motivation to do so would be to include wherein the performing the encoding operations includes training an autoencoder machine learning model using one or more current datasets, in order to provide the ability to learn normal event behavior all in one network (Naphade, paragraph [0013], line 6).
Cantwell as modified by Breck and Naphade further teaches: wherein the encode values include a schema encode value that indicates one or more baseline attributes that correspond to schemas of the plurality of datasets (Breck, page 1, right column, Example 1.1; page 2, left column, first paragraph; page 2, right column; page 4, right column; page 5: a pipeline that trains on new training data arriving in batches from serving data; the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data).
Regarding claim 6, Cantwell as modified by Breck and Naphade teaches all claimed limitations as set forth in the rejection of claim 5, and further teaches wherein the validating the schema associated with the updated dataset includes: identifying one or more attributes associated with the schema of the updated dataset; and comparing the one or more attributes associated with the schema of the updated dataset to the one or more baseline attributes associated with the schemas of the plurality of datasets (Breck, page 1, right column, Example 1.1; page 2, left column, first paragraph; page 2, right column; page 4, right column; page 5: a pipeline that trains on new training data arriving in batches from serving data; the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; each batch of data is validated by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data). 
Regarding claim 11, Cantwell as modified by Breck teach all claimed limitations as set forth in rejection of claim 10, but do not explicitly disclose: wherein the performing the validation operations includes validating a value distribution associated with the experimental dataset. 
Naphade teaches: wherein the performing the validation operations includes validating a value distribution associated with the experimental dataset (learning and applying data encoding in unsupervised manner using  input data; generating data points deriving from a mixture of Gaussian distributions, probabilistic model provides a notification or indication of an anomaly in input data because from being trained, probabilistic model has learned which events are consider anomalous (e.g., different from normal data), paragraph [0022]-[0024], [0030]).
It would have been obvious to one of ordinary skill in art before the effective filing date of the claim invention to include wherein the performing the validation operations includes validating a value distribution associated with the experimental dataset into syncing in a distributed system of Cantwell.
Motivation to do so would be to include wherein the performing the validation operations includes validating a value distribution associated with the experimental dataset to provide ability to learn normal event behavior all in one network (Naphade, paragraph [0013], line 6).
Regarding claim 12, Cantwell as modified by Breck and Naphade teaches all claimed limitations as set forth in the rejection of claim 11, and further teaches wherein the performing the encoding operations includes: training an autoencoder machine learning model based on the one or more current datasets to generate a trained autoencoder model; and calculating a first latent probability distribution corresponding to the one or more current datasets using the trained autoencoder model (Naphade teaches learning and applying data encoding in an unsupervised manner using input data, paragraphs [0018]-[0021]; generating data points derived from a mixture of Gaussian distributions; the probabilistic model provides a notification or indication of an anomaly in input data because, having been trained, the probabilistic model has learned which events are considered anomalous (e.g., different from normal data), paragraphs [0022]-[0024], [0030]).
Regarding claim 13, Cantwell as modified by Breck and Naphade teaches all claimed limitations as set forth in the rejection of claim 12, and further teaches wherein the autoencoder machine learning model is a Deep Autoencoding Gaussian Mixture Model (DAGMM) (Naphade, paragraphs [0013], [0027]-[0030]; a deep autoencoder with latent space modeling of Gaussian mixtures is interpreted as a Deep Autoencoding Gaussian Mixture Model (DAGMM)).
Regarding claim 14, Cantwell as modified by Breck and Naphade teaches all claimed limitations as set forth in the rejection of claim 12, and further teaches wherein the validating the value distribution associated with the updated dataset includes validating numerical data in the updated dataset, including by: applying the trained autoencoder model to the experimental dataset to calculate a second latent probability distribution corresponding to the experimental dataset; and comparing the first and second latent probability distributions (Naphade teaches learning and applying data encoding in an unsupervised manner using input data, paragraphs [0018]-[0021]; generating data points derived from a mixture of Gaussian distributions; the probabilistic model provides a notification or indication of an anomaly in input data because, having been trained, the probabilistic model has learned which events are considered anomalous (e.g., different from normal data), paragraphs [0022]-[0024], [0030]).
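For illustration only, the comparison of a first and a second latent probability distribution recited in claim 14 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Naphade or from the application: the function `encode` is a hypothetical stand-in for a trained autoencoder's encoder, each latent distribution is approximated by a single Gaussian rather than a Gaussian mixture, and the Kullback-Leibler divergence and the `threshold` value are assumed choices for the comparison.

```python
import math
from statistics import fmean, pstdev


def encode(record):
    # Hypothetical stand-in for a trained encoder: maps a numeric
    # record to one latent value (here, simply its mean).
    return fmean(record)


def fit_gaussian(latents):
    # Approximate a latent distribution by a single Gaussian (mean, std).
    # Assumption: a real DAGMM would fit a Gaussian mixture instead.
    return fmean(latents), max(pstdev(latents), 1e-9)


def kl_gaussian(p, q):
    # Closed-form KL divergence KL(p || q) between two 1-D Gaussians.
    mu_p, s_p = p
    mu_q, s_q = q
    return math.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q) ** 2) / (2 * s_q**2) - 0.5


def distribution_ok(current, experimental, threshold=0.5):
    # First latent distribution: current datasets.
    first = fit_gaussian([encode(r) for r in current])
    # Second latent distribution: experimental dataset.
    second = fit_gaussian([encode(r) for r in experimental])
    # The experimental data passes if it stays close to the current data.
    return kl_gaussian(second, first) <= threshold


current = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 2, 2]]
shifted = [[101, 102, 103], [102, 103, 104], [103, 104, 105], [102, 102, 102]]
print(distribution_ok(current, current))
print(distribution_ok(current, shifted))
```

In this sketch, an experimental dataset whose latent values are shifted far from those of the current datasets yields a large divergence and fails the check.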
Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Cantwell et al. (U.S. Pub. No. 2015/0244795 A1) in view of Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019; Copyright 2019 by the author(s)), further in view of Bhatnagar et al. (U.S. Pub. No. 2020/0104587 A1).
Regarding claim 15, Cantwell as modified by Breck teaches all claimed limitations as set forth in the rejection of claim 10, but does not explicitly disclose wherein the performing the validation operations includes validating a value format of string-type data included in the experimental dataset.
Bhatnagar teaches: wherein the performing the validation operations includes validating a value format of string-type data included in the experimental dataset (receiving structured data from daily and historic invoices; extracting key data fields from the invoices using the regular expression; validating the extracted string, paragraphs [0022], [0025]-[0027], [0032]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the performing the validation operations includes validating a value format of string-type data included in the experimental dataset in the syncing in a distributed system of Cantwell.
The motivation to do so would be to overcome the issue that manually examining such variations and volumes of invoices for correctness, genuineness, and duplicates against historic invoices is usually highly subjective and increases the average cost and time to process an invoice (Bhatnagar, paragraph [0003], lines 9-13).
Regarding claim 16, Cantwell as modified by Breck and Bhatnagar teaches all claimed limitations as set forth in the rejection of claim 15, and further teaches wherein the performing the encoding operations includes: generating one or more regular expressions based on string-type data included in at least one of the one or more current datasets; and wherein the validating the value format of string-type data included in the experimental dataset includes parsing data in the experimental dataset using the one or more regular expressions (Bhatnagar teaches receiving structured data from daily and historic invoices; extracting key data fields from the invoices using the regular expression; validating the extracted string; comparing corresponding fields across two invoices for similarity measures, paragraphs [0022], [0025]-[0027], [0032]-[0034]).
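For illustration only, the generation of regular expressions from string-type data and their use in parsing an experimental dataset, as recited in claim 16, can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Bhatnagar's method: the helper names `infer_pattern`, `build_patterns`, and `validate_format`, the character-class heuristic, and the sample invoice numbers are all hypothetical.

```python
import re
from itertools import groupby


def char_class(ch):
    # Map one character to a regular-expression fragment.
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and other characters match literally


def infer_pattern(sample):
    # Collapse runs of the same character class into a counted fragment.
    parts = []
    for cls, run in groupby(char_class(c) for c in sample):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def build_patterns(current_values):
    # Encoding step: one compiled pattern per distinct inferred format.
    return [re.compile(p) for p in sorted({infer_pattern(v) for v in current_values})]


def validate_format(value, patterns):
    # Validation step: the value must fully match at least one known format.
    return any(p.fullmatch(value) for p in patterns)


# Hypothetical invoice numbers standing in for current string-type data.
patterns = build_patterns(["INV-2021-0042", "INV-2020-0007"])
print(validate_format("INV-2022-1234", patterns))
print(validate_format("2022-INV-12", patterns))
```

The heuristic is deliberately strict about run lengths; a production system would likely generalize counts into ranges.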
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019; Copyright 2019 by the author(s)) in view of Cantwell et al. (U.S. Pub. No. 2015/0244795 A1).
Regarding claim 17, Breck teaches a method, comprising: 
performing validation operations on an experimental dataset from an experimental database at a non-production datacenter (page 1, right column, Example 1.1; page 2, left column, first paragraph, and right column, 3rd paragraph; page 4, right column; page 5: a pipeline that trains on new training data arriving in batches from serving data; using the system to validate trillions of training and serving examples per day, amounting to several petabytes of data per day; the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; performing validation of each batch of data by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data), wherein the validation operations include:
retrieving encode values corresponding to one or more current datasets, wherein the second live database uses the one or more current datasets to perform analytics on the production version of the web service (page 1, right column, Example 1.1; page 2, left column, first paragraph; page 2, right column; page 4, right column; page 5: a pipeline that trains on new training data arriving in batches from serving data; it is noted that training data is interpreted as current datasets used to perform analytics on the production version of the web service; the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; performing validation of each batch of data by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data); and using the encode values to validate one or more characteristics of the experimental dataset (page 2, right column; page 4, right column; page 5: the schema follows a logical data model where each training or serving example is a collection of features, with each feature having several constraints attached to it; the schema encodes data properties that are unique to ML; performing validation of each batch of data by comparing it against the schema, including suggested updates to the schema to eliminate anomalies that correspond to the natural evolution of the data).
Breck does not explicitly disclose: wherein the one or more current datasets are used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use.
Cantwell teaches: wherein the one or more current datasets are used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use (paragraphs [0018], [0024], [0029], [0030]-[0031]: block servers are coupled to storage, which stores volume data of the client; the block server and slice server maintain a mapping between a block identifier and the location of the data block in a storage medium of the block server; metadata maps between the client addressing used by the client and the block identifier; the retrieved metadata includes a list of block identifiers associated with data blocks of the volume).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the one or more current datasets are used by a first live database at a production datacenter, wherein the first live database uses the one or more current datasets to support a production version of a web service for client use in the training environment of Breck.
The motivation to do so would be to obtain a system that is well suited to scale efficiently for long-term archiving (Cantwell, paragraph [0003], lines 14-15).
Breck as modified by Cantwell further teaches: in response to a determination that the experimental dataset passes the validation operations, storing the experimental dataset in the first and second live databases (Breck, page 5, right column: the data validator will recommend updates to the schema as new data is ingested and analyzed; the system includes an interface and tools that aid users by directing their attention to important suggestions and providing a click button to apply the suggested changes; in conjunction with the teaching of Cantwell that, upon analyzing the metadata to determine the changes in block identifiers, the backup may store the received data block, paragraph [0045], it teaches in response to a determination that the experimental dataset passes the validation operations, storing the experimental dataset in the first and second live databases as claimed).
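For illustration only, the sequence discussed above for claim 17 (comparing a batch against a schema and storing the batch only when it passes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Breck or Cantwell: the schema here records only a type and an observed value range per feature, the two live databases are modeled as plain lists, and the names `infer_schema`, `find_anomalies`, and `store_if_valid` are hypothetical.

```python
def infer_schema(current_rows):
    # Encode each feature of the current data as (type, min, max).
    # Assumption: every feature holds values of one comparable type.
    schema = {}
    for row in current_rows:
        for name, value in row.items():
            typ, lo, hi = schema.get(name, (type(value), value, value))
            schema[name] = (typ, min(lo, value), max(hi, value))
    return schema


def find_anomalies(rows, schema):
    # Compare a batch against the schema and list every violation.
    anomalies = []
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            anomalies.append((i, "feature set mismatch"))
            continue
        for name, value in row.items():
            typ, lo, hi = schema[name]
            if not isinstance(value, typ):
                anomalies.append((i, f"{name}: unexpected type"))
            elif not lo <= value <= hi:
                anomalies.append((i, f"{name}: out of range"))
    return anomalies


def store_if_valid(rows, schema, live_databases):
    # Store the batch in every live database only if no anomaly is found.
    anomalies = find_anomalies(rows, schema)
    if not anomalies:
        for db in live_databases:
            db.extend(rows)
    return anomalies


schema = infer_schema([{"age": 30, "score": 0.5}, {"age": 45, "score": 0.9}])
first_live_db, second_live_db = [], []
print(store_if_valid([{"age": 40, "score": 0.7}], schema, [first_live_db, second_live_db]))
print(store_if_valid([{"age": 400, "score": 0.7}], schema, [first_live_db, second_live_db]))
```

In this sketch the second batch is rejected, so neither list changes after the first batch is stored.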
Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019; Copyright 2019 by the author(s)) in view of Cantwell et al. (U.S. Pub. No. 2015/0244795 A1), further in view of Michelson et al. (U.S. Pub. No. 2020/0402672 A1).
Regarding claim 18, Breck as modified by Cantwell teaches all claimed limitations as set forth in the rejection of claim 17, but does not explicitly disclose: wherein the performing the validation operations includes validating semantic values associated with one or more data records in the experimental dataset.
Michelson teaches: wherein the performing the validation operations includes validating semantic values associated with one or more data records in the experimental dataset (Fig. 3, paragraphs [0054], [0058]: determining the feature value similarity between attribute values of the first set of attributes 314 and the second set of attributes 318; in conjunction with the comparison of batch data against the schema as taught by Breck, it teaches wherein the performing the validation operations includes validating semantic values associated with one or more data records in the updated dataset as claimed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the performing the validation operations includes validating semantic values associated with one or more data records in the experimental dataset in the training environment of Breck.
Motivation to do so would be to include wherein the performing the validation operations includes validating semantic values associated with one or more data records in the experimental dataset to utilize feature values of grouping features of medical results (Michelson, paragraph [0005], line 2-3).
Regarding claim 19, Breck as modified by Cantwell and Michelson teaches all claimed limitations as set forth in the rejection of claim 18, and further teaches performing encoding operations using a natural language processing (NLP) model to calculate first vector word-embedding representations of data in the one or more current datasets; and wherein the validating the semantic values includes: using the NLP model to calculate second vector word-embedding representations of data in the experimental dataset; and comparing the first vector word-embedding and second vector word-embedding representations (Michelson, Fig. 3, paragraphs [0035], [0043], [0054], [0058]: using natural language to determine the feature value of a group of values through a word-embedding model; determining the feature value similarity between attribute values of the first set of attributes 314 and the second set of attributes 318; in conjunction with the comparison of batch data against the schema as taught by Breck, it teaches performing encoding operations using a natural language processing (NLP) model to calculate first vector word-embedding representations of data in the plurality of datasets; and wherein the validating the semantic values includes: using the NLP model to calculate second vector word-embedding representations of data in the updated dataset; and comparing the first vector word-embedding and second vector word-embedding representations as claimed).
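For illustration only, the comparison of first and second vector word-embedding representations recited in claim 19 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Michelson's model: the table `TOY_EMBEDDINGS` holds hypothetical three-dimensional vectors that do not come from any real NLP model, text is embedded by averaging word vectors, and cosine similarity with an assumed `threshold` serves as the comparison.

```python
import math

# Hypothetical 3-d word vectors; a real system would use a trained model.
TOY_EMBEDDINGS = {
    "refund": (0.9, 0.1, 0.0),
    "payment": (0.8, 0.2, 0.1),
    "invoice": (0.7, 0.3, 0.0),
    "kitten": (0.0, 0.1, 0.9),
    "puppy": (0.1, 0.0, 0.8),
}


def embed(text):
    # Average the vectors of all known words; unknown words are skipped.
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a, b):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def semantically_consistent(current_texts, experimental_texts, threshold=0.8):
    first = embed(" ".join(current_texts))        # first representation
    second = embed(" ".join(experimental_texts))  # second representation
    return cosine(first, second) >= threshold


print(semantically_consistent(["refund payment"], ["invoice payment"]))
print(semantically_consistent(["refund payment"], ["kitten puppy"]))
```

Averaging word vectors is only one of several possible ways to summarize a record; it is used here to keep the sketch short.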
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Breck et al. (“Data Validation for Machine Learning”; Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019; Copyright 2019 by the author(s)) in view of Cantwell et al. (U.S. Pub. No. 2015/0244795 A1), further in view of Simca et al. (U.S. Patent No. 10,642,715 B1).
Regarding claim 20, Breck as modified by Cantwell teaches all claimed limitations as set forth in the rejection of claim 17, but does not explicitly disclose: wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the updated dataset.
Simca teaches: wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the updated dataset (Simca teaches that a testing data environment may maintain actual historical data, artificial testing data, or simulated testing data, which may be useful in learning about the processes and their attributes, col. 5, lines 56-64; obtaining process information from the testing data environment, col. 7, lines 41-42; such activity as modifying a file, downloading a file, etc., col. 17, lines 6-20; the testing data environment may include historical data associated with processes running in the live data environment, statistics parameters, and dynamic parameters; dynamic parameters may include creating, modifying, or deleting data or files, col. 6, lines 15-19, 31-32 and 60; each process may be stored with a reference or identifier (e.g., file name, keyword, numerical identifier, hash, category, pointer, etc.), col. 5, lines 59-63; looking for patterns of known, valid activity (or known, invalid activity) and defining steps in each process as part of each context profile, col. 8, lines 18-25).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the performing the validation operations includes validating an update pattern associated with one or more data records in the updated dataset in the syncing in a distributed system of Cantwell.
The motivation to do so would be to provide adaptive, customized, and flexible security in networks with dynamically changing software applications (Simca, col. 1, lines 63-64).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEN HOANG whose telephone number is (571)272-8401. The examiner can normally be reached M-F 7:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya can be reached on (571) 272-4034. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/KEN HOANG/Examiner, Art Unit 2168      

/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168