Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the remarks filed on 10/26/2021, the Applicant canceled non-elected claims 8-20. Therefore, claims 1-7 are pending.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 and 6 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Szczepanik et al. (U.S. Pub. 2020/0174966 A1).
With respect to claim 1, Szczepanik et al. discloses a method to facilitate data monitoring in a computing system, the method comprising:
ingesting unprocessed data from one or more data input streams (i.e., “the files ingested into the data lake system 101 and being stored by the raw data storage 117, may be managed using a flat file architecture to track and maintain each of the files without having to apply a structure or schema to the raw data storage 117 at the time of file ingestion” (0064) and “The data of each file may be transmitted to the data lake systems 101 in discrete data packets or by streaming the file data over network 150 and storing the streaming files to the raw data storage 117” (0065)); 
generating metadata using the unprocessed data ingested from the one or more data input streams (i.e., “More specifically, embodiments of the data lake system 101 operating within a computing environment 100, 180, 190, 200, 280, 350, may perform the functions associated with ingesting at least one streaming file from one or more data streams 203, 205, 207 into the data lake, storing the streaming files in an unprocessed, native format, scanning metadata associated with the streaming files (either embedded within the file or as a separate metadata file), analyzing the metadata, categorizing the content of the streaming files using the metadata and one or more machine learning techniques, and generating a list of the files entering or being stored by the data lake system 101” (0061) and “each file being stored by the raw data storage 117 can be assigned a unique identifier. In some instances, each file entering the data lake system 101 may also be tagged with a set of metadata tags further describing the type of data being stored to the raw data storage 117 as well as the content of the file being ingested” (0064));
computing, by utilizing the metadata, one or more expected data outputs from the unprocessed data (i.e., “Embodiments of the reasoning engine may rank the records of the past data lakes based upon how closely the categorization of data matches with the current data lake, the expected performance of the database engine 119 and/or the frequency of using the categorized data identified in step 509” (0112), or step 507, which generates a file list identifying each file);
ingesting processed data from one or more data output streams, wherein the processed data includes one or more actual data outputs (i.e., “The algorithm may learn by comparing the actual output with the correct outputs in order to find errors. The machine learning module 114 may modify the model of data according to the correct outputs to refine the decision making of the machine learning module 114, improving the accuracy of the automated decision making of the machine learning module 114 to provide the correct inputs”(0086)); 
determining that the one or more actual data outputs does not align with the one or more expected data outputs (i.e., “The algorithm may learn by comparing the actual output with the correct outputs in order to find errors. The machine learning module 114 may modify the model of data according to the correct outputs to refine the decision making of the machine learning module 114, improving the accuracy of the automated decision making of the machine learning module 114 to provide the correct inputs” (0086), or Fig. 5C, which shows step 559 (existing data lake identifier), where a “no” result means the outputs do not align, as in the claimed invention);
generating an alert signifying that the one or more expected data outputs does not align with the one or more actual data outputs (i.e., “The algorithm may learn by comparing the actual output with the correct outputs in order to find errors. The machine learning module 114 may modify the model of data according to the correct outputs to refine the decision making of the machine learning module 114, improving the accuracy of the automated decision making of the machine learning module 114 to provide the correct inputs” (0086)); and
sending the alert to a client (i.e., “the reporting engine 125 of the data lake system 101 may send a report, notification or error alerting a user or administrator of the data lake system 101 that an operational database 123 or database engine 119 could not be found that matches the management requirements of the file types or data categories being received or stored” (0093)).  
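For illustration only, the sequence of steps recited in claim 1 can be sketched as follows. This is a minimal sketch and is not drawn from Szczepanik et al. or from the application; the function names, the "amount" field, and the assumption that processing preserves the record count and the amount total are all hypothetical.

```python
def generate_metadata(raw_records):
    """Summarize the unprocessed records (hypothetical metadata fields)."""
    return {
        "record_count": len(raw_records),
        "amount_total": sum(record["amount"] for record in raw_records),
    }


def compute_expected_outputs(metadata):
    """Derive expected outputs from the metadata.

    Assumption: the downstream processing preserves both the number of
    records and the total of the 'amount' field.
    """
    return dict(metadata)


def summarize_actual_outputs(processed_records):
    """Summarize the processed records in the same shape as the expectation."""
    return {
        "record_count": len(processed_records),
        "amount_total": sum(record["amount"] for record in processed_records),
    }


def find_misalignments(expected, actual):
    """Return {key: (expected, actual)} for every value that differs."""
    return {
        key: (expected[key], actual.get(key))
        for key in expected
        if expected[key] != actual.get(key)
    }


def monitor(raw_records, processed_records, send_alert):
    """Run the claimed steps in order; send and return an alert on mismatch."""
    metadata = generate_metadata(raw_records)
    expected = compute_expected_outputs(metadata)
    actual = summarize_actual_outputs(processed_records)
    mismatches = find_misalignments(expected, actual)
    if not mismatches:
        return None
    alert = {"type": "misalignment", "details": mismatches}
    send_alert(alert)  # the "client" here is any callable that accepts the alert
    return alert
```

In this sketch, calling monitor(raw, processed, sent.append) with a list named sent as the stand-in client collects any alert that is generated; when the expected and actual summaries agree, no alert is sent.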
With respect to claim 2, Szczepanik et al. discloses wherein generating the alert further comprises generating a visual error report, wherein the visual error report highlights which of the one or more actual data outputs does not align with the one or more expected data outputs (i.e., “the reporting engine 125 of the data lake system 101 may send a report, notification or error alerting a user or administrator of the data lake system 101 that an operational database 123 or database engine 119 could not be found that matches the management requirements of the file types or data categories being received or stored. In some embodiments, the data lake system 101 may request human input from the user of administrator to resolve the error in identifying a suitable operational database 123 or database engine 119” (0093)).  
With respect to claim 3, Szczepanik et al. discloses wherein generating the metadata further comprises: determining a value distribution of the unprocessed data (i.e., “More specifically, embodiments of the data lake system 101 operating within a computing environment 100, 180, 190, 200, 280, 350, may perform the functions associated with ingesting at least one streaming file from one or more data streams 203, 205, 207 into the data lake, storing the streaming files in an unprocessed, native format, scanning metadata associated with the streaming files (either embedded within the file or as a separate metadata file), analyzing the metadata, categorizing the content of the streaming files using the metadata and one or more machine learning techniques, and generating a list of the files entering or being stored by the data lake system 101” (0061)); checking data types of the unprocessed data (i.e., “each file entering the data lake system 101 may also be tagged with a set of metadata tags further describing the type of data being stored to the raw data storage 117 as well as the content of the file being ingested” (0064)); and identifying a data schema for the unprocessed data (i.e., “While some of the unstructured data may have some internal structure, files may still be considered unstructured because the data contained by the file may not fit neatly into a database ... "semi-structured data" on the other hand may be a type of data that contains semantic tags but may not conform to the structure associated with standard databases and may have a lack of a rigid schema found in structured data” (0067) or “files are accessed within the historical data lake, and one or more associated operational databases 123 comprising database engines 119 capable of applying a particular schema to the files of the raw data storage 117.” (0071)).
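For illustration only, the three metadata operations recited in claim 3 (value distribution, data type check, schema identification) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Szczepanik et al. or from the application: records are assumed to be flat dictionaries with hashable values, and the name profile_records and the "mixed" schema label are hypothetical.

```python
from collections import Counter


def profile_records(records):
    """Build simple metadata for flat dict records (hypothetical layout)."""
    value_distribution = {}  # field -> Counter of observed values
    data_types = {}          # field -> set of observed Python type names

    for record in records:
        for field, value in record.items():
            value_distribution.setdefault(field, Counter())[value] += 1
            data_types.setdefault(field, set()).add(type(value).__name__)

    # Schema: one type name per field, or "mixed" when the types disagree.
    schema = {
        field: next(iter(names)) if len(names) == 1 else "mixed"
        for field, names in data_types.items()
    }
    return {
        "value_distribution": value_distribution,
        "data_types": data_types,
        "schema": schema,
    }
```

A field whose schema entry comes back as "mixed" is one where the data type check found more than one type in the unprocessed data.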
With respect to claim 6, Szczepanik et al. discloses further comprising generating a metadata confidence level wherein the metadata confidence level indicates at least an accuracy of the metadata (i.e., “The statistical analysis of the different types of data or data categorizations may be compared to the types of data being stored with historically provisioned data lakes, in order to calculate at a level of confidence (confidence interval) that one of the historically provisioned data lakes, more likely than not, has been provisioned with one or more operational databases 123 that successfully managed or organized the same categories of data stored and/or streamed to the newly registered data lake. For example, the most closely matched historical data lake may be considered the closest match within a 99% confidence interval (CI), a 95% CI, 90% CI, 85% CI, 75% CI, etc.” (0113)).  
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Szczepanik et al. (U.S. Pub. 2020/0174966 A1) in view of Goldentouch (U.S. Pub. 2011/0082848 A1).
With respect to claim 4, Szczepanik et al. discloses all limitations recited in claim 3 except for tracking format changes to the unprocessed data and notifying the client of the format changes to the unprocessed data. However, Goldentouch discloses tracking format changes to the unprocessed data and notifying the client of the format changes to the unprocessed data (i.e., “Monitor and send alerts. The user may activate alert mechanism, which sends alerts upon some preset events, including changes of selected objects, new search results, analysis committed by other group member or other suitable events” (0250) and “Get search results, including monitoring the search provider, verifying that a set of results is returned, analyzing the format of the set of results, getting the raw results or performing similar operations” (0263)). It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to include Goldentouch’s feature in the method of Szczepanik et al. in order to obtain accurate expected results.
With respect to claim 5, Goldentouch discloses the method of claim 4, further comprising detecting, in real time, changes to object records of the unprocessed data and notifying the client of changes to the object records of the unprocessed data (i.e., “Monitor and send alerts. The user may activate alert mechanism, which sends alerts upon some preset events, including changes of selected objects, new search results, analysis committed by other group member or other suitable events” (0250)).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Szczepanik et al. (U.S. Pub. 2020/0174966 A1) in view of Colley et al. (U.S. Pub. 2021/00906994 A1).
With respect to claim 7, Szczepanik et al. discloses all limitations recited in claim 1 except for wherein the data output stream originates from an extract/transform/load (ETL) orchestrated environment. However, Colley et al. discloses wherein the data output stream originates from an extract/transform/load (ETL) orchestrated environment (i.e., “respectively, an event reporting bus 316, system micro-services 186, various data lake APIs 332, 334 and 336, an ETL module 338, data lake query and analytics modules 346 and 348, respectively, an ETL platform 360 as well as data marts database 190” (0994); see also (0998)). It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to include Colley et al.’s feature in the method of Szczepanik et al. in order to have an optimized model format for replicating the output data to a different system. Doing so for the stated purpose has been well known in the art, as evidenced by the teaching of Colley et al. (0998).
Close reference:
U.S. Pub. 2019/0028557 teaches monitoring ingested data and generating metadata (006-009).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG T VY whose telephone number is (571)272-1954. The examiner can normally be reached M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached at (571)272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUNG T VY/
Primary Examiner, Art Unit 2163
November 6, 2021