DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In response to Applicant’s claims filed on May 03, 2021, claims 1-19 are now pending for examination in the application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9-16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Mohan et al. (US Pub. No. 20160004742) in view of Cannaliato et al. (US Pub. No. 20170068715).

With respect to claim 1, Mohan et al. teaches a computer implemented method for providing fast data comparison for software testing, the computer implemented method comprising:

selecting a source database and a target database from one or more databases (“first data source, and compared with the next selected data source” See Paragraph 39);



assigning a unique key to each of the plurality of data-strings of each dataset (“the extracted value from the first source file can be hashed by the data harmonization system, and used as the key in a key value pair containing the key and the original row of data in the first source file as a string list” See Paragraph 39).  Mohan et al. does not disclose wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database.
However, Cannaliato et al. teaches extracting a source dataset and a target dataset respectively from the selected source database and the target database, each dataset comprising a plurality of data-strings, wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database (“ETL ( extract, transform, and load),” See Paragraph 17 and “extract fixed fields from byte strings in binary files,” See Paragraph 34 and “source specifications for the parser are required and include: Field name; Field data type ( String, Number, Date/Time, Object, Array),” See Paragraph 42);
generating a sequenced-file cache using the corresponding unique keys assigned to each of the plurality of data-strings, wherein the files are sequenced within the cache based on an available memory size (Values in the surrogate key dimensional column are unique sequential numbers used to insert a record into the platform and need to be defined if the enrichment cache is set to Add Record to DB, See Paragraph 53);


reducing incrementally, size of the extracted source datasets and target datasets, to perform optimized data-comparison by eliminating any repetition in data-read and data comparison cycles (“If a condition occurs such that the allocated ingest nodes cannot work the messages off the queue at a sufficient rate, the queue size increases. When this happens, the platform automatically starts additional ingest nodes to provide additional capability to keep up with the pace of the incoming data. Eventually, the ingest processors stabilize relative to the throughput of incoming data, and the messages in the JMS queue are decrease as the extra ingest nodes work them off. In a similar fashion, if the allocated ingest nodes sufficiently process the incoming messages traffic, the system will deallocate ingest nodes, reducing down to a sufficient quantity,” See Paragraph 20).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mohan et al. (multiple databases) with Cannaliato et al. (Correlating Cloud-Based Big Data in Real-Time).  The modification would have facilitated software testing by making it possible to manage large datasets using cyclic data traversal and by accelerating the ability to find the desired data.  See Cannaliato et al. Paragraphs 3-6.  In addition, the references are analogous art directed to the same field of endeavor: database management.
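
For illustration only, the "sequenced-file cache" limitation addressed above can be pictured with the following minimal Python sketch. It is one possible reading of the claim language, not the applicant's implementation and not code from either reference. The function names, the `max_bytes` budget standing in for "available memory size", the tab-separated file layout, and the sample rows are all assumptions, and rows are assumed to contain no newline characters.

```python
import os
import tempfile


def write_sequenced_cache(keyed_rows, cache_dir, max_bytes):
    """Write (key, row) pairs in key order into numbered files.

    Each file holds at most roughly max_bytes of data, so a reader never
    needs more than one file's worth of rows in memory at a time.
    """
    paths, lines, size = [], [], 0

    def flush():
        nonlocal lines, size
        if lines:
            path = os.path.join(cache_dir, f"part-{len(paths):05d}.txt")
            with open(path, "w", encoding="utf-8") as handle:
                handle.writelines(lines)
            paths.append(path)
            lines, size = [], 0

    for key, row in sorted(keyed_rows.items()):
        line = f"{key}\t{row}\n"
        line_size = len(line.encode("utf-8"))
        if lines and size + line_size > max_bytes:
            flush()  # start a new file once the budget would be exceeded
        lines.append(line)
        size += line_size
    flush()
    return paths


def read_incrementally(paths):
    """Yield (key, row) pairs one file at a time, in sequence order."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                key, row = line.rstrip("\n").split("\t", 1)
                yield key, row


# Hypothetical sample data: three keyed rows and a 20-byte budget per file.
with tempfile.TemporaryDirectory() as cache_dir:
    paths = write_sequenced_cache(
        {"b": "2,bob", "a": "1,alice", "c": "3,carol"}, cache_dir, max_bytes=20
    )
    restored = list(read_incrementally(paths))
```

In this sketch the budget determines how many files are written, and the reader walks them in order without loading the whole dataset.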


	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 2, Mohan et al. teaches the method as claimed in claim 1, further comprising the step of storing results of the data comparison process in a data-storage that is accessible to one or more users (“the new result csv file is returned to the end user to be opened” See Paragraph 39). 

	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 3, Mohan et al. teaches the method as claimed in claim 1, wherein the source datasets and the target datasets are extracted based on extraction configurations provided by a user (“in some instances, the decision rule can be defined by a user via a graphical user interface that allows for visual manipulation of the relationships between one or more concepts and user entry of one or more logical rules. In some configurations, one or more changes to a set of concepts and/or the concept hierarchy can be detected by the decision rule generator module, with each change being propagated through all concepts and sub-concepts that include the changed concept” See Paragraph 56). 
	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 4, Mohan et al. teaches the method as claimed in claim 1, wherein the unique key is assigned by using hash algorithm (“the extracted value from the first source file can be hashed by the data harmonization system, and used as the key in a key value pair containing the key and the original row of data in the first source file as a string list” See Paragraph 39). 
	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 5, Mohan et al. teaches the method as claimed in claim 1, wherein the unique key acts as a pointer for the selected string that facilitates in fast identification and extraction of data (“a database pointer” See Paragraph 46). 
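
For illustration only, the hashed-key arrangement quoted from Mohan et al. Paragraph 39 for claims 4 and 5 (a hash used as the key of a key-value pair whose value is the original row as a string list) can be sketched in Python as follows. SHA-256, the function name `key_rows`, and the sample rows are assumptions; neither the claims nor the quoted passage names a particular hash algorithm.

```python
import hashlib


def key_rows(rows):
    """Map the hash of each row to the list of original rows with that hash."""
    keyed = {}
    for row in rows:
        digest = hashlib.sha256(row.encode("utf-8")).hexdigest()
        keyed.setdefault(digest, []).append(row)
    return keyed


# Hypothetical sample rows for a source and a target dataset.
source = key_rows(["1,alice", "2,bob"])
target = key_rows(["2,bob", "3,carol"])

# Only the keys are compared; the original row is looked up through the
# matching key, which is the sense in which the key acts as a pointer.
matching_keys = source.keys() & target.keys()
matched_rows = [row for key in matching_keys for row in source[key]]
```

The comparison touches only fixed-length digests, and the full row text is read back only for keys that match.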

	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 6, Mohan et al. teaches the method as claimed in claim 1, wherein the one or more databases comprises one or more relational databases and one or more non-relational databases (“such as relational databases (e.g., Oracle, IBM DB2, Microsoft SQL Server, MySQL or PostgreSQL relational databases), one or more comma-separated values (CSV) files, one or more other pattern-delimited files, or other structured data format hierarchy” See Paragraph 54). 
	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 7, Mohan et al. teaches the method as claimed in claim 1, wherein the one or more databases are local databases, and a network of database servers (“the set of structured data and/or unstructured data can be stored across multiple databases that are located, for example, either in separate non-transitory computer-readable media or on the same non-transitory computer-readable medium (on a computer system such as a personal computer or network server)” See Paragraph 54). 

	The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 9, Mohan et al. teaches the method as claimed in claim 1, wherein the size of extracted source and target datasets is reduced incrementally by marking the data being compared in its corresponding comparison cycle, and subsequently storing the marked data into a plurality of separate datasets including: 
a. a separate data-set in the source database (“data harmonization system can concatenate or combine the data stored in multiple columns or rows in a table (e.g., part of a database associated with a data source)”, See Paragraph 79); 
b. a separate data-set in the target database (“single string for comparison with the data stored in the columns of tables associated with a target data source (e.g., a target database)” See Paragraph 80); 
c. one or more data-sets present only in the target database (“single string for comparison with the data stored in the columns of tables associated with a target data source (e.g., a target database)” See Paragraph 80); and 
d. one or more matching data-sets from both the source and the target database (“If there is a match, then the actual value that corresponds to the (matched) hash value is read by the data harmonization system” See Paragraph 39). 
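
For illustration only, the marking and separation recited in claim 9 can be pictured with the following minimal Python sketch. It is a hypothetical reading of the claim language rather than the applicant's implementation or the method of any cited reference. The function name `compare_in_cycles`, the `batch_size` parameter, and the sample data are assumptions.

```python
def compare_in_cycles(source, target, batch_size=2):
    """Compare two key -> row mappings in cycles of batch_size keys.

    Every key handled in a cycle is removed from the remaining data, so the
    data still to be compared shrinks each cycle and nothing is read twice.
    Returns (matched, source_only, target_only).
    """
    remaining_source = dict(source)
    remaining_target = dict(target)
    matched, source_only = {}, {}

    while remaining_source:
        batch = list(remaining_source)[:batch_size]
        for key in batch:
            row = remaining_source.pop(key)  # marked: never compared again
            if key in remaining_target:
                matched[key] = row
                del remaining_target[key]
            else:
                source_only[key] = row

    # Whatever is left in the target was never matched by a source key.
    target_only = remaining_target
    return matched, source_only, target_only


# Hypothetical sample data.
matched, source_only, target_only = compare_in_cycles(
    {"a": "1", "b": "2", "c": "3"},
    {"b": "2", "c": "3", "d": "4"},
)
```

With the sample data above, keys `b` and `c` land in the matching set, `a` is present only in the source, and `d` is present only in the target.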

With respect to claim 10, Mohan et al. teaches a system for providing fast data comparison for software testing, the system comprising:

a memory (“a memory 152 and a processor 154,” See Paragraph 28) storing program instructions;

a processor (“a memory 152 and a processor 154,” See Paragraph 28) executing program instructions stored in the memory and configured to:

select a source database and a target database from one or more databases (“first data source, and compared with the next selected data source” See Paragraph 39);

assign a unique key to each of the plurality of data-strings of each dataset (“the extracted value from the first source file can be hashed by the data harmonization system, and used as the key in a key value pair containing the key and the original row of data in the first source file as a string list” See Paragraph 39).  Mohan et al. does not disclose wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database.
However, Cannaliato et al. teaches extract a source dataset and a target dataset respectively from the selected source database and the target database, each dataset comprising a plurality of data-strings, wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database (“ETL ( extract, transform, and load),” See Paragraph 17 and “extract fixed fields from byte strings in binary files,” See Paragraph 34 and “source specifications for the parser are required and include: Field name; Field data type ( String, Number, Date/Time, Object, Array),” See Paragraph 42);
generate a sequenced-file cache using the corresponding unique keys assigned to each of the plurality of data-strings, wherein the files are sequenced within the cache based on an available memory size (Values in the surrogate key dimensional column are unique sequential numbers used to insert a record into the platform and need to be defined if the enrichment cache is set to Add Record to DB, See Paragraph 53);
read incrementally, the sequenced-file cache, to perform data comparison between the source dataset and the target dataset irrespective of free memory available for testing (“When matching records are found as a result of the comparison S5, the transaction records are enriched with dimensional data from the dimension table S10 and the enriched records are stored in one or more platform databases 65, S15,” See Paragraph 56); and

reduce incrementally, size of the extracted source datasets and target datasets, to perform optimized data-comparison by eliminating any repetition in data-read and data comparison cycles (“If a condition occurs such that the allocated ingest nodes cannot work the messages off the queue at a sufficient rate, the queue size increases. When this happens, the platform automatically starts additional ingest nodes to provide additional capability to keep up with the pace of the incoming data. Eventually, the ingest processors stabilize relative to the throughput of incoming data, and the messages in the JMS queue are decrease as the extra ingest nodes work them off. In a similar fashion, if the allocated ingest nodes sufficiently process the incoming messages traffic, the system will deallocate ingest nodes, reducing down to a sufficient quantity,” See Paragraph 20).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mohan et al. (multiple databases) with Cannaliato et al. (Correlating Cloud-Based Big Data in Real-Time).  The modification would have facilitated software testing by making it possible to manage large datasets using cyclic data traversal and by accelerating the ability to find the desired data.  See Cannaliato et al. Paragraphs 3-6.  In addition, the references are analogous art directed to the same field of endeavor: database management.

With respect to claim 11, it is rejected on grounds corresponding to above rejected claim 2, because claim 11 is substantially equivalent to claim 2.

With respect to claim 12, it is rejected on grounds corresponding to above rejected claim 3, because claim 12 is substantially equivalent to claim 3.

With respect to claim 13, it is rejected on grounds corresponding to above rejected claim 4, because claim 13 is substantially equivalent to claim 4.

With respect to claim 14, it is rejected on grounds corresponding to above rejected claim 5, because claim 14 is substantially equivalent to claim 5.

With respect to claim 15, it is rejected on grounds corresponding to above rejected claim 6, because claim 15 is substantially equivalent to claim 6.

With respect to claim 16, it is rejected on grounds corresponding to above rejected claim 7, because claim 16 is substantially equivalent to claim 7.


With respect to claim 18, it is rejected on grounds corresponding to above rejected claim 9, because claim 18 is substantially equivalent to claim 9.

With respect to claim 19, Mohan et al. teaches a computer program product comprising:

a non-transitory computer readable medium (“storage medium,” See Paragraph 37) having computer readable program code stored thereon, the computer readable program code comprising instructions that, when executed by at least one computer processor, cause the at least one computer processor to:

select a source database and a target database from one or more databases (“first data source, and compared with the next selected data source” See Paragraph 39);

assign a unique key to each of the plurality of data-strings of each dataset (“the extracted value from the first source file can be hashed by the data harmonization system, and used as the key in a key value pair containing the key and the original row of data in the first source file as a string list” See Paragraph 39).  Mohan et al. does not disclose wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database.
However, Cannaliato et al. teaches extract a source dataset and a target dataset respectively from the selected source database and the target database, each dataset comprising a plurality of data-strings, wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database (“ETL ( extract, transform, and load),” See Paragraph 17 and “extract fixed fields from byte strings in binary files,” See Paragraph 34 and “source specifications for the parser are required and include: Field name; Field data type ( String, Number, Date/Time, Object, Array),” See Paragraph 42);
generate a sequenced-file cache using the corresponding unique keys assigned to each of the plurality of data-strings, wherein the files are sequenced within the cache based on an available memory size (Values in the surrogate key dimensional column are unique sequential numbers used to insert a record into the platform and need to be defined if the enrichment cache is set to Add Record to DB, See Paragraph 53);
read incrementally, the sequenced-file cache, to perform data comparison between the source dataset and the target dataset irrespective of free memory available for testing (“When matching records are found as a result of the comparison S5, the transaction records are enriched with dimensional data from the dimension table S10 and the enriched records are stored in one or more platform databases 65, S15,” See Paragraph 56); and

reduce incrementally, size of the extracted source datasets and target datasets, to perform optimized data-comparison by eliminating any repetition in data-read and data comparison cycles (“If a condition occurs such that the allocated ingest nodes cannot work the messages off the queue at a sufficient rate, the queue size increases. When this happens, the platform automatically starts additional ingest nodes to provide additional capability to keep up with the pace of the incoming data. Eventually, the ingest processors stabilize relative to the throughput of incoming data, and the messages in the JMS queue are decrease as the extra ingest nodes work them off. In a similar fashion, if the allocated ingest nodes sufficiently process the incoming messages traffic, the system will deallocate ingest nodes, reducing down to a sufficient quantity,” See Paragraph 20).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mohan et al. (multiple databases) with Cannaliato et al. (Correlating Cloud-Based Big Data in Real-Time).  The modification would have facilitated software testing by making it possible to manage large datasets using cyclic data traversal and by accelerating the ability to find the desired data.  See Cannaliato et al. Paragraphs 3-6.  In addition, the references are analogous art directed to the same field of endeavor: database management.

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mohan et al. (US Pub. No. 20160004742) in view of Cannaliato et al. (US Pub. No. 20170068715) and further in view of Coriell et al. (US Pub. No. 20100191718).

The Mohan et al. reference as modified by Cannaliato et al. teaches all the limitations of claim 1.  With respect to claim 8, Mohan et al. as modified by Cannaliato et al. does not disclose wherein the data comparison between the source dataset and the target dataset is performed using cyclic data traversal algorithm.
However, Coriell et al. teaches wherein the data comparison between the source dataset and the target dataset is performed using cyclic data traversal algorithm (“Multiple paths and cycles are handled as part of the two-pass algorithm (as described in conjunction with FIG. 3 below), where multiple paths on the downward pass are broken into separate branches and cycles are not an issue given auto-aliasing and the rule to not traverse back up paths already traversed in the downward path among other rules” See Paragraph 57). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mohan et al. (multiple databases) and Cannaliato et al. (Correlating Cloud-Based Big Data in Real-Time) with Coriell et al. (database extraction).  The modification would have facilitated software testing by making it possible to manage large datasets using cyclic data traversal and by accelerating the ability to find the desired data.  See Coriell et al. Paragraphs 3-16.  In addition, the references are analogous art directed to the same field of endeavor: database management.

With respect to claim 17, it is rejected on grounds corresponding to above rejected claim 8, because claim 17 is substantially equivalent to claim 8.

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US PG-PUB 20140040182 is directed to COLLECTION AND CONSOLIDATION OF HETEROGENEOUS REMOTE BUSINESS DATA USING DYNAMIC DATA HANDLING [0054] SaaS services, "cloud" sites, toolkits or "data aggregation" procedures operate via requesting the remote SMB member to manually operate the ETL process locally on their LOB data source and then they either require the member to manually upload or send data to the central site without a secure or guaranteed delivery data transfer service. Or they may provide for some type of local agent to upload extracted data to their site but without additional data transformation, normalization or standardization services being applied to the manually extracted data sets. These types of manual processes enable errors to occur through inconsistent manual data extraction procedures, incorrect or incomplete data extraction of requested data, or simple human error of uploading or FTPing data files manually which may cause non-delivery of data or incomplete data sets. Finally, when these existing data connection methods work, they typically cover simple scenarios such as exporting from a single LOB system a simple list of customers, contacts, invoices, or single purpose data subsets and the like, to a simple CSV, XML or text file without additional transformation, normalization or standardization methods being applied to the extracted data. The extracted data file is then manually uploaded, sent or communicated to a single remote site. These existing or known methods may also not provide data filters, selection criteria or dynamic methods to determine what state the local data is in (what is new, old, deleted and or changed), what data to extract, when to extract it nor are these methods delivered on a consistent or scheduled recurring basis. 
The existing website, SaaS or cloud "data aggregators" typically do not provide a method to transform, normalize or standardize the user supplied data and they typically require the SMB user to repeat the process for each LOB system that contains. Finally, even if a user were to manually extract and upload separate data sets, these existing systems do not provide ETL functionality to transform, normalize or standardize their separate sets of user supplied data into a single master consolidated data view nor do they provide a way to make normalized data sets for comparison to user defined "peer groups".
Response to Arguments
Applicant’s arguments with respect to claims 1-19 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

In response to applicants’ comments, “Without conceding propriety of the rejection and in a genuine effort to advance prosecution of the instant application, Applicant has amended claims 1, 10, and 19” to recite “extract a source dataset and a target dataset respectively from the selected source database and the target database, each dataset comprising a plurality of data-strings, wherein the data is extracted directly into a file system of any available file location of a network server obviating need of any intermediate database,” Examiner has added the Cannaliato et al. reference to address the amendments to the claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS E ALLEN whose telephone number is (571)270-3562.  The examiner can normally be reached on Monday through Thursday, 8:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on (571) 272-3978.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/N.E.A/Examiner, Art Unit 2154
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154