Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This communication is responsive to Amendment, filed 10/20/2021.
 	Claims 1-18 are pending in this application. This action is made Final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not 
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over ROBICHAUD (US Pub No. 2016/0224618), in view of Tsirogiannis (US Pub No. 2014/0279838).
As to claims 1, 9, and 14, Robichaud teaches an apparatus comprising:
	a network interface (i.e. Data Server System ... collect data obtained from a variety of different data sources, [0053 - 0554]) to receive a first dataset and a second dataset, wherein the first dataset is associated with the second dataset (i.e. receive data, Fig. 2; EVENT 416, 417, 418, IP address,  See Fig. 4);
a processor connected to the network interface, the processor to generate a first schema (i.e. using a flexible schema to specify how to extract information from the event data, [0046]; See Fig. 8A) from the first dataset by aggregating results of a query (i.e. This final result can comprise different types of data, [0070]) performed on data contained in the first dataset to obtain a first data type of a first column of the data contained in the first dataset and to obtain a first data length of the first column by determining a maximum length of data within the first column (i.e. length of the data in the _raw column, e.g. 1665 bytes, Fig. 8A; Data items of events are shown in FIG. 8A by a textual representation of their value, [0121]), wherein the first schema expresses the first data type and the first data length, the processor to generate a second schema from the second dataset, wherein the first schema and the second schema are in a common format (i.e. Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A);
the processor further to generate a matrix for comparison of data transformations, wherein the matrix includes the first schema and the second schema in the common format (i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147], Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A).
	Robichaud does not seem to specifically teach:
the processor further to compare the first data type and the first data length of the first schema with an obtained second data type and second data length of a second column of the second schema to identify a discrepancy, if any, between the first data type and the second data type or the first data length and the second data length to validate the second dataset.
Tsirogiannis teaches this limitation as numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the string-typed records may indicate an error in data entry or validation or a corruption in the source data, [0359] (i.e. the same attribute appears multiple times in the source data with different types ... if a single attribute appears as an integer ... a string ... For example, the string-typed records may indicate an error in data entry or validation or a corruption in the source data, [0359]).
It would have been obvious to one of ordinary skill in the art, having the teachings of Robichaud and Tsirogiannis before the effective filing date of the claimed invention, to modify the system of Robichaud to include the limitations as taught by Tsirogiannis. One of ordinary skill in the art would have been motivated to make this combination in order to validate the string-typed records in view of Tsirogiannis ([0359]), as doing so would give the added benefit of determining whether the cumulative schema has changed since a previous export and, in response, using the frequencies to make decisions based on type and to resolve type conflicts, as taught by Tsirogiannis ([0358]).
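By way of illustration only, the limitations discussed above (aggregating query results to obtain a data type and a maximum data length per column, then comparing two schemas to identify a discrepancy, if any) can be sketched as follows. This sketch is not taken from the application, Robichaud, or Tsirogiannis; the table names, column names, sample values, and the helper functions generate_schema and find_discrepancies are hypothetical, and an in-memory SQLite database is assumed purely for convenience.

```python
import sqlite3


def generate_schema(conn, table):
    """Return {column: (declared_type, max_length)} for one dataset (hypothetical helper)."""
    schema = {}
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for _cid, name, declared_type, *_rest in columns:
        # Aggregate query: longest textual rendering of any value in the column.
        (max_len,) = conn.execute(
            f"SELECT MAX(LENGTH(CAST({name} AS TEXT))) FROM {table}"
        ).fetchone()
        schema[name] = (declared_type.upper(), max_len)
    return schema


def find_discrepancies(first, second):
    """Compare type and length of like-named columns; an empty list means no discrepancy."""
    issues = []
    for col in sorted(first.keys() & second.keys()):
        (type1, len1), (type2, len2) = first[col], second[col]
        if type1 != type2:
            issues.append((col, "type", type1, type2))
        if len1 != len2:
            issues.append((col, "length", len1, len2))
    return issues


def demo():
    # Hypothetical sample datasets; not data from any cited reference.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE first_ds (clientip TEXT, bytes INTEGER)")
    conn.execute("CREATE TABLE second_ds (clientip TEXT, bytes TEXT)")
    conn.executemany(
        "INSERT INTO first_ds VALUES (?, ?)",
        [("10.0.0.1", 1665), ("10.0.0.22", 893)],
    )
    conn.executemany(
        "INSERT INTO second_ds VALUES (?, ?)",
        [("10.0.0.1", "1665"), ("10.0.0.222", "n/a")],
    )
    first = generate_schema(conn, "first_ds")
    second = generate_schema(conn, "second_ds")
    return find_discrepancies(first, second)
```

In this sketch, both schemas share a common format (a mapping from column name to a type and length pair), so the comparison reduces to a per-column equality check.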

As per claim 2, Tsirogiannis teaches the apparatus of claim 1, wherein the first dataset is from a first database platform and the second dataset is from a second database platform, and wherein the first database platform and the second database platform are incompatible (i.e. numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the same attribute appears multiple times in the source data with different types, [0359]).

As per claim 3, Robichaud teaches the apparatus of claim 1, wherein the processor is further to generate the first schema as a first text-based table and generate the second schema as a second text-based table (i.e. Data items of events are shown in FIG. 8A by a textual representation of their value, [0121]).
As per claim 4, Robichaud teaches the apparatus of claim 2, wherein the processor is further to combine the first text-based table and the second text-based table to generate the matrix (i.e. Data items of events are shown in FIG. 8A by a textual representation of their value, [0121]).

As per claim 5, Robichaud teaches the apparatus of claim 4, wherein the processor is further to add an identification field to the matrix, wherein the identification field is to identify the first text-based table and the second text-based table (i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147]; an exemplary entry in a summarization table can keep track of occurrences of the value "94107" in a "ZIP code" field of a set of events, wherein the entry includes references to all of the events that contain the value "94107" in the ZIP code field, [0092]).

As per claim 6, Robichaud teaches the apparatus of claim 5, wherein the identification field is to store a timestamp (i.e. raw data ...  other time-series data, [0047]; detect timestamps in the data, [0055]).

As per claim 7, Robichaud teaches the apparatus of claim 1, wherein the network interface is to receive the second dataset after a predetermined period of time subsequent to receipt of the first dataset (i.e. a format of the at least a portion of the first time-stamped events set changes based upon a first event threshold value, and a format of the at least a portion of the second time-stamped events set changes based upon a second event threshold value, [0424]).

As per claim 8, Robichaud teaches the apparatus of claim 7, wherein the network interface is to receive additional datasets periodically after each passage of the predetermined period of time to add a plurality of schemas to the matrix to generate a log of database activities (i.e. A collection query can be initiated by a user, or can be scheduled to occur automatically at specific time intervals, [0094]).

As per claim 10, Robichaud teaches the method of claim 9, wherein generating the first schema comprises querying the first set of data to write a first text file, and wherein generating the second schema comprises querying the second set of data to write a second text file (i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147], Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A).

As per claim 11, Robichaud teaches the method of claim 10, wherein generating the matrix comprises appending the second text file to the first text file (i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147], Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A).

As per claim 12, Robichaud teaches the method of claim 11, further comprising inserting an identification field in the matrix to identify the first schema and the second schema (i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147], Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A).

As per claim 13, Tsirogiannis teaches the method of claim 12, wherein identification field is populated with a timestamp (i.e. all the values of a column (or attribute) must have the exact same type (e.g., integer, string, timestamp, etc.), [0243]).
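By way of illustration only, the limitations of claims 10-13 (writing each schema as text, appending the second to the first to form the matrix, and inserting an identification field populated with a timestamp) can be sketched as follows. The row layout, the helper names schema_to_rows and append_schema, and the timestamp strings are hypothetical assumptions and are not drawn from the application or the cited references.

```python
def schema_to_rows(schema):
    """Render a schema {column: (data_type, length)} as text rows 'column,type,length'."""
    return [
        f"{column},{data_type},{length}"
        for column, (data_type, length) in sorted(schema.items())
    ]


def append_schema(matrix_lines, schema, identifier):
    """Append one schema's text rows to the matrix, prefixing each row with an
    identification field (here assumed to be a timestamp string)."""
    tagged = [f"{identifier},{row}" for row in schema_to_rows(schema)]
    return matrix_lines + tagged


# Hypothetical usage: two schemas captured at different times.
matrix = append_schema([], {"bytes": ("INTEGER", 4)}, "2021-01-01T00:00:00")
matrix = append_schema(matrix, {"bytes": ("TEXT", 4)}, "2021-01-02T00:00:00")
```

Because every row carries the identification field, rows originating from the first schema remain distinguishable from rows originating from the second after the two are combined.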

As per claim 15, Tsirogiannis teaches the non-transitory machine-readable medium of claim 14, further comprising instructions to receive additional datasets periodically to generate a log of schema changes (i.e. The ingestion task can be roughly divided into two parts: the initial ingestion task that loads a large volume of new user data, and incremental ingestion, which occurs periodically when new data is available, [0222]).

As per claim 16, Robichaud teaches the apparatus of claim 1, wherein the
(i.e. A data item that comprises multiple values may comprise an array, matrix, or other representation of multiple values for a single event attribute of a single event, [0147], Table format 802 comprises one or more columns, such as columns 804a, 804b, 804c, 804d, 804e, 804f, 804g, 804h and one or more rows, such as rows 806a, 806b, 806c, 806d, and 806e, [0119], Fig. 8A).

As per claim 17, Tsirogiannis teaches the apparatus of claim 1, wherein the processor is to aggregate the results of the query by determining a maximum value of data in a column of the first dataset (i.e. these metrics may include basic statistical measures such as the minimum and maximum values, [0365]).

As per claim 18, Tsirogiannis teaches the apparatus of claim 1, wherein the processor is to aggregate the results of the query by determining a maximum string length of data in a column of the first dataset (i.e. statistics such as the average length of string attributes, [0364]).

Response to Arguments
	Applicant's arguments filed 10/20/2021 have been fully considered but they are not persuasive.
(a) Robichaud teaches “obtain a first data type of a first column of the data contained in the first dataset and to obtain a first data length of the first column by determining a maximum length of data within the first column” as follows:
The “first data type of a first column of the data contained in the first dataset” limitation equates to the _raw column in Fig. 8A.
The “first data type of a first column of the data contained in the first dataset and to obtain a first data length of the first column by determining a maximum length of data within the first column” limitation equates to the length of the data in the _raw column, e.g. 1665 bytes, Fig. 8A; Data items of events are shown in FIG. 8A by a textual representation of their value, [0121].
The “maximum length of data within the first column” limitation equates to 1665 bytes; see column 804e in Fig. 8A. It should be noted:
The _raw column (column 804b) in Fig. 8A corresponds to a plurality of event raw data.
The bytes column (column 804e) corresponds to the length of the corresponding event raw data (i.e. row 806a corresponds to event 1 and column 804a corresponds to an event attribute of event 1 having an attribute label of _time, comprising a timestamp data item. Other attribute labels shown in FIG. 8A include _raw, corresponding to event raw data, source and host corresponding to metadata, and bytes, clientip, method, and referer, corresponding to extracted fields, [0120]).
For example:
1665 is the length of the _raw data of Row 1 in Fig. 8A.
1369 is the length of the _raw data of Row 2 in Fig. 8A.
2252 is the length of the _raw data of Row 3 in Fig. 8A.
893 is the length of the _raw data of Row 4 in Fig. 8A.

Determining the maximum length of data (e.g. the _raw data in Fig. 8A) equates to the following teaching: A statistical value may refer to a value generated from an event using one or more statistical functions ... statistical commands may be commands known to produce one or more statistical values an output, [0145].
Fig. 8D shows examples of statistical values, namely the sum of bytes and the average of bytes.
Thus, Robichaud teaches determining a data length by determining a maximum length of data within a column.
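By way of illustration only, statistical functions of the kind discussed above can be applied to the four lengths enumerated for Rows 1-4. The sum and average correspond to the statistical values identified above for Fig. 8D; the maximum is shown as a further statistical function for purposes of this illustration. The function name column_stats and the dictionary layout are hypothetical.

```python
from statistics import mean

# Lengths of the _raw data for Rows 1-4, as enumerated above.
raw_lengths = {"Row 1": 1665, "Row 2": 1369, "Row 3": 2252, "Row 4": 893}


def column_stats(lengths):
    """Apply sum, average, and maximum to a column of lengths (hypothetical helper)."""
    values = list(lengths.values())
    return {"sum": sum(values), "avg": mean(values), "max": max(values)}


stats = column_stats(raw_lengths)
```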

(b) With respect to the limitation “compare the first data type and the first data length of the first schema with an obtained second data type and second data length of a second column of the schema to identify a discrepancy, IF ANY…”, it is important to note that the qualifier “if any” indicates that a discrepancy may not exist, in which case no discrepancy would be identified. Nevertheless, although Robichaud does not seem to explicitly teach this limitation, Tsirogiannis teaches:
The “first data type of a first column of the data contained in the first dataset” limitation, as numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the same attribute appears multiple times in the source data with different types, [0359].
The “first data type of a first column of the data contained in the first dataset and to obtain a first data length of the first column by determining a maximum length of data within the first column” limitation, as numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the same attribute appears multiple times in the source data with different types, [0359].
The “maximum length of data within the first column” limitation equates to 64-bit integral, [0358].
The “first schema expresses the first data type and the first data length, the processor to generate a second schema from the second dataset, wherein the first schema and the second schema are in a common format” limitation, as A new logical or physical event table is created having as columns the union of all columns from the input tables (the same column name appearing in more than one table may lead to only a single column with that name in the new table) and at least two new columns: "event" and "time.", [0339].
The “first schema expresses the first data type and the first data length” limitation, as numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the same attribute appears multiple times in the source data with different types, [0359].
The “generate a second schema from the second dataset, wherein the first schema and the second schema are in a common format” limitation, as A new logical or physical event table is created having as columns the union of all columns from the input tables, [0339].
Tsirogiannis then further teaches “compare the first data type and the first data length of the first schema with an obtained second data type and second data length of a second column of the schema to identify a discrepancy, IF ANY” as the string-typed records may indicate an error in data entry or validation or a corruption in the source data, [0359] (i.e. if a single attribute appears as an integer ... a string ... For example, the string-typed records may indicate an error in data entry or validation or a corruption in the source data, [0359]); or numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358]; the same attribute appears multiple times in the source data with different types, [0359].
The “identifying a discrepancy” limitation equates to an integer and a string, [0359]; distinguish between 32-bit and 64-bit integral and floating-point types, [0358].
The “first data type and the first data length of the first schema” limitation equates to numeric type, 32-bit, [0358].
The “second data type and second data length of the second schema” limitation equates to numeric type, 64-bit, [0358].
The “identifying a discrepancy” limitation equates to numeric types can be tracked and used to distinguish between 32-bit and 64-bit integral and floating-point types, [0358].
The “validating the second dataset” limitation equates to indicating an error in data entry or validation or a corruption in the source data, [0359].
Therefore, the cited references, at the paragraphs cited in the claim rejections, still read on the claim language as presented. The arguments as raised are moot since all claim limitations relevant to this issue have been addressed accordingly.
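By way of illustration only, the type tracking relied upon above (the same attribute appearing in the source data with different types, with the less frequent type possibly indicating an error in data entry) can be sketched as follows. This is a simplified assumption of how such tracking might operate and is not the implementation of Tsirogiannis; the function names track_types and minority_types and the sample records are hypothetical.

```python
from collections import Counter


def track_types(records, attribute):
    """Count how often each value type appears for one attribute across records."""
    return Counter(
        type(record[attribute]).__name__
        for record in records
        if attribute in record
    )


def minority_types(type_counts):
    """Return every type other than the most frequent one; these are candidate
    discrepancies (e.g. string-typed records among integer-typed records)."""
    if not type_counts:
        return []
    dominant, _count = type_counts.most_common(1)[0]
    return sorted(name for name in type_counts if name != dominant)


# Hypothetical source data: one attribute appearing as an integer and as a string.
records = [{"zip": 94107}, {"zip": 94110}, {"zip": "94107"}]
counts = track_types(records, "zip")
flagged = minority_types(counts)
```

Here the attribute appears twice as an integer and once as a string, so the string type is flagged as the candidate discrepancy.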

Conclusion
	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time 

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIRANDA LE whose telephone number is (571)272-4112.  The examiner can normally be reached on M-F 7AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alford W Kindred can be reached on 571-272-4037.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For 
/MIRANDA LE/Primary Examiner, Art Unit 2153