DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
Applicant’s Information Disclosure Statement, filed 10/21/2019, has been received, entered into the record, and considered.  See attached form PTO-1449.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 11, 13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Yang”) US 20170344749 A1 in view of GitHub (“Parquet” Oct. 16, 2017 – IDS submitted on 10/21/2019).

Regarding claim 1, Yang teaches A method comprising: 
compressing an uncompressed independent column into a compressed independent column as the data storage 110 is associated with a data processor 120 to process the input data 115 into one or more data records for storage 
Compression of actual data values can be referred to as columnar compression because different columns of data can be compressed in different ways, often depending only on the data in that column [0018]; 
compressing, based on the compressed independent column, an uncompressed dependent column into a compressed dependent column as The data processor 120 compresses the data (e.g., using Apache Parquet, etc.) for storage in the data storage 110 [0028]; 
Certain examples keep the data records in a compressed format and in columnar storage (e.g., Apache Parquet, etc.). Data compression reduces a data size on physical disk, for example, which decreases data traffic between disk and system memory, for example [0024]. 
storing in a file: the compressed independent column, and the compressed dependent column as The data processor 120 compresses the data (e.g., using Apache Parquet, etc.) for storage in the data storage 110 ([0028, 0018, and 0024]).
GitHub is cited for additional support of the limitation “storing in a file…” as the figure depicts an Apache Parquet file (Pg. 4).
The Apache Parquet file stores the metadata for the column (i.e., dependent data) in the Page header (ThriftCompactProtocol) - The Parquet file stores the Column a (i.e., independent) and the Column a metadata (i.e., dependent) in the same file (Pg. 


[Figure: media_image1.png]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because GitHub’s teaching would have allowed Yang’s system to optimize system performance by utilizing the Parquet file format for its efficient compression scheme and columnar storage format.

Regarding claims 2 and 11, Yang further teaches said compressing the uncompressed independent column comprises generating primary metadata that describes the compressed independent column; said compressing the uncompressed dependent column comprises generating secondary metadata that describes the compressed dependent column; the method further comprises storing, in same said file, between the compressed independent column and the primary metadata: the compressed dependent column, and the secondary metadata as The data processor 120 compresses the data (e.g., using Apache Parquet, etc.) for storage in the data storage 110 ([0028, 0018, and 0024]).
GitHub teaches The Apache Parquet file stores the metadata for the column (i.e., dependent data) in the Page header (ThriftCompactProtocol) - The Parquet file stores the Column a (i.e., independent) and the Column a metadata (i.e., dependent) in the same file (Pg. 4). Parquet is built to support very efficient compression and encoding schemes… Parquet allows compression schemes to be specified on a per-column level (Pg. 2).

Regarding claims 13 and 19, Yang further teaches wherein: the uncompressed independent column conforms to a data schema; the file contains the data schema as The data processor 120 compresses the data (e.g., using Apache Parquet, etc.) for storage in the data storage 110. The data processor 120 facilitates storage of the columnar formatted, tagged, compressed data record in the data storage 110 [0028]; 
Note: The Parquet file is schema-on-write file format (i.e., the schema is known a-priori, during ingest time, and is embedded into the metadata in the file).
the uncompressed dependent column is schema-less as Based on its processing of the input data 115 into data records for storage, the data processor 120 generates metadata 130. The metadata 130 includes information describing and/or otherwise characterizing the data stored in the data storage 110 as well as user(s)/group(s) of uses allowed to access stored data [0029].

Regarding claim 15, the claim recites one or more non-transitory computer-readable media with limitations similar to those of claim 1 and as such is rejected under the same rationale as noted above for claim 1.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Yang”) US 20170344749 A1 in view of GitHub (“Parquet” Oct. 16, 2017 – IDS submitted on 10/21/2019) as applied to claims 1 and 15 further in view of Ransom, Douglas S. (“Ransom”) US 20050144437 A1.

Regarding claim 3, Yang and GitHub do not explicitly teach wherein the secondary metadata contains provenance metadata that is digitally signed.
Ransom, however, teaches wherein the secondary metadata contains provenance metadata that is digitally signed as The EM Software requests some data and EM Component 1430 uses a PKI signing scheme to sign the data before sending it. In this fashion any user can be confident of this data's provenance [0237].
A digital signature is an electronic signature that can be used to authenticate the identity of a sender of an electronic message, or of the signer of an electronic document. A digital signature may also be used to ensure that the original content of the message or document has not been altered after it was signed [0117].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Ransom’s teaching would have allowed Yang-GitHub’s system to ensure integrity of the security mechanism by utilizing certificates and certificate authorities.

Claims 4, 8, 12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Yang”) US 20170344749 A1 in view of GitHub (“Parquet” Oct. 16, 2017 – IDS submitted on 10/21/2019) as applied to claims 1 and 15 further in view of Alsubaiee; Sattam et al. (“Alsubaiee”) US 20170193019 A1.

Regarding claims 4 and 16, Yang and GitHub do not explicitly teach the steps of:
parsing text values into parsed values; 
storing the parsed values into the uncompressed independent column; 2 ORA190200-US-NPDocket No. 50277-5549 
storing the text values into the uncompressed dependent column.
Alsubaiee, however, teaches the steps of:
parsing text values into parsed values as As shown in FIG. 3, when record 4 is ingested, the system will detect that the schema of record 4 has a field “Customer ID” which is of the type “string” (i.e., parsed values) and therefore is heterogeneous with the schema 303B for the first row group 303A which has a field “Customer ID” which is of the type “int.” (i.e., parsed value) This results in the creation of the second row group 304A and corresponding second schema 304B [0051]; 
storing the parsed values into the uncompressed independent column as Input data 301 is ingested using the methods described above to generate columnar data 302. As shown in FIG. 3, columnar data 302 includes a first row group 303A having 
storing the text values into the uncompressed dependent column as The columnar data 302 also includes a metadata section 306 including data schema 303B corresponding to row group 303A ([0050 and 0046] and Fig. 3, element 306).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Alsubaiee’s teaching would have allowed Yang-GitHub’s system to provide a means to ingest and process the input document into a columnar format.

Regarding claim 8, Yang and GitHub do not explicitly teach the steps of:
the uncompressed independent column contains a plurality of values; 
a plurality of row-major records contain said plurality of values; 
the uncompressed dependent column contains the plurality of row-major records.
Alsubaiee, however, teaches the steps of:
the uncompressed independent column contains a plurality of values as Fig. 3, elements 303A, 304A, and 305A.
a plurality of row-major records contain said plurality of values as columnar data 302 includes a first row group 303A having column chunks for Customer Name, Customer ID, and Address columns, a second row group 304A having column chunks for Customer Name, Customer ID, and Address columns, and a third row group having 
the uncompressed dependent column contains the plurality of row-major records as The columnar data 302 also includes a metadata section 306 including data schema 303B corresponding to row group 303A, data schema 304B corresponding to row group 304A, and data schema 305B corresponding to row group 305A [0050].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Alsubaiee’s teaching would have allowed Yang-GitHub’s to provide a means to ingest and process the input document into a columnar format.

Regarding claim 12, Yang and GitHub do not explicitly teach wherein the file is at least one selected from the group consisting of append only and write once.
Alsubaiee, however, teaches wherein the file is at least one selected from the group consisting of append only and write once as Parquet has been built to work on Hadoop File System (HDFS). HDFS is an append-only file system [0004].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Alsubaiee’s teaching would have allowed Yang-GitHub’s to optimize the system performance by utilizing the Parquet file format for efficient compression scheme and columnar storage format.

Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Yang”) US 20170344749 A1 in view of GitHub (“Parquet” Oct. 16, 2017 – IDS submitted on 10/21/2019) as applied to claims 1 and 15 further in view of Alsubaiee; Sattam et al. (“Alsubaiee”) US 20170193019 A1 and Petropoulos; Michail et al. (“Petropoulos”) US 20180285418 A1.

Regarding claims 5 and 17, Yang teaches the steps of:
tokenizing a text document into separator values and text values as the input data 115 is processed by the data processor 120 to extract data field(s). The data fields include a user/group identification and query search term(s). The data processor 120 organizes the extracted data into a columnar data record. The data can be compressed and/or uncompressed data, for example ([0053 and 0028]); 
Yang and GitHub do not explicitly teach the steps of:
parsing text values into parsed values; 
storing the parsed values into the uncompressed independent column; 
storing the separator values into the uncompressed dependent column.
Alsubaiee, however, teaches the steps of:
parsing text values into parsed values as As shown in FIG. 3, when record 4 is ingested, the system will detect that the schema of record 4 has a field “Customer ID” which is of the type “string” (i.e., parsed values) and therefore is heterogeneous with the schema 303B for the first row group 303A which has a field “Customer ID” which is of the type “int.” This results in the creation of the second row group 304A and corresponding second schema 304B [0051]; 
storing the parsed values into the uncompressed independent column as Input data 301 is ingested using the methods described above to generate columnar data 302 ([0049] and Fig. 3 elements 303A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Alsubaiee’s teaching would have allowed Yang-GitHub’s to provide a means to ingest and process the input document into a columnar format.
Yang teaches In an example big data platform, the data record is associated with a comma-separated string. To filter records from the big data set, a regular expression or loop is traditionally used to match names as defined in the text strings [0023].
Yang, GitHub, and Alsubaiee do not explicitly teach storing the separator values into the uncompressed dependent column.
Petropoulos, however, teaches storing the separator values into the uncompressed dependent column as metadata for data that is not-structured may be stored as part of data catalog service 240, including information about data types, names, delimiters of fields, and/or any other information to access the data that is not-structured, including metadata generated as part of an ingestion process executed by not structure data processing service 220 [0031].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Petropoulos’ teaching would have allowed Yang-GitHub-Alsubaiee’s to .

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (“Yang”) US 20170344749 A1 in view of GitHub (“Parquet” Oct. 16, 2017 – IDS submitted on 10/21/2019) as applied to claims 1 and 15 further in view of Upadhyay; Vivekkumar (“Upadhyay”) US 20200117824 A1.

Regarding claim 7, Yang and GitHub do not explicitly teach wherein the uncompressed dependent column comprises a redaction or masking of the uncompressed independent column.
Upadhyay, however, teaches wherein the uncompressed dependent column comprises a redaction or masking of the uncompressed independent column as At 140, one or more personal information privacy controls are applied to the at least one marked column. Exemplary privacy controls are discussed in more detail below, and may include, for example, and without limitation: encrypting the data, masking the data, blocking access to the data, and/or deleting the data. Again, because data is stored in a columnar format, these privacy controls can be applied to individual data types based on their specific consents associated therewith [0060].
The data encryption and masking module 242 takes the data that has been converted into columnar file format in the data manager 220 and encrypts the particular columns of data, for which encryption is either required or desired. So, for example, a given legal entity may require identified personal data to be encrypted 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Upadhyay’s teaching would have allowed Yang-GitHub’s to protect sensitive information by employing the data encryption and masking module so that access to non-authorized individuals is restricted.

Allowable Subject Matter
Claims 6, 9-10, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon in form PTO-892 is considered pertinent to applicant's disclosure.





Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish K. Thomas, can be reached at 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LESLIE WONG/
Primary Examiner, Art Unit 2164