DETAILED ACTION
Summary and Status of Claims
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to Application No. 16/727,219 filed 12/26/2019.
Claims 1-20 are pending.
Claims 1-20 are rejected under 35 U.S.C. 112(b).
Claims 1-3, 5-10, 12-17, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Infante Suarez et al. (US Patent Pub 2017/0337229).
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Infante Suarez et al. (US Patent Pub 2017/0337229), in view of Sareen et al. (US Patent Pub 2014/02228430).
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.

The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words.  It is important that the abstract not exceed 150 words in length since the space provided for the abstract on the computer tape used by the printer is limited.  The form and legal phraseology often used in patent claims, such as "means" and "said," should be avoided.  The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.

The language should be clear and concise and should not repeat information given in the title.  It should avoid using phrases which can be implied, such as, "The disclosure concerns," "The disclosure defined by this invention," "The disclosure describes," etc.

The abstract of the disclosure is objected to because the abstract is not clear and concise.  Instead it repeats information given in the title and includes legal phraseology and claim language.  Correction is required.  See MPEP § 608.01(b).
The attempt to incorporate subject matter into this application by reference to Application 16/727,060 is ineffective because it does not express a clear intent to incorporate the reference using the root words “incorporate” and “reference”.  
The attempt to incorporate subject matter into this application by reference to Application 16/727,142 is ineffective because it does not express a clear intent to incorporate the reference using the root words “incorporate” and “reference”.  

Claim Objections
Claims 6, 13, and 20 are objected to because of the following informalities:
In claims 6, 13, and 20, “the processing device” should be “the processor” for consistency.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1, 3, 8, 10, 15, and 17 recite “disk” in multiple limitations.  However, the recitation does not distinguish whether it is the same “disk” introduced at the start of each independent claim, or if each recitation is a separate instance of a disk.  Clarification is required.  For the prior art rejections below, each recitation of “disk” after the first recitation is interpreted as referring to the initial “disk”.  In other words, there is only one disk.
Claims 3, 10, and 17 recite “sorting … splits located in each file in the sorted list of partial metadata files.”  This limitation is inconsistent with the limitations recited in the respective base claims.  Each of the base claims generates a “partial metadata file” for each split that is detected.  Therefore, there is a one-to-one correspondence of splits to files.  The aforementioned limitation recited in claims 3, 10, and 17 aims to sort a plurality of splits located in each file (i.e., a many-to-one correspondence of splits to file).  Para. 0041 of the specification describes both scenarios.  However, as currently recited, each of the independent claims recites a one-to-one correspondence, whereas claims 3, 10, and 17 require a many-to-one correspondence, which is inconsistent with the independent claims.  For the prior art rejections below, the limitation of sorting the splits recited in claims 3, 10, and 17 will be treated as performed when the partial metadata files are sorted.
The remaining claims are rejected because they depend on a rejected claim.  

Note on Prior Art Rejections
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-3, 5-10, 12-17, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Infante Suarez et al. (US Patent Pub 2017/0337229) (Infante Suarez).
In regards to claim 1, Infante Suarez discloses a method comprising:
a.	receiving, by a processor, data to write to disk, the data comprising a subset of a dataset (Infante Suarez at paras. 0017, 0019, 0029)[1];
b.	writing, by the processor, a first portion of the data to disk (Infante Suarez at paras. 0029-31)[2];
c.	detecting, by the processor, a split boundary after writing the first portion (Infante Suarez at paras. 0029-31)[3];
d.	recording, by the processor, metadata describing the split boundary (Infante Suarez at paras. 0029-32)[4];
e.	continuing, by the processor, to write a remaining portion of the data to disk (Infante Suarez at paras. 0029-33)[5]; and
f.	after completing the writing of the data to disk:
i.	generating, by the processor, a partial metadata file for the data, the partial metadata file including the split boundary (Infante Suarez at paras. 0029-32)[6], and
ii.	transmitting, by the processor, the partial metadata to a partial metadata collector.  Infante Suarez at paras. 0067-68.[7]
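For illustration only, the sequence of steps recited in claim 1 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not drawn from the application or from Infante Suarez: the function name write_with_splits, the byte-count threshold split_size used to detect a boundary, the metadata fields start, end, and rows, and the use of an in-memory list as the collector are all hypothetical.  The sketch also assumes that a single partial metadata file may record more than one split.

```python
import io
import json

def write_with_splits(records, out, split_size, collector, name="part-0"):
    """Write records to `out`, recording split metadata as boundaries are detected.

    All names and the size-based boundary rule are hypothetical.
    """
    splits = []        # metadata recorded for each detected split boundary
    offset = 0         # total bytes written so far
    split_start = 0    # byte offset at which the current split began
    rows = 0           # rows written in the current split
    for rec in records:
        line = (rec + "\n").encode("utf-8")
        out.write(line)                      # write a portion of the data
        offset += len(line)
        rows += 1
        if offset - split_start >= split_size:   # assumed boundary rule
            splits.append({"start": split_start, "end": offset, "rows": rows})
            split_start, rows = offset, 0    # continue writing the remainder
    if rows:                                 # close the trailing split, if any
        splits.append({"start": split_start, "end": offset, "rows": rows})
    # After all data is written, build the partial metadata and hand it off.
    partial = {"file": name, "splits": splits}
    collector.append(json.dumps(partial))    # stand-in for transmission
    return partial

# Small demonstration with in-memory stand-ins for the disk and the collector.
buf, collector = io.BytesIO(), []
meta = write_with_splits(["aaaa", "bbbb", "cccc", "dd"], buf, 10, collector)
```

In this sketch the boundary metadata is recorded while writing is still in progress, and the partial metadata is generated and handed to the collector only after the last record has been written, which mirrors the ordering of steps c through f.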
In regards to claim 2, Infante Suarez discloses the method of claim 1 further comprising generating alignment data after recording the split boundary, the alignment data comprising metadata aligning the first portion of the data to a root dataset.  Infante Suarez at paras. 0067-68.[8]
In regards to claim 3, Infante Suarez discloses the method of claim 1, further comprising:
a.	receiving, by the processor, the partial metadata file and a plurality of additional partial metadata files (Infante Suarez at paras. 0062, 0067-68)[9];
b.	sorting, by the processor, the partial metadata file and the plurality of additional partial metadata files to generate a sorted list of partial metadata files (Infante Suarez at paras. 0062, 0067-68)[10];
c.	sorting, by the processor, splits located in each file in the sorted list of partial metadata files (Infante Suarez at paras. 0062, 0067-68)[11]; and
d.	writing, by the processor, the sorted list of partial metadata to disk as a full metadata file.  Infante Suarez at paras. 0067-68.[12]
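For illustration only, the collecting, sorting, and writing steps recited in claim 3 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not drawn from the application or from Infante Suarez: the function names, the choice to sort files by a file name and splits by a start offset, and the JSON layout of the full metadata file are all hypothetical.

```python
import io
import json

def merge_partials(partials):
    """Sort partial metadata files, then sort the splits inside each file."""
    ordered = sorted(partials, key=lambda p: p["file"])   # assumed sort key
    return [
        {"file": p["file"],
         "splits": sorted(p["splits"], key=lambda s: s["start"])}
        for p in ordered
    ]

def write_full_metadata(partials, out):
    """Write the sorted partial metadata to `out` as one full metadata file."""
    full = merge_partials(partials)
    json.dump({"files": full}, out)
    return full

# Small demonstration using an in-memory text stream in place of a disk file.
partials = [
    {"file": "part-1", "splits": [{"start": 10, "end": 18, "rows": 2},
                                  {"start": 0, "end": 10, "rows": 2}]},
    {"file": "part-0", "splits": [{"start": 0, "end": 7, "rows": 1}]},
]
out = io.StringIO()
full = write_full_metadata(partials, out)
```

The sketch performs two separate sorts, one over the files and one over the splits within each file, corresponding to steps b and c respectively.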
In regards to claim 5, Infante Suarez discloses the method of claim 1, the recording metadata describing the split boundary comprising reporting a row count of the split.  Infante Suarez at para. 0067.[13]
In regards to claim 6, Infante Suarez discloses the method of claim 1, the detecting the split boundary comprising detecting that a current file is too large to fit in a memory coupled to the processing device.  Infante Suarez at paras. 0044-50.[14]
In regards to claim 7, Infante Suarez discloses the method of claim 1, the generating the partial metadata file for the data comprising writing a schema to the partial metadata file.  Infante Suarez at para. 0036.[15]
Claim 8 is essentially the same as claim 1 in the form of a non-transitory computer readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor (Infante Suarez at paras. 0085, 0088), the computer program instructions defining the steps of the method recited in claim 1.  Therefore, it is rejected for at least the same reasons.

Claims 9, 10, and 12-14 are essentially the same as claims 2, 3, and 5-7, respectively, in the form of a computer readable storage medium.  Therefore, they are rejected for the same reasons.

In regards to claim 15, Infante Suarez discloses an apparatus comprising:
a.	a processor (Infante Suarez at para. 0088);
b.	a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic causing the processor to perform the operations (Infante Suarez at para. 0085) of:
i.	receiving data to write to disk, the data comprising a subset of a dataset (Infante Suarez at paras. 0017, 0019, 0029)[16],
ii.	writing a first portion of the data to disk (Infante Suarez at paras. 0029-31)[17],
iii.	detecting a split boundary after writing the first portion (Infante Suarez at paras. 0029-31)[18],
iv.	recording metadata describing the split boundary (Infante Suarez at paras. 0029-32)[19],
v.	continuing to write a remaining portion of the data to disk (Infante Suarez at paras. 0029-33)[20], and
vi.	after completing the writing of the data to disk:  generating a partial metadata file for the data, the partial metadata file including the split boundary (Infante Suarez at paras. 0029-32)[21], and transmitting the partial metadata to a partial metadata collector.  Infante Suarez at paras. 0067-68.[22]
Claims 16, 17, 19, and 20 are essentially the same as claims 2, 3, 5, and 6, respectively, in the form of an apparatus.  Therefore, they are rejected for the same reasons.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Infante Suarez et al. (US Patent Pub 2017/0337229) (Infante Suarez), in view of Sareen et al. (US Patent Pub 2014/02228430) (Sareen).
In regards to claim 4, Infante Suarez discloses the method of claim 3, but does not expressly disclose further comprising validating alignment of the splits after sorting the splits.
Sareen discloses a distributed file system for storing raw data.  Sareen at Fig. 1.  Sareen further discloses breaking a large file into multiple sub-sections (i.e., splits) and adding metadata to each sub-section to indicate that a sub-section is part of a larger section and how it fits together with other sub-sections.  Sareen at para. 0036.  When the large file is to be accessed, the sub-sections are gathered and sorted based on the metadata.  This sorting and aggregating process is validated to ensure all subsections have been loaded and that all sections are correctly numbered from 1 to N, where N is the value of the last subsection (i.e., validating alignment of the splits after sorting the splits).  Sareen at para. 0039.
Infante Suarez and Sareen are analogous art because they are both directed to the same field of endeavor of storing large data in a distributed file system.
Before the effective filing date of the instant application, it would have been obvious to one of ordinary skill in the art to modify Infante Suarez by adding the feature of validating alignment of the splits after sorting the splits, as disclosed by Sareen.
The motivation for doing so would have been to ensure all subsections are correctly ordered so the original data can be properly accessed.
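For illustration only, the validation described above (confirming after sorting that all sub-sections are present and numbered from 1 to N) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not Sareen's implementation: the function name validate_alignment and the seq field holding each sub-section's number are hypothetical.

```python
def validate_alignment(sections):
    """Sort sub-sections by sequence number and confirm they run 1..N.

    N is taken to be the sequence number of the last sub-section after
    sorting. A ValueError is raised if any number is missing or repeated.
    """
    ordered = sorted(sections, key=lambda s: s["seq"])
    if not ordered:
        return ordered                       # nothing to validate
    n = ordered[-1]["seq"]                   # value of the last sub-section
    seqs = [s["seq"] for s in ordered]
    if seqs != list(range(1, n + 1)):
        raise ValueError(f"sub-sections are not numbered 1 to {n}: {seqs}")
    return ordered
```

Under these assumptions, a set of sub-sections numbered 3, 1, 2 would be returned in order, while a set numbered 1, 3 would be rejected because sub-section 2 is missing.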

Claim 11 is essentially the same as claim 4 in the form of a computer readable storage medium.   Therefore, it is rejected for the same reasons.
Claim 18 is essentially the same as claim 4 in the form of an apparatus.  Therefore, it is rejected for the same reasons.

Additional Prior Art
Additional relevant prior art is listed on the attached PTO-892 form.  Some examples are:
Nowicki et al. (US Patent Pub 2003/0195895) discloses a storage system having partitioned metadata where each partition of data has an associated metadata.
Sharpe et al. (US Patent Pub 2013/0339407) discloses a distributed file system where large files are split across multiple files and distributively stored.
Piggin et al. (US Patent Pub 2014/0344507) discloses a system and method for storage metadata management for storing data segments with associated metadata for each segment.
Pradeep et al. (US Patent Pub 2016/0041976) discloses a database system for processing large log files in a multi-tenant system.  Large log files are split into tenant specific logs with associated metadata.
Slezak et al. (US Patent Pub 2020/0004749) discloses a system and method for intelligent capture of granulated data summaries in a database.  The system splits a large input into chunks of rows before storing them with associated metadata about the chunks.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL LE whose telephone number is (571)272-7970.  The examiner can normally be reached on M-F: 9:30am-6pm ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on 571-272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/MICHAEL LE/
Examiner, Art Unit 2163

/TONY MAHMOUDI/
Supervisory Patent Examiner, Art Unit 2163

[1] Spatial data (i.e., dataset) is split into subsets (i.e., data comprising a subset) to be stored in the distributed filesystem (i.e., to write to disk).
[2] The split vector data (i.e., data) is written to the storage.
[3] After storing the split vector data (i.e., after writing the first portion), an index record is generated for each sub-unit in the split (i.e., detecting a split boundary…).
[4] An index record is generated (i.e., recording … metadata), which describes the sub-units of the split.  The index record also includes split information that includes information about records adjacent to the record in the split (i.e., describing the split boundary).
[5] Remaining sub-units are stored.
[6] The local index is generated (i.e., partial metadata file) for the split vector data.
[7] The index module collects generated local indexes (i.e., partial metadata collector).
[8] The local index metadata is used to sort and order the entries into a global index.  Since the local index metadata is used to sort and order the index records of the collected local indexes, it is interpreted as alignment data because it allows the system to recreate the original spatial data (i.e., root dataset).
[9] Local indexes are collected.
[10] The local indexes are sorted.
[11] The entries within the local indexes are also sorted (i.e., sorting splits located in each file).
[12] The local indexes are combined (i.e., writing the sorted list of partial metadata) into a global index (i.e., full metadata file).
[13] Sub-units are vector records within a vector data split.  Therefore, the number of sub-units corresponds to a row count of the split.
[14] The distributed file system has a unit size, which is a size that must not be exceeded.  Therefore, vector data is split into sizes that do not exceed the unit size.  This is interpreted as the vector data being too large to fit in the distributed file system as one unit because it is larger than the unit size.
[15] Each local index (i.e., partial metadata file) includes specific fields (i.e., schema) storing particular information.
[16] Spatial data (i.e., dataset) is split into subsets (i.e., data comprising a subset) to be stored in the distributed filesystem (i.e., to write to disk).
[17] The split vector data (i.e., data) is written to the storage.
[18] After storing the split vector data (i.e., after writing the first portion), an index record is generated for each sub-unit in the split (i.e., detecting a split boundary…).
[19] An index record is generated (i.e., recording … metadata), which describes the sub-units of the split.  The index record also includes split information that includes information about records adjacent to the record in the split (i.e., describing the split boundary).
[20] Remaining sub-units are stored.
[21] The local index is generated (i.e., partial metadata file) for the split vector data.
[22] The index module collects generated local indexes (i.e., partial metadata collector).