Detailed Action
This action is based on Applicant's remarks/arguments received on 04/26/2021. Applicant amended claims 1, 4, 11, 14, 21, and 24; canceled claims 3, 7, 13, 17, 23, and 27 and presented claims 1, 2, 4-6, 8-12, 14-16, 18-22, 24-26 and 28-30 for examination.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 8-12, 14, 15, 18-22, 24, 25 and 28-30 are rejected under 35 U.S.C. 103(a) as being unpatentable over Soundararajan et al., Pub. No.: US 2013/0132967 (Soundararajan) in view of Jain, Pub. No.: US 2013/0007753 (Jain).

Claim 1.	Soundararajan teaches:
A method comprising:
identifying, based on a received plurality of queries, a plurality of tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, the plurality of tasks to be processed by a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (a job manager (¶¶ 48, 57-58) divides received jobs/queries into a plurality of tasks and distributes the tasks to a plurality of data analytics processing nodes/clusters; each data analytics processing node includes multiple processors and memory caches (¶¶ 30-31); each data analytics processing node executes an assigned task using its associated processors and memory caches (¶¶ 40-41) or in the storage server (¶¶ 32, 42))
referencing a metadata store to determine whether a file of the plurality of files associated with a task of the plurality of tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of clusters; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
in response to determining that the file is cached at least in part by one or more data nodes, assigning processing of the task to a data node of the one or more data nodes that has cached at least a part of the file; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
in response to determining that the file is not cached entirely by the data node, retrieving, by a processor, one or more parts of the file that are not cached by the data node from one or more remote storage devices of the plurality of remote storage devices; and (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58); a portion of the data not present in the associated memory caches is retrieved from the storage server (¶¶ 56, 62))
executing the plurality of tasks with the plurality of data nodes. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach:
a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters.
Jain discloses:
executing the plurality of tasks with the plurality of data nodes and a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters. (fig. 2, 210 and 215, wherein the number of computing instances is adjusted based on the assigned workload)
Soundararajan, ¶ 59, discloses that the decision for assigning a task to processing nodes “may be based on other task assignments, processing capability, expected downtime, desired end states of the individual memory caches, or other factors”. Soundararajan, ¶ 73, also discloses that the “cache virtual machines include a stateless caching tier which is optimized for performance of the processing activities”. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references to provide for executing the plurality of tasks with the plurality of data nodes, wherein a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters, because doing so would increase the usability of Soundararajan by allowing the number of processing nodes to be adjusted, by adding or removing servers/virtual machines at any given time, under varying load and system conditions.
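For illustration only, the combination mapped above for claim 1 can be sketched as follows. This is a hypothetical sketch and is not code from Soundararajan, Jain, or the application: the names DataNode, assign_task and clusters_for_load, the dictionary-based metadata store, and the sizing parameters (tasks_per_cluster, min_clusters, max_clusters) are all assumptions made for the example. It shows a metadata lookup that prefers a node already caching part of the file, retrieval of only the missing parts from remote storage, and a cluster count derived from load.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node with a local cache keyed by (file_id, part)."""
    name: str
    cache: dict = field(default_factory=dict)


def assign_task(file_id, parts, metadata, nodes, remote):
    """Pick a node for a task on file_id and fill in the parts it lacks.

    metadata maps file_id -> {node name -> set of cached parts} (assumed layout).
    remote maps (file_id, part) -> data and stands in for remote storage.
    """
    cached_on = metadata.get(file_id, {})
    # Prefer the node caching the most parts; with no cached parts anywhere,
    # max() returns the first node.
    best = max(nodes, key=lambda n: len(cached_on.get(n.name, set())))
    have = cached_on.get(best.name, set())
    missing = [p for p in parts if p not in have]
    for p in missing:
        # Retrieve only the parts that are not already cached on the node.
        best.cache[(file_id, p)] = remote[(file_id, p)]
    # Record in the metadata store that the node now holds these parts.
    metadata.setdefault(file_id, {}).setdefault(best.name, set()).update(parts)
    return best, missing


def clusters_for_load(pending_tasks, tasks_per_cluster,
                      min_clusters=1, max_clusters=8):
    """Derive a cluster count from load, clamped to assumed bounds."""
    needed = math.ceil(pending_tasks / tasks_per_cluster)
    return max(min_clusters, min(max_clusters, needed))
```

In this sketch, a task on a file whose first part is cached on one node is assigned to that node and only the remaining part is fetched; the cluster count grows and shrinks with the number of pending tasks.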

Claim 11.	Soundararajan teaches:
A system comprising: a memory; and a processor, operatively coupled to the memory, the processor to: (¶ 10, an analytics system comprises a processing module, an application-level driver, a distributed data cache, a cache coordinator, and an input/output (I/O) coordinator)
identify, based on a received plurality of queries, a plurality of tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, the plurality of tasks to be processed by a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (a job manager (¶¶ 48, 57-58) divides received jobs/queries into a plurality of tasks and distributes the tasks to a plurality of data analytics processing nodes/clusters; each data analytics processing node includes multiple processors and memory caches (¶¶ 30-31); each data analytics processing node executes an assigned task using necessary data in its cache (¶¶ 40-41) or in the storage server (¶¶ 32, 42))
reference a metadata store to determine whether a file of the plurality of files associated with a task of the plurality of tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of clusters; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
in response to determining that the file is cached at least in part by one or more data nodes, assign processing of the task to a data node of the one or more data nodes that has cached at least a part of the file; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
wherein, in response to determining that the file is not cached entirely by the data node, retrieve, by the data node, one or more parts of the file that are not cached from one or more remote storage devices of the plurality of remote storage devices; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58); a portion of the data not present in the associated memory caches is retrieved from the storage server (¶¶ 56, 62))
execute, by the plurality of data nodes, the plurality of tasks. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach:
a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters.
Jain discloses:
executing the plurality of tasks with the plurality of data nodes and a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters. (fig. 2, 210 and 215, wherein the number of computing instances is adjusted based on the assigned workload)
Soundararajan, ¶ 59, discloses that the decision for assigning a task to processing nodes “may be based on other task assignments, processing capability, expected downtime, desired end states of the individual memory caches, or other factors”. Soundararajan, ¶ 73, also discloses that the “cache virtual machines include a stateless caching tier which is optimized for performance of the processing activities”. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references to provide for executing the plurality of tasks with the plurality of data nodes, wherein a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters, because doing so would increase the usability of Soundararajan by allowing the number of processing nodes to be adjusted, by adding or removing servers/virtual machines at any given time, under varying load and system conditions.

Claim 21.	Soundararajan teaches:
A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, cause the processor to:
identify, based on a received plurality of queries, a plurality of tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, the plurality of tasks to be processed by a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (a job manager (¶¶ 48, 57-58) divides received jobs/queries into a plurality of tasks and distributes the tasks to a plurality of data analytics processing nodes/clusters; each data analytics processing node includes multiple processors and memory caches (¶¶ 30-31); each data analytics processing node executes an assigned task using necessary data in its cache (¶¶ 40-41) or in the storage server (¶¶ 32, 42))
reference a metadata store to determine whether a file of the plurality of files associated with a task of the plurality of tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of clusters; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
in response to determining that the file is cached at least in part by one or more data nodes, assign processing of the task to a data node of the one or more data nodes that has cached at least a part of the file; (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58))
in response to determining that the file is not cached entirely by the data node, retrieve, by the processor, one or more parts of the file that are not cached from one or more remote storage devices of the plurality of remote storage devices; and (the job manager (¶¶ 48, 57-58) distributes tasks to data analytics processing nodes based on whether the data needed to perform the tasks, or a portion of the data, is already present in the associated memory caches using information describing the contents of memory caches in metadata (¶¶ 50, 57-58); a portion of the data not present in the associated memory caches is retrieved from the storage server (¶¶ 56, 62))
executing the plurality of tasks with the plurality of data nodes. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach:
a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters.
Jain discloses:
executing the plurality of tasks with the plurality of data nodes and a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters. (fig. 2, 210 and 215, wherein the number of computing instances is adjusted based on the assigned workload)
Soundararajan, ¶ 59, discloses that the decision for assigning a task to processing nodes “may be based on other task assignments, processing capability, expected downtime, desired end states of the individual memory caches, or other factors”. Soundararajan, ¶ 73, also discloses that the “cache virtual machines include a stateless caching tier which is optimized for performance of the processing activities”. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references to provide for executing the plurality of tasks with the plurality of data nodes, wherein a number of clusters in the plurality of clusters is based on at least a load of the plurality of clusters, because doing so would increase the usability of Soundararajan by allowing the number of processing nodes to be adjusted, by adding or removing servers/virtual machines at any given time, under varying load and system conditions.

Claim 2.	The method of claim 1, further comprising writing the one or more parts to a cache memory of the data node. (Soundararajan, ¶¶ 61-62, data chunks A2 and B2 are written to memory cache 614)
Claims 12 and 22 are rejected under the same rationale as claim 2.

Claims 3, 13 and 23.	(Canceled)

Claim 4.	The method of claim 1, wherein the metadata store comprises metadata indicating an organization of the plurality of files across the plurality of remote storage devices and the plurality of cache memories. (Soundararajan, ¶¶ 50, 58, metadata includes information describing the contents of memory caches)
Claims 14 and 24 are rejected under the same rationale as claim 4.

Claim 5.	The method of claim 4, further comprising:
updating the metadata of the metadata store to indicate that the file is completely stored in the cache memory of the data node. (Soundararajan, ¶¶ 50 and 66, metadata in fig. 5, 544 and fig. 6B, 644 include updated location information)
Claims 15 and 25 are rejected under the same rationale as claim 5.

Claims 7, 17 and 27.	(Canceled)

Claim 8.	The method of claim 3, further comprising processing the task using the data node. (Soundararajan, ¶¶ 40-42, analytic nodes perform the assigned tasks)
Claims 18 and 28 are rejected under the same rationale as claim 8.

Claim 9.	The method of claim 1, wherein the execution platform is separate and independent of the plurality of remote storage devices. (Soundararajan, fig. 1, the remote storage server 150 is separate and independent of data analytics processing nodes 110 and 120)
Claims 19 and 29 are rejected under the same rationale as claim 9.

Claim 10.	The method of claim 1, wherein one or more of the plurality of files stored across the plurality of remote storage devices are stored in a cache of multiple data nodes of the plurality of data nodes of the execution platform at one point in time. (Soundararajan, ¶ 5, data is replicated across nodes to satisfy data reliability and availability; ¶ 56, fig. 6B, cache coordinators copy data from storage devices to cache devices as needed at one point in time to satisfy a task requirement)
Claims 20 and 30 are rejected under the same rationale as claim 10.

Claims 6, 16 and 26 are rejected under 35 U.S.C. 103(a) as being unpatentable over Soundararajan and Jain as applied to claims 1, 11 and 21 above, in view of Loaiza et al., Pub. No.: US 2014/0281247 (Loaiza).

Claim 6.	Soundararajan as modified taught the method of claim 1 wherein files stored in remote storage devices are cached in cache devices as illustrated in fig 1; Soundararajan did not specifically teach wherein the plurality of files is stored across the plurality of remote storage devices in a columnar format, and parts of the file that are cached by the data node comprises columns of the file that are frequently accessed. However, Loaiza teaches the feature of storing data in a columnar format and caching columns of the file that are frequently accessed in ¶¶ 30, 60, wherein data are stored in primary storage devices and cache devices in “row-major format, column-major format, or hybrid-columnar format, or any other data block format” and “if the requested data happened to be requested quite frequently, then the requested data may have been already retrieved from a primary storage device, rewritten in rewritten format into rewritten data, and stored in the persistent cache device in the rewritten format”.
Both Soundararajan as modified and Loaiza cache requested data from a storage device; it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine them such that the plurality of files is stored across the plurality of remote storage devices in a columnar format, and parts of the file that are cached by the data node comprises columns of the file that are frequently accessed, because doing so would increase the usability of Soundararajan by providing for storing data in any major format and further caching more frequently requested data so that data operations are performed faster.
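For illustration only, the feature relied upon for claim 6 (columnar storage with caching of frequently accessed columns) can be sketched as follows. This is a hypothetical sketch and is not code from Loaiza or Soundararajan: the class name ColumnCache, the dictionary standing in for remote columnar storage, and the access-count threshold used to decide what counts as "frequently accessed" are all assumptions made for the example.

```python
from collections import Counter


class ColumnCache:
    """Hypothetical node-side cache that keeps only frequently read columns."""

    def __init__(self, remote_columns, threshold=2):
        # remote_columns maps column name -> list of values, i.e. the file is
        # laid out column by column in (simulated) remote storage.
        self.remote = remote_columns
        self.threshold = threshold  # assumed cutoff for "frequently accessed"
        self.hits = Counter()
        self.cached = {}

    def read(self, column):
        """Return a column, caching it once it has been read often enough."""
        self.hits[column] += 1
        if column in self.cached:
            return self.cached[column]
        values = self.remote[column]
        if self.hits[column] >= self.threshold:
            # Copy the column into the local cache for later reads.
            self.cached[column] = list(values)
        return values
```

In this sketch, a column read once is served from remote storage and not retained, while a column read at or above the threshold is copied into the cache and served locally afterwards.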

Response to Amendment and Arguments
The terminal disclaimers submitted by Applicant have been approved on 04/26/2021; the double patenting rejections are withdrawn.
In light of Applicant’s interpretations based on Spec., ¶¶ 66, 67, and 74 and FIGURE 3, the 112(a) rejections are withdrawn.
Applicant’s arguments with respect to the amended claims have been considered but are moot in view of the new grounds of rejection provided above.

Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mohsen Almani whose telephone number is (571)270-7722.  The examiner can normally be reached on M-F, 9 AM-5 PM, ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at 571-270-1006.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHSEN ALMANI/Primary Examiner, Art Unit 2159