Detailed Action
This action is based on Applicant's remarks/arguments received on 08/24/2022. Applicant amended claims 1, 8, 11, 18, 21 and 28 and presented claims 1-2, 4-6, 8-12, 14-16, 18-22, 24-26, and 28-30 for examination.

Claim Objections
Claims 1, 11 and 28 are objected to because of the following reason.
Amended claim 1 recites:
“changing a total number of clusters in the plurality of clusters using a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters, wherein a dynamic change is at least one of creating a new cluster or deleting an existing cluster and the change in the total number of cluster are made in a unit of the multiple ones of the first plurality of processors and the corresponding cache memory”.
The following is the previous version of the claim as rejected in the previous office action: 
 “dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of tasks by the plurality of clusters”.
The amended feature as presented does not relate to the current application. (Apparently, it relates to Application 16/860,976, which is similar to the current application.)
For the purpose of the examination, the Examiner assumes that Applicant intended to amend the claim as follows:
dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1, 2, 4, 5, 8-12, 14, 15, 18-22, 24, 25 and 28-30 are rejected under 35 U.S.C. 103(a) as being unpatentable over Soundararajan et al., Pub. No.: US 2013/0132967 (Soundararajan) in view of Chen et al., “Integration of Workflow Partitioning and Resource Provisioning” (Chen). 

Claim 1.	Soundararajan teaches:
A method comprising:
identifying, based on a received plurality of queries, a plurality of query tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, (¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is broken down into a plurality of query tasks directed to data located in the memory of data analytics processing nodes and storage server of FIGs. 1 and 5-7)
the plurality of query tasks to be processed by a plurality of processing nodes, wherein each of the plurality of processing nodes includes a data node and each data node includes a central processing unit (CPU) and a cache memory; (¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches; ¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is broken down into a plurality of query tasks to be processed by data analytics processing nodes)
referencing a metadata store to determine whether a file of the plurality of files associated with a query task of the plurality of query tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of processing nodes; (¶¶ 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes; metadata is separate from data analytics processing nodes; ¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches)
in response to determining that the file is cached at least in part by one or more data nodes, assigning processing of the query task to a data node of the one or more data nodes that has cached at least a part of the file; (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including data nodes, e.g., processors/CPUs and caches where a requested data block is located)
in response to determining that the file is not cached entirely by the data node, retrieving, by a processor, one or more parts of the file that are not cached by the data node from one or more remote storage devices of the plurality of remote storage devices; and (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including multiple execution nodes, e.g., processors/CPUs and caches where a requested data block is located)
executing the plurality of query tasks with the plurality of data nodes of the plurality of processing nodes. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach the plurality of processing nodes and data nodes as:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory;  
dynamically change a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters.
Chen explicitly teaches:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (wherein multiple virtual clusters each including multiple virtual machines/data nodes and wherein each multiple virtual machines is provisioned for using CPU, memory, disk, etc.: p.1, sec. I, left col. “workflow partitioning [5] is an approach to divide a workflow into several sub-workflows and then submit these sub-workflows to different execution sites (virtual clusters)…we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites”; p.4, sec. B, “The XML format describes virtual clusters as a collection of several nodes, which correspond to virtual machines. Each node is defined with the characteristics of the virtual machine to be provisioned, such as the VM image to use and the hardware resource type (CPU, memory, disk, etc.)”)
dynamically change a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters. (p.1, sec. I, left col.; p. 2, right col., upper half; p.4, sec. B, wherein based on a specific workload description/query a workload is broken down to sub-workflows/tasks and wherein a total cluster is allocated and destroyed dynamically for execution of the tasks: “we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites… the dynamic virtual cluster provisioning would destroy virtual clusters once they are no longer needed for any upcoming sub-workflows so as to save costs”)
Soundararajan discloses in ¶¶ 30-31 that each data analytics processing node includes multiple processors and caches and in ¶ 70 that each data analytics processing node is virtualized, including one or more virtual machines. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references for including a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; and dynamically change a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters as taught by Chen into the distributed data analytics system as taught by Soundararajan because doing so would further improve the system by creating a virtual cluster as a data analytics processing node when needed and destroying it when there is no task left for processing to reduce the workflow makespan and resource cost.
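For illustration only, and forming no part of the grounds of rejection, the cache-aware assignment limitations of claim 1 as mapped above can be pictured with the following minimal Python sketch. All names (DataNode, MetadataStore, assign_task, fetch_remote) and the fallback policy are hypothetical assumptions of this sketch; they do not appear in Soundararajan, in Chen, or in the application.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node: a CPU with a local cache of file parts."""
    name: str
    cache: dict = field(default_factory=dict)  # (file, part) -> bytes


class MetadataStore:
    """Hypothetical store recording which node caches which file parts."""

    def __init__(self):
        self._locations = {}  # file -> {node name -> set of cached parts}

    def record(self, file, node, part):
        self._locations.setdefault(file, {}).setdefault(node.name, set()).add(part)

    def nodes_caching(self, file):
        return self._locations.get(file, {})


def assign_task(file, parts, nodes, metadata, fetch_remote):
    """Pick a node caching at least part of `file`, then fetch the missing parts."""
    cached = metadata.nodes_caching(file)
    by_name = {n.name: n for n in nodes}
    if cached:
        # Prefer the node that already holds the most parts of the file.
        best = max(cached, key=lambda name: len(cached[name]))
        node = by_name[best]
    else:
        node = nodes[0]  # Fallback policy is an assumption of this sketch.
    fetched = []
    for part in parts:
        if (file, part) not in node.cache:
            # Only the parts not yet cached are retrieved from remote storage.
            node.cache[(file, part)] = fetch_remote(file, part)
            metadata.record(file, node, part)
            fetched.append(part)
    return node, fetched
```

In this sketch the metadata store is consulted before assignment, and remote retrieval is limited to the uncached parts, which mirrors the order of the claimed steps without asserting how either reference implements them.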

Claim 11.	Soundararajan teaches:
A system comprising: a memory; and a processor, operatively coupled to the memory, the processor to: (¶ 10, an analytics system comprises a processing module, an application-level driver, a distributed data cache, a cache coordinator, and an input/output (I/O) coordinator)
identify, based on a received plurality of queries, a plurality of query tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, (¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is directed to data located in the memory of data analytics processing nodes and storage server of FIGs. 1 and 5-7)
the plurality of query tasks to be processed by a plurality of processing nodes, wherein each of the plurality of processing nodes includes a data node and each data node includes a central processing unit (CPU) and a cache memory; (¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches; ¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is to be processed by data analytics processing nodes)
reference a metadata store to determine whether a file of the plurality of files associated with a query task of the plurality of query tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of processing nodes; (¶¶ 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes; metadata is separate from data analytics processing nodes; ¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches)
in response to determining that the file is cached at least in part by one or more data nodes, assign processing of the query task to a data node of the one or more data nodes that has cached at least a part of the file; (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including data nodes, e.g., processors/CPUs and caches where a requested data block is located)
wherein, in response to determining that the file is not cached entirely by the data node, retrieve, by the data node, one or more parts of the file that are not cached from one or more remote storage devices of the plurality of remote storage devices; (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including multiple execution nodes, e.g., processors/CPUs and caches where a requested data block is located)
execute, by the plurality of data nodes of the plurality of processing nodes, the plurality of query tasks. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach the plurality of processing nodes and data nodes as:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory;  
dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters.
Chen explicitly teaches:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (wherein multiple virtual clusters each including multiple virtual machines/data nodes and wherein each multiple virtual machines is provisioned for using CPU, memory, disk, etc.: p.1, sec. I, left col. “workflow partitioning [5] is an approach to divide a workflow into several sub-workflows and then submit these sub-workflows to different execution sites (virtual clusters)…we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites”; p.4, sec. B, “The XML format describes virtual clusters as a collection of several nodes, which correspond to virtual machines. Each node is defined with the characteristics of the virtual machine to be provisioned, such as the VM image to use and the hardware resource type (CPU, memory, disk, etc.)”)
dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters. (p.1, sec. I, left col.; p.4, sec. B, wherein based on a specific workload, a total cluster is allocated and destroyed: “we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites… the dynamic virtual cluster provisioning would destroy virtual clusters once they are no longer needed for any upcoming sub-workflows so as to save costs”)
Soundararajan discloses in ¶¶ 30-31 that each data analytics processing node includes multiple processors and caches and in ¶ 70 that each data analytics processing node is virtualized, including one or more virtual machines. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references for including a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; and dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters as taught by Chen into the distributed data analytics system as taught by Soundararajan because doing so would further improve the system by creating a virtual cluster as a data analytics processing node when needed and destroying it when there is no task left for processing to reduce the workflow makespan and resource cost.

Claim 21.	Soundararajan teaches:
A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, cause the processor to:
identify, based on a received plurality of queries, a plurality of query tasks for processing the received plurality of queries, wherein the plurality of queries is directed to a plurality of files stored across a plurality of remote storage devices, (¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is directed to data located in the memory of data analytics processing nodes and storage server of FIGs. 1 and 5-7)
the plurality of query tasks to be processed by a plurality of processing nodes, wherein each of the plurality of processing nodes includes a data node and each data node includes a central processing unit (CPU) and a cache memory; (¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches; ¶¶ 36, 40, FIGs. 1 and 5-7, a received analytic job/query is to be processed by data analytics processing nodes)
reference a metadata store to determine whether a file of the plurality of files associated with a query task of the plurality of query tasks is cached at least in part by the cache memory of one or more data nodes of an execution platform comprising a plurality of processing nodes; (¶¶ 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes; metadata is separate from data analytics processing nodes; ¶¶ 30-31, data analytic processing nodes include data nodes, e.g., processors/CPUs and caches)
in response to determining that the file is cached at least in part by one or more data nodes, assign processing of the query task to a data node of the one or more data nodes that has cached at least a part of the file; (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including data nodes, e.g., processors/CPUs and caches where a requested data block is located)
in response to determining that the file is not cached entirely by the data node, retrieve, by the data node, one or more parts of the file that are not cached from one or more remote storage devices of the plurality of remote storage devices; (¶¶ 30-31, 50 and 66, metadata/location information in FIGs. 5-6B is referenced for assigning tasks to data analytics processing nodes including multiple execution nodes, e.g., processors/CPUs and caches where a requested data block is located)
executing the plurality of query tasks with the plurality of data nodes of the plurality of processing nodes. (¶ 67, tasks are performed/executed by the assigned data analytics processing nodes)
Soundararajan did not specifically teach the plurality of processing nodes and data nodes as:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory;  
dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters.
Chen explicitly teaches:
a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; (wherein multiple virtual clusters each including multiple virtual machines/data nodes and wherein each multiple virtual machines is provisioned for using CPU, memory, disk, etc.: p.1, sec. I, left col. “workflow partitioning [5] is an approach to divide a workflow into several sub-workflows and then submit these sub-workflows to different execution sites (virtual clusters)…we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites”; p.4, sec. B, “The XML format describes virtual clusters as a collection of several nodes, which correspond to virtual machines. Each node is defined with the characteristics of the virtual machine to be provisioned, such as the VM image to use and the hardware resource type (CPU, memory, disk, etc.)”)
dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters. (p.1, sec. I, left col.; p.4, sec. B, wherein based on a specific workload, a total cluster is allocated and destroyed: “we are able to dynamically allocate resources into multiple execution sites or virtual clusters [7] and then execute the sub-workflows on these sites… the dynamic virtual cluster provisioning would destroy virtual clusters once they are no longer needed for any upcoming sub-workflows so as to save costs”)
Soundararajan discloses in ¶¶ 30-31 that each data analytics processing node includes multiple processors and caches and in ¶ 70 that each data analytics processing node is virtualized, including one or more virtual machines. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references for including a plurality of clusters, wherein each of the plurality of clusters includes multiple ones of a plurality of data nodes, and each of plurality of data nodes includes a central processing unit (CPU) and a cache memory; and dynamically changing a total number of clusters in the plurality of clusters based on a load of the plurality of clusters due to the processing of the plurality of query tasks by the plurality of clusters as taught by Chen into the distributed data analytics system as taught by Soundararajan because doing so would further improve the system by creating a virtual cluster as a data analytics processing node when needed and destroying it when there is no task left for processing to reduce the workflow makespan and resource cost.
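For illustration only, and forming no part of the grounds of rejection, the dynamic-change limitation recited in claims 1, 11 and 21 can be pictured with the following minimal Python sketch, in which whole clusters of several nodes are created when pending work grows and destroyed when it is no longer needed. The function names, the tasks-per-cluster capacity, the cluster bounds, and the node naming are all assumptions of this sketch and are not taken from Chen or Soundararajan.

```python
import math


def target_cluster_count(pending_tasks, tasks_per_cluster,
                         min_clusters=1, max_clusters=8):
    """Return how many clusters the current load calls for (assumed policy)."""
    if tasks_per_cluster <= 0:
        raise ValueError("tasks_per_cluster must be positive")
    needed = math.ceil(pending_tasks / tasks_per_cluster)
    return max(min_clusters, min(max_clusters, needed))


def rescale(clusters, pending_tasks, tasks_per_cluster, nodes_per_cluster=2):
    """Create or destroy whole clusters so the total matches the target."""
    target = target_cluster_count(pending_tasks, tasks_per_cluster)
    while len(clusters) < target:
        idx = len(clusters)
        # Each new cluster is provisioned as a group of several nodes.
        clusters.append([f"c{idx}-n{j}" for j in range(nodes_per_cluster)])
    while len(clusters) > target:
        # A cluster that is no longer needed is destroyed as a unit.
        clusters.pop()
    return clusters
```

The sketch changes only the total number of clusters; how load is measured in practice is left open here.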

Claim 2.	The method of claim 1, further comprising writing the one or more parts to a cache memory of the data node. (Soundararajan, ¶¶ 61-62, data chunks A2 and B2 are written to memory cache 614)
Claims 12 and 22 are rejected under the same rationale as claim 2.

Claims 3, 13 and 23.	(Canceled)

Claim 4.	The method of claim 1, wherein the metadata store comprises metadata indicating an organization of the plurality of files across the plurality of remote storage devices and the plurality of cache memories. (Soundararajan, ¶¶ 50, 58, metadata includes information describing the contents of memory caches)
Claims 14 and 24 are rejected under the same rationale as claim 4.

Claim 5.	The method of claim 4, further comprising:
updating the metadata of the metadata store to indicate that the file is completely stored in the cache memory of the data node. (Soundararajan, ¶¶ 50 and 66, metadata in fig. 5, 544 and fig. 6B, 644 include updated location information)
Claims 15 and 25 are rejected under the same rationale as claim 5.

Claims 7, 17 and 27.	(Canceled)

Claim 8.	The method of claim 1, further comprising processing the query task using the data node. (Soundararajan, ¶¶ 40-42, analytic nodes perform the assigned tasks)
Claims 18 and 28 are rejected under the same rationale as claim 8.

Claim 9.	The method of claim 1, wherein the execution platform is separate and independent of the plurality of remote storage devices. (Soundararajan, FIG. 1, the remote storage server 150 is separate and independent of data analytics processing nodes 110 and 120)
Claims 19 and 29 are rejected under the same rationale as claim 9.

Claim 10.	The method of claim 1, wherein one or more of the plurality of files stored across the plurality of remote storage devices are stored in a cache of multiple data nodes of the plurality of data nodes of the execution platform at one point in time. (Soundararajan, ¶ 5, data is replicated across nodes to satisfy data reliability and availability; ¶ 56, fig. 6B, cache coordinators copy data from storage devices to cache devices as needed at one point in time to satisfy a task requirement)
Claims 20 and 30 are rejected under the same rationale as claim 10.

Claims 6, 16 and 26 are rejected under 35 U.S.C. 103(a) as being unpatentable over Soundararajan and Chen as applied to claims 1, 11 and 21 above in view of Loaiza et al. Pub. No.: US 2014/0281247 (Loaiza).

Claim 6.	Soundararajan as modified taught the method of claim 1 wherein files stored in remote storage devices are cached in cache devices as illustrated in FIG. 1; Soundararajan did not specifically teach wherein the plurality of files is stored across the plurality of remote storage devices in a columnar format, and parts of the file that are cached by the data node comprises columns of the file that are frequently accessed. However, Loaiza teaches the feature of storing data in a columnar format and caching columns of the file that are frequently accessed in ¶¶ 30, 60, wherein data are stored in primary storage devices and cache devices in “row-major format, column-major format, or hybrid-columnar format, or any other data block format” and “if the requested data happened to be requested quite frequently, then the requested data may have been already retrieved from a primary storage device, rewritten in rewritten format into rewritten data, and stored in the persistent cache device in the rewritten format”.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to combine the applied references for disclosing wherein the plurality of files is stored across the plurality of remote storage devices in a columnar format, and parts of the file that are cached by the data node comprises columns of the file that are frequently accessed because doing so would increase usability of Soundararajan as modified by providing for storing data in any major format and further caching more frequently requested data for performing data operations faster.
Claims 16 and 26 are rejected under the same rationale as claim 6.
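For illustration only, and forming no part of the grounds of rejection, the claim 6 feature of caching frequently accessed columns of a columnar file can be pictured with the following minimal Python sketch. The class name ColumnCache, the hit-count threshold, and the in-memory dictionary standing in for remote columnar storage are assumptions of this sketch and are not drawn from Loaiza or Soundararajan.

```python
from collections import Counter


class ColumnCache:
    """Hypothetical cache that keeps only frequently read columns locally."""

    def __init__(self, columnar_file, hot_threshold=2):
        self._remote = columnar_file   # column name -> list of values
        self._hits = Counter()         # read count per column
        self._hot = hot_threshold      # reads needed before caching (assumed)
        self.cached = {}               # columns currently held in the cache

    def read(self, column):
        self._hits[column] += 1
        if column in self.cached:
            return self.cached[column]
        values = self._remote[column]  # stands in for a remote storage read
        if self._hits[column] >= self._hot:
            # The column has become frequently accessed, so cache it.
            self.cached[column] = values
        return values
```

With the default threshold, a column is cached on its second read, while a column read only once stays in remote storage.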

Response to Amendment and Arguments
Applicant’s arguments with respect to rejected claims have been considered but are not persuasive for at least the following reason.
Applicant argues “Chen's workflows (and sub-workflows) are directed to scientific workflows, not workflows related to queries. For example, Chen discloses that workflows can be for astronomy, seismology, or genomics. Furthermore, applicant respectfully submits that Chen is silent as to processing queries as claimed. Thus, because Chen is directed to managing scientific workflow and is silent to processing queries, applicant respectfully submits that Chen does not teach or suggest the claim feature.” Remarks, 10.
In response:
A query is a request for data. This query/request is expressed as an analytics job which is broken apart into individual tasks to be executed by processing nodes. Soundararajan ¶¶ 57-58. 
A scientific workflow in Chen is equivalent to an analytics job in Soundararajan because a scientific workflow is expressed as a sequence of jobs for performing tasks as requested in a workflow description. Chen, p. 1, left col., upper half, p. 2, right col., upper half.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mohsen Almani whose telephone number is (571)270-7722.  The examiner can normally be reached on M-F, 9 AM-5 PM, ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached on 571-270-1006.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHSEN ALMANI/
Primary Examiner, Art Unit 2159