DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the application filed 04/20/2020.

Status of the claims
Claims 1-20 are currently pending for examination.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/28/2020 is being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kornacker et al. (US 20140280032, hereafter Kornacker) in view of Eshleman et al. (US 20130013552, hereafter Eshleman).

Regarding claim 1, Kornacker discloses: A method performed by one or more computers, the method comprising:
receiving, by the one or more computers, a request for data (Kornacker [0025] discloses: receiving a query from the client);
in response to the request, providing, by the one or more computers, instructions for processing data stored by a data source having an associated cluster of data nodes configured to retrieve data of the data source (Kornacker [0059; 0060] discloses: a computer system within which a set of instructions …; [0038] discloses: the query coordinator determines which nodes in the cluster should receive the query plan fragments for execution, the query coordinator distributes the query plan fragments to the nodes having relevant data to initiate execution of the plan fragments against the data local to each node);
processing, by the one or more computers, the data of the data source according to the instructions, wherein the data nodes perform the processing in parallel on different portions of the data of the data source (Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request. All three query execution engines run in parallel and distributed fashion);
providing, by the one or more computers, a response to the request for data based on the processed data in the in-memory cache (Kornacker [0046] discloses: results from the query execution engines are passed to the query coordinator 516b via in memory transfers…, and the final result is aggregated at the query coordinator 516b. Keeping query results or intermediate results in memory provides performance improvement).
Kornacker does not explicitly disclose, but Eshleman discloses: loading, by the one or more computers, the processed data from the data nodes into an in-memory cache (Eshleman [0078; 0100] discloses: load reporting data in-memory based upon descriptions of the raw data and report specifications provided by users).
Kornacker and Eshleman are analogous art because they are in the same field of endeavor, namely aggregation processes. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kornacker to include the teaching of Eshleman in order to provide an aggregation process for a distributed computing system. The suggestion/motivation to combine is to dynamically build the stages of a multi-stage data pipeline that loads data of interest into system memory based on the desired end-consumption of the data, and to provide access to desired information (Eshleman [0012; 0013]).

Regarding claim 2, Kornacker as modified discloses: The method of claim 1, wherein the data source is a Hadoop distributed file system (HDFS) or a column-oriented database management system for the HDFS (Kornacker [0024] discloses: HDFS). 
Regarding claim 3, Kornacker as modified discloses: The method of claim 1, wherein the data source that provides SQL interaction for Hadoop data storage (Kornacker [0024] discloses: SQL application such as Hue, which provides a user interface for Hadoop to run queries).
Regarding claim 4, Kornacker as modified discloses: The method of claim 1, wherein the data source is a web service, a search server, a relational database management system (RDBMS), a streaming source, or a NoSQL database (Kornacker [0024] discloses: web application, query engine, a Hadoop cluster; [0054] discloses: RDBMS).
Regarding claim 5, Kornacker as modified discloses: The method of claim 1, wherein the instructions comprise instructions for the data nodes associated with the data source to perform operations on the data of the data source, wherein the operations comprise one or more of data filtering, data aggregation, data wrangling, searching, data mining, text analytics, on-demand loading, incremental refreshing, data streaming, data blending, an extract-transform-load (ETL) workflow, or multi-sourcing (Kornacker [0046] discloses: query operations such as TopN and aggregation, with data streamed between the query engine nodes for pre-aggregation; [0054] discloses: query parsing, load the new data, read, or recognize; Eshleman [0079] discloses: an ETL process and a data wrangling process).
Regarding claim 6, Kornacker as modified discloses: The method of claim 1, wherein the instructions instruct the data nodes to perform one or more data analytics operations on data of the data source (Eshleman [0090; 0095] discloses: the process for retrieving raw data);
wherein processing the data of the data source according to the instructions comprises: performing, by each processing node of the multiple nodes, the one or more data analytics operations on the portion of the data of the data source that corresponds to the processing node (Eshleman [0070] discloses: distributed processing of data sets across clusters of nodes. Hadoop is designed to scale from one to thousands of nodes, where each node is a computer responsible for its own processing and storage of data…, the distributed computing platform 110 is configured to run jobs generated by the interest-driven BI system utilizing Hadoop MapReduce and queries utilizing Hive); and
providing, by the multiple nodes, results of performing the one or more data analytics operations on the respective portions of the data of the data source (Eshleman [0081] discloses: A process 429 is applied to the aggregate data 424 to populate the schema 428 to provide reporting data 430 that can be loaded in-memory and used in the interactive generation of reports by users to facilitate the visualization and exploration of the data).
Regarding claim 7, Kornacker as modified discloses: The method of claim 1, comprising: identifying processing to perform for data of the data source; and determining (i) a first portion of the processing to be performed by the data nodes associated with the data source (Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request. All three query execution engines run in parallel and distributed fashion) and (ii) a second portion of the processing to be performed by a data analytics engine (Kornacker [0044] discloses: analyze the query request to determine tasks that can be distributed across the low latency (LL) query engine daemons in the cluster); wherein the instructions instruct the data nodes to perform the first portion of the processing (Kornacker [0059] discloses: a set of instructions for performing tasks);
wherein the method comprises: receiving, from the data nodes, data including results of the first portion of the processing (Kornacker [0038] discloses: query coordinator receives or obtains membership information from the state store and location information from the name node (for HDFS query) at block 408. Using the membership information and the block location information, the query coordinator determines which daemons or nodes in the cluster should receive the query plan fragments for execution); and
performing the second portion of the processing using the data analytics engine (Kornacker [0044] discloses: analyze the query request to determine tasks that can be distributed across the low latency (LL) query engine daemons in the cluster);
wherein loading the processed data comprises loading data generated by performing the second processing using the data analytics engine (Eshleman [0078; 0100] discloses: load reporting data in-memory based upon descriptions of the raw data and report specifications provided by users). 
Regarding claim 8, Kornacker as modified discloses: The method of claim 1, wherein the request for data is a request for data to display in a dashboard interface (Eshleman [0064] discloses: The interest-driven data pipeline then filters and/or aggregates the source data based upon a schema to create reporting data).
Regarding claim 9, Kornacker as modified discloses: The method of claim 1, wherein the request for data comprises a query (Kornacker [0025] discloses: receiving a query from the client).
Regarding claim 10, Kornacker as modified discloses: The method of claim 1, comprising: receiving a first query (Kornacker [0025] discloses: receiving a query from the client);
dividing the query into multiple queries (Kornacker [0025] discloses: divide a query into fragments which are distributed among remote nodes running an instance of the low latency (LL) query engine for execution in parallel);
assigning each of the multiple queries to be processed in parallel by separate processing units using the in-memory cache (Kornacker [0025] discloses: divide a query into fragments which are distributed among remote nodes running an instance of the low latency (LL) query engine for execution in parallel);
(Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request); and 
generating a response to the first query based on the results of the multiple queries (Kornacker [0045] discloses: the query execution engines 518a-c execute the plan fragments locally on the nodes that hold the relevant data);
wherein providing the response to the request for data comprises providing the generated response to the first query (Kornacker [0046] discloses: results from the query execution engines 518a-c are passed to the query coordinator 516b via in memory transfers. If the query involves block operations (e.g., TopN, aggregation, etc.), intermediate results are streamed between the RT query engine daemon nodes for pre-aggregation, and the final result is aggregated at the query coordinator 516b).
Regarding claim 11, Kornacker as modified discloses: The method of claim 1, wherein the cluster of data nodes is a cluster of data nodes for a distributed file system, each of the data nodes having an execution engine configured to perform data filtering and data aggregation (Kornacker [0040] discloses: intermediate results are streamed between the query executors and pre-aggregated at one or more of the nodes at block 418. At block 420, the query coordinator performs an aggregation or merge of the pre-aggregated results to determine the final result); and wherein the method includes streaming processed data from the data nodes to an in-memory layer (Kornacker [0046] discloses: results from the query execution engines 518a-c are passed to the query coordinator 516b via in memory transfers. If the query involves block operations (e.g., TopN, aggregation, etc.), intermediate results are streamed between the RT query engine daemon nodes for pre-aggregation, and the final result is aggregated at the query coordinator 516b. Keeping query results or intermediate results in memory provides performance improvement).
Regarding claim 12, Kornacker as modified discloses: The method of claim 11, wherein the in-memory layer comprises multiple processing nodes; and wherein the method includes receiving the streamed data from the data nodes by the processing nodes of the in-memory layer (Kornacker [0046] discloses: results from the query execution engines 518a-c are passed to the query coordinator 516b via in memory transfers. If the query involves block operations (e.g., TopN, aggregation, etc.), intermediate results are streamed between the RT query engine daemon nodes for pre-aggregation, and the final result is aggregated at the query coordinator 516b. Keeping query results or intermediate results in memory provides performance improvement).
Regarding claim 13, Kornacker as modified discloses: A system comprising: one or more computers; and one or more computer-readable media storing instructions that, when executed by the one or more computers, cause the system to perform operations comprising:
receiving a request for data (Kornacker [0025] discloses: receiving a query from the client);
in response to the request, providing instructions for processing data stored by a data source having an associated cluster of data nodes configured to retrieve data of the data source (Kornacker [0059; 0060] discloses: a computer system within which a set of instructions …; [0038] discloses: the query coordinator determines which nodes in the cluster should receive the query plan fragments for execution, the query coordinator distributes the query plan fragments to the nodes having relevant data to initiate execution of the plan fragments against the data local to each node);
processing the data of the data source according to the instructions, wherein the data nodes perform the processing in parallel on different portions of the data of the data source (Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request. All three query execution engines run in parallel and distributed fashion);
providing a response to the request for data based on the processed data in the in-memory cache (Kornacker [0046] discloses: results from the query execution engines are passed to the query coordinator 516b via in memory transfers…, and the final result is aggregated at the query coordinator 516b. Keeping query results or intermediate results in memory provides performance improvement).
Kornacker does not explicitly disclose, but Eshleman discloses: loading the processed data from the data nodes into an in-memory cache (Eshleman [0078; 0100] discloses: load reporting data in-memory based upon descriptions of the raw data and report specifications provided by users).


Regarding claim 14, Kornacker as modified discloses: The system of claim 13, wherein the data source is a Hadoop distributed file system (HDFS) or a column-oriented database management system for the HDFS (Kornacker [0024] discloses: HDFS).
Regarding claim 15, Kornacker as modified discloses: The system of claim 13, wherein the data source that provides SQL interaction for Hadoop data storage (Kornacker [0024] discloses: SQL application such as Hue, which provides a user interface for Hadoop to run queries).
Regarding claim 16, Kornacker as modified discloses: The system of claim 13, wherein the data source is a web service, a search server, a relational database management system (RDBMS), a streaming source, or a NoSQL database (Kornacker [0024] discloses: web application, query engine, a Hadoop cluster; [0054] discloses: RDBMS).
Regarding claim 17, Kornacker as modified discloses: The system of claim 13, wherein the instructions comprise instructions for the data nodes associated with the data source to perform operations on the data of the data source, wherein the operations comprise one or more of data filtering, data aggregation, data wrangling, searching, data mining, text analytics, on-demand loading, incremental refreshing, data streaming, data blending, an extract-transform-load (ETL) workflow, or multi-sourcing (Kornacker [0046] discloses: query operations such as TopN and aggregation, with data streamed between the query engine nodes for pre-aggregation; [0054] discloses: query parsing, load the new data, read, or recognize; Eshleman [0079] discloses: an ETL process and a data wrangling process).
Regarding claim 18, Kornacker as modified discloses: The system of claim 13, wherein the instructions instruct the data nodes to perform one or more data analytics operations on data of the data source in addition to retrieving information from the data source (Eshleman [0090; 0095] discloses: the process for retrieving raw data);
wherein processing the data of the data source according to the instructions comprises: performing, by each processing node of the data nodes, the one or more data analytics operations on the portion of the data of the data source that corresponds to the processing node (Eshleman [0070] discloses: distributed processing of data sets across clusters of nodes. Hadoop is designed to scale from one to thousands of nodes, where each node is a computer responsible for its own processing and storage of data…, the distributed computing platform 110 is configured to run jobs generated by the interest-driven BI system utilizing Hadoop MapReduce and queries utilizing Hive); and
providing, by the data nodes, results of performing the one or more data analytics operations on the respective portions of the data of the data source (Eshleman [0081] discloses: A process 429 is applied to the aggregate data 424 to populate the schema 428 to provide reporting data 430 that can be loaded in-memory and used in the interactive generation of reports by users to facilitate the visualization and exploration of the data).
Regarding claim 19, Kornacker as modified discloses: The system of claim 13, comprising: identifying processing to perform for data of the data source; and determining (i) a first portion of the processing to be performed by the data nodes associated with the data source (Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request. All three query execution engines run in parallel and distributed fashion) and (ii) a second portion of the processing to be performed by a data analytics engine (Kornacker [0044] discloses: analyze the query request to determine tasks that can be distributed across the low latency (LL) query engine daemons in the cluster); wherein the instructions instruct the data nodes to perform the first portion of the processing (Kornacker [0059] discloses: a set of instructions for performing tasks);
wherein the method comprises: receiving, from the data nodes, data including results of the first portion of the processing (Kornacker [0038] discloses: query coordinator receives or obtains membership information from the state store and location information from the name node (for HDFS query) at block 408. Using the membership information and the block location information, the query coordinator determines which daemons or nodes in the cluster should receive the query plan fragments for execution); and
performing the second portion of the processing using the data analytics engine (Kornacker [0044] discloses: analyze the query request to determine tasks that can be distributed across the low latency (LL) query engine daemons in the cluster);
wherein loading the processed data comprises loading data generated by performing the second processing using the data analytics engine (Eshleman [0078; 0100] discloses: load reporting data in-memory based upon descriptions of the raw data and report specifications provided by users). 
Regarding claim 20, Kornacker as modified discloses: One or more non-transitory computer-readable media storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving, by the one or more computers, a request for data (Kornacker [0025] discloses: receiving a query from the client);
in response to the request, providing, by the one or more computers, instructions for processing data stored by a data source having an associated cluster of data nodes configured to retrieve data of the data source (Kornacker [0059; 0060] discloses: a computer system within which a set of instructions …; [0038] discloses: the query coordinator determines which nodes in the cluster should receive the query plan fragments for execution, the query coordinator distributes the query plan fragments to the nodes having relevant data to initiate execution of the plan fragments against the data local to each node);
processing, by the one or more computers, the data of the data source according to the instructions, wherein the data nodes perform the processing in parallel on different portions of the data of the data source (Kornacker [0045] discloses: the query coordinator 516b hands off the tasks or plan fragments from the query planner 514b to the query execution engines 518a-c of each of the nodes that hold data relevant to the query request. All three query execution engines run in parallel and distributed fashion);
providing, by the one or more computers, a response to the request for data based on the processed data in the in-memory cache (Kornacker [0046] discloses: results from the query execution engines are passed to the query coordinator 516b via in memory transfers…, and the final result is aggregated at the query coordinator 516b. Keeping query results or intermediate results in memory provides performance improvement).
Kornacker does not explicitly disclose, but Eshleman discloses: loading, by the one or more computers, the processed data from the data nodes into an in-memory cache (Eshleman [0078; 0100] discloses: load reporting data in-memory based upon descriptions of the raw data and report specifications provided by users).
Kornacker and Eshleman are analogous art because they are in the same field of endeavor, namely aggregation processes. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kornacker to include the teaching of Eshleman in order to provide an aggregation process for a distributed computing system. The suggestion/motivation to combine is to dynamically build the stages of a multi-stage data pipeline that loads data of interest into system memory based on the desired end-consumption of the data, and to provide access to desired information (Eshleman [0012; 0013]).


Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CINDY NGUYEN whose telephone number is (571)272-4025. The examiner can normally be reached M-F 8:00-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz can be reached on 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CINDY NGUYEN/Examiner, Art Unit 2161