Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

With respect to claims 1-24, the claims recite a series of acts for generating an optimized data structure. Thus the claims are directed to a statutory category, because a method of generating an optimized data structure is a process (a series of acts). Further, the claims are directed to a judicial exception. The claims recite automatically learning relationships among a plurality of datasets, generating an optimized data structure, storing the optimized data structure, and modifying a query plan. The claims fall within one of the abstract idea categories, “Mental Processes,” which covers concepts performed in the human mind. An idea standing alone, such as an uninstantiated concept, plan, or scheme, as well as a mental process (thinking) that “can be performed in the human mind, or by a human using pen and paper,” is an abstract idea. Like the invention in Alice Corp., the instant claims merely limit the abstract idea to a computer environment by simply performing the idea via a computer to optimize a data structure. This is an abstract idea. Further, at step 2B, the claims do not recite any additional limitations that amount to significantly more than the abstract idea. The claims require no additional limitations beyond generic computer components. These generic computer components (processor, memory, etc.) are claimed to perform their basic functions in generating the optimized data structure. This recitation of computer limitations amounts to mere instructions to implement the abstract idea on a computer. Taking the additional elements individually and in combination, the computer components at each step of generating the optimized data structure perform purely generic computer functions. As such, there is no inventive concept sufficient to transform the claimed subject matter into a patent-eligible application. The claims do not amount to significantly more than the abstract idea itself. Accordingly, the claims are not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-4, 6 and 24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Brown et al. (U.S. Pub. 2005/0097072 A1).
With respect to claims 1 and 24, Brown et al. discloses a method performed by a data system, the method comprising:
automatically learning one or more relationships among a plurality of datasets based on one or more of a user query or an observation of a data flow of the data system (i.e., “The present invention provides for a method that automatically discovers algebraic relationships between attributes and provides this information to, for example, an optimizer in the form of constraint predicates, along with an estimate of the predicates' selectivity…the present invention's system and method are extended to discover other relationships, such as fuzzy functional dependencies” (0030) or “Previous work on automatic methods for learning about data relationships can be categorized according to whether the learning technique is query- or data-driven, and according to the type of information discovered”(0005) or “BHUNT may segment the values in W.sub.C to identify "natural" clusters of the points, using any of the many well known clustering techniques available (see book by Hastie et al. entitled, "The elements of statistical learning: data mining, Inference, and Prediction").”(0117)); 
generating an optimized data structure based on the learned relationships among the plurality of datasets (i.e., “For each candidate, the scheme constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values. In query-optimization mode, the scheme automatically partitions the data into normal and exception records” (abstract) and “The present invention provides for a method that automatically discovers algebraic relationships between attributes and provides this information to, for example, an optimizer in the form of constraint predicates, along with an estimate of the predicates' selectivity” (0030); providing this information to the optimizer corresponds to generating an optimized data structure as claimed); and
modifying a query plan to obtain query results that satisfy a query by reading the optimized data structure in lieu of reading the plurality of datasets (i.e., “(d) during query processing, modifying queries to incorporate the identified algebraic constraints with an optimizer utilizing the identified algebraic constraints to identify new, more efficient access paths” (0029) and “knowledge of the discovered predicates provides new access plans for the optimizer's consideration, wherein the new access paths lead to substantial speedups in query processing. Such predicates also allow the database administrator (DBA) to consider alternative physical organizations of the data, such as the creation of materialized views and/or indexes, or the use of alternative partitioning strategies” (0030); knowledge of the discovered predicates corresponds to the learning as claimed).
With respect to claim 3, Brown et al. discloses wherein the learned relationships include a runtime statistic based on the data flow of the data system such that the optimized data structure reflects the runtime statistic (i.e., “For each candidate, the scheme constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values”(abstract)).  

With respect to claim 4, Brown et al. discloses wherein the learned relationships include a workload distribution based on the data flow of the data system such that the optimized data structure reflects the workload distribution (i.e., “Many of these algorithms rely on information contained in the schema definition--such as primary-key declarations--or in a set of workload queries”(0011)).
With respect to claim 6, Brown et al. discloses wherein the optimized data structure includes a plurality of operators including a join of the plurality of datasets (i.e., “Knowledge of this predicate can help an optimizer choose an efficient method for joining the orders and deliveries tables” (0046)).  
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 5, 7-8, 11-14, 16, 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1) in view of Virkar et al. (U.S. Pub. 2010/0063948 A1).
With respect to claim 2, Brown et al. discloses all limitations recited in claim 1 except for wherein the learned relationships among the plurality of datasets is based on a pattern of user queries received by the data system such that the optimized data structure reflects the pattern of user queries.  However, Virkar et al. discloses wherein the learned relationships among the plurality of datasets is based on a pattern of user queries received by the data system such that the optimized data structure reflects the pattern of user queries (i.e., “machine learning methods are provided that include providing one or more data patterns; providing one or more data samples; training two or more learning machines to identify which of the one or more data samples correspond to the one or more data patterns; and selecting the trained learning machine that identifies which of the one or more data samples correspond to the one or more data patterns by optimizing a performance function dependent on one or more variables selected from the group consisting of maximizing divergence between the classes of data, n-fold cross validation…the selected trained learning machine may be used to conduct a query of data contained in a database to identify data corresponding to said one or more data patterns, followed by outputting data identified by the query as corresponding to said one or more data patterns on an output device” (0033) or “Some existing machine learning techniques utilize SVMs that allow pattern-matching within a single type of experiment, and SVMs have proven superior to many supervised and unsupervised methods at classification based on subtle relationships in data” (0011)). It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Virkar et al.’s features in the method of Brown et al. in order to provide accurate methods for estimating the success of a trained machine using training data alone; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Virkar et al. (0029).
	With respect to claim 5, Virkar et al. discloses wherein the optimized data structure includes an aggregation of the plurality of datasets (i.e., “A variable used individually or in combination with one or more additional variables to identify a pattern or class to which data correspond. The variable is further defined by a value. For example, "gender" may be a feature in a data set, which may have a possible discrete value of "male," "female," or "unknown." Other features may have continuous values, such as "intensity" which may have a numeric value” (0071)) (same motivation as claim 2 above).  
With respect to claim 7, Virkar et al. discloses wherein the optimized data structure includes a join of the plurality of datasets (i.e., “A variable used individually or in combination with one or more additional variables to identify a pattern or class to which data correspond. The variable is further defined by a value. For example, "gender" may be a feature in a data set, which may have a possible discrete value of "male," "female," or "unknown." Other features may have continuous values, such as "intensity" which may have a numeric value” (0071)) (same motivation as claim 2 above).
  With respect to claim 8, Virkar et al. discloses the method of claim 7 wherein the join preserves a quantity of records equal to a quantity of records of one of the plurality of datasets (i.e., “Examination of the trained machines to select the best trained machine 150 and the best training data 160 is accomplished by optimizing a performance function that is dependent on a variable, which may be accomplished by using any of numerous methods or combinations of methods, such as cross-validation or maximal separation of the training data categories (divergence), but not by the use of a test set”(0091) and fig. 1).  
With respect to claim 11, Brown et al. discloses the method of claim 7, wherein the query is a first query, the query plan is a first query plan, and the query results are first query results, the method further comprising: modifying a second query plan to obtain second query results that satisfy a second query by reading the optimized data structure in lieu of reading the plurality of datasets, wherein the first query plan has a different combination of datasets compared to the second query plan (i.e., “During query processing, modify the queries to incorporate the constraints--the optimizer uses the constraints to identify new, more efficient access paths. Then combine the results with the results of executing the original query against the (small) exception table--step 208” (0029)).  
i.e., “FIG. 4 is an exemplary flow chart depicting a machine learning method for identifying patterns contained in a database, based on sample information provided by a user, in accordance with aspects of the present invention. The fully automated query-by-example for databases searches directly on data in this exemplary implantation, not just based on documentation or annotations, identifies matches and ranks hits, and reports and ranks the most important results”(0045)). (Same motivation above)
With respect to claim 13, Virkar et al. discloses the method of claim 7, wherein the query refers to at least one of the plurality of datasets that are contained in a plurality of data sources, the method further comprising: obtaining the query results without reading any of the plurality of datasets from the plurality of data sources (i.e., “Training data may be derived from known training data sets used in the machine learning field, or may be created for a specific machine learning application by culling data from appropriate databases or other sources. This training data will also be referred to interchangeably herein as a master training data set. Training data may also be selected from repositories of data that include information about classes of data, such as GEO, from research, from clinical trials, patient health records, and other biomedical and non-biomedical sources” (0104); the Examiner asserts that the training data are referred to interchangeably as a master training data set, so the query result is obtained from the master training data set and not from the plurality of datasets in the plurality of data sources).
i.e., “During query processing, each query is modified, if possible, to incorporate the discovered constraints. The modified query is run against the original data, the original query is run against the data in the exception table, and the two sets of results are combined. It should be noted here that the algorithm builds on standard query processing technology” (0144); the Examiner asserts that the original data corresponds to the plurality of data sources and the exception table corresponds to the optimized data structure as claimed).
With respect to claim 16, Brown et al. discloses the method of claim 7, wherein learning the one or more relationships and generating the optimized data structure occurs at runtime of one or more query executions (i.e., “during query processing, modifying queries to incorporate the identified algebraic constraints with an optimizer utilizing the identified algebraic constraints to identify new, more efficient access paths”(0029)).  
With respect to claim 21, Brown et al. discloses the method of claim 7, further comprising, prior to modifying the query plan: autonomously deciding to generate the optimized data structure (i.e., “The present invention provides for a method that automatically discovers algebraic relationships between attributes and provides this information to, for example, an optimizer in the form of constraint predicates, along with an estimate of the predicates' selectivity…the present invention's system and method are extended to discover other relationships, such as fuzzy functional dependencies” (0030) or “Previous work on automatic methods for learning about data relationships can be categorized according to whether the learning technique is query- or data-driven, and according to the type of information discovered” (0005)).
With respect to claim 22, Brown et al. discloses the method of claim 21, wherein the decision to generate the optimized data structure is based on a determination that reading the optimized data structure in lieu of reading at least some dataset from a plurality of data sources improves processing of an expected workload (i.e., “an optimizer is able to utilize this information to improve cost estimates. Also, knowledge of the discovered predicates provides new access plans for the optimizer's consideration, wherein the new access paths lead to substantial speedups in query processing”(0030)).  
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Binkert et al. (U.S. Pat. 10,769,148 B1).
With respect to claim 9, Brown and Virkar et al. disclose all limitations recited in claim 7 except for wherein the plurality of datasets comprises a fact table and a plurality of dimension tables, and generating the optimized data structure comprises: preserving a quantity of all records of the fact table such that the join has a corresponding quantity of records despite any of the plurality of dimension tables having a quantity of records different from the quantity of records of the fact table.  However, Binkert et al. discloses wherein the plurality of datasets comprises a fact table and a plurality of dimension tables (i.e., “In query 810 both a remote table and local table may be joined 812 (e.g., the remote table may be a fact table and the local table may be a dimension table” (col. 17, lines 15-25)), and generating the optimized data structure comprises: preserving a quantity of all records of the fact table such that the join has a corresponding quantity of records despite any of the plurality of dimension tables having a quantity of records different from the quantity of records of the fact table (i.e., “Different locations of data may also benefit from relocating data sharing operations. In query 810 both a remote table and local table may be joined 812 (e.g., the remote table may be a fact table and the local table may be a dimension table” (col. 17, lines 15-25)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Binkert et al.’s features in order to make management easier to perform; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Binkert et al. (col. 1, lines 10-20).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Narayanan et al. (U.S. Pub. 2019/0272773 A1).
 With respect to claim 10, Brown and Virkar et al. disclose all limitations recited in claim 7 except wherein automatically learning one or more relationships among the plurality of datasets comprises: autonomously determining a fact-dimension relationship among the plurality of datasets.  However, Narayanan et al. discloses wherein automatically learning one or more relationships among the plurality of datasets comprises: autonomously determining a fact-dimension relationship among the plurality of datasets (i.e., “the signal-derived representations is used as a foundation for machine learning, data mining, and statistical algorithms that can be used to determine what factors, or combinations of factors, predict a variety or relationship dimensions, such as conflict, relationship quality, or positive interactions”(0014), claim 12). It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Narayanan et al.’s features in order to increase the accuracy of the 0004).
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Al-Omari et al. (U.S. Pub. 2011/0029508 A1).
With respect to claim 15, Brown and Virkar et al. disclose all limitations recited in claim 7 except for further comprising, prior to modifying the query plan: determining that the query can be accelerated based on the optimized data structure stored in a cache of the data system.  However, Al-Omari et al. discloses further comprising, prior to modifying the query plan: determining that the query can be accelerated based on the optimized data structure stored in a cache of the data system (i.e., Fig. 8 shows an optimized-plan-in-cache determination at step 812: if the query is in the optimized query-plan cache 814, the query can be accelerated by proceeding directly to step 822; if the query is determined not to be in the cache, the query plan is modified at step 818; see also paragraph 0029).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Al-Omari et al.’s features in order to improve optimized query-plan caching and thereby achieve greater efficiencies in query processing; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Al-Omari et al. (0002).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Mathew (U.S. Pub. 2018/0314744 A1).
i.e., “a system 100 includes one or more host devices 106. Host devices 106 may broadly include any number of computers, virtual machine instances, or data centers that are configured to host or execute one or more instances of host applications 114” (0088) or “a query reference to a virtual index relates to an externally stored and managed data collection” (0323)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Mathew et al.’s features in order to quickly identify the record; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Mathew et al. (0004).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Haas et al. (U.S. Pat. 6,934,699 B1).
With respect to claim 20, Brown and Virkar et al. disclose all limitations recited in claim 7 except further comprising, prior to modifying the query plan: storing the optimized data structure in a cache memory that stores a plurality of optimized data structures each including a record-preserving join.  However, Haas et al. discloses further comprising, prior to modifying the query plan: storing the optimized data structure in a cache memory that stores a plurality of optimized data structures each including a record-preserving join (Fig. 5 shows defining the effect of the cache operator (block 72) before modifying the query plan; in blocks 73 and 76, the first and second phases are respectively modified such that some alternate plans are generated at each phase with the cache operator; see also col. 7, lines 10-20).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Haas et al.’s features in order to quickly access the record; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Haas et al. (col. 1, lines 58-67).
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Virkar et al. (U.S. Pub. 2010/0063948 A1) and further in view of Chen et al. (U.S. Pub. 2011/0196857 A1).
With respect to claim 23, Brown and Virkar et al. disclose all limitations recited in claim 7 except wherein automatically learning one or more relationships among the plurality of datasets comprises: identifying the join formed of a fact table and at least one dimension table stored at one or more data sources; and determining that the join is a record-preserving join when it has only one record for each record of the fact table.  However, Chen et al. discloses wherein automatically learning one or more relationships among the plurality of datasets comprises: identifying the join formed of a fact table and at least one dimension table stored at one or more data sources (i.e., “another way of identifying a star join and fact table can include the following flow…this table can be identified as fact table. Additionally, there might be a dimension table that is joined equally” (0067)); and determining that the join is a record-preserving join when it has only one record for each record of the fact table (i.e., “another way of identifying a star join and fact table can include the following flow…this table can be identified as fact table. Additionally, there might be a dimension table that is joined equally” (0067, 0086)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Chen et al.’s features in order to improve the performance of complex queries; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Chen et al. (0002).
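For illustration only, the record-preserving condition recited in claim 23 (only one joined record for each record of the fact table) can be sketched in Python as follows. The sketch assumes an inner equi-join on a single key. The function name `is_record_preserving_join`, the row layout (lists of dictionaries), and the sample table and column names are hypothetical; they are not taken from the application, from Chen et al., or from any other reference of record.

```python
from collections import Counter


def is_record_preserving_join(fact_rows, dim_rows, fact_key, dim_key):
    """Return True when an inner equi-join of fact_rows to dim_rows yields
    exactly one joined record for each record of the fact table."""
    # Count how many dimension records carry each join-key value.
    dim_key_counts = Counter(row[dim_key] for row in dim_rows)
    # A fact record contributes one joined record per matching dimension
    # record, so every fact record must match exactly one dimension record.
    return all(dim_key_counts[row[fact_key]] == 1 for row in fact_rows)


# Hypothetical sample data: an "orders" fact table and a "customers" dimension table.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 11},
    {"order_id": 3, "customer_id": 10},
]
customers = [
    {"customer_id": 10, "name": "A"},
    {"customer_id": 11, "name": "B"},
]
# Same dimension table with a duplicated key, which would multiply fact records.
customers_with_duplicate = customers + [{"customer_id": 10, "name": "A2"}]

preserving = is_record_preserving_join(orders, customers, "customer_id", "customer_id")
not_preserving = is_record_preserving_join(
    orders, customers_with_duplicate, "customer_id", "customer_id"
)
```

Under these assumptions, the first join keeps three records for the three fact records, while the duplicated dimension key in the second case would produce additional joined records and the check fails.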
Allowable Subject Matter
Claims 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, since the prior art of record and considered pertinent to the applicant’s disclosure does not teach or suggest the claimed feature wherein modifying the query plan comprises: processing each of a plurality of optimized data structures including a respective record-preserving join by: skipping to a next optimized data structure if the record-preserving join does not have a physical dataset in common with the query plan; and pruning a physical dataset of the record-preserving join when the physical dataset is not referred to in the query plan; and determining whether the query plan is modifiable to utilize the pruned record-preserving join and whether utilizing the pruned record-preserving join would accelerate query execution relative to utilizing an unmodified query plan; and wherein all datasets of each record-preserving join that are not referred to in the query plan are pruned from each record-preserving join.
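For illustration only, the skip-and-prune processing recited above can be sketched in Python as follows. The sketch models each optimized data structure simply as a set of physical dataset names participating in its record-preserving join, and it does not model the subsequent determination of whether the pruned join would accelerate query execution. The function name `prune_record_preserving_joins` and all dataset and structure names are hypothetical and are not drawn from the application or the references of record.

```python
def prune_record_preserving_joins(query_plan_datasets, optimized_structures):
    """Skip optimized data structures that share no physical dataset with the
    query plan, and prune unreferenced datasets from the remaining ones.

    optimized_structures maps a structure name to the set of physical
    datasets in its record-preserving join (a simplifying assumption).
    """
    referenced = set(query_plan_datasets)
    pruned = {}
    for name, join_datasets in optimized_structures.items():
        common = set(join_datasets) & referenced
        if not common:
            # No physical dataset in common with the query plan:
            # skip to the next optimized data structure.
            continue
        # Prune every dataset of the join that the query plan does not refer to.
        pruned[name] = common
    return pruned


# Hypothetical example: the query plan reads "sales" and "customers".
candidates = prune_record_preserving_joins(
    {"sales", "customers"},
    {
        "rpj_a": {"sales", "customers", "regions"},
        "rpj_b": {"inventory", "warehouses"},
    },
)
```

In this example, `rpj_b` is skipped because it has no dataset in common with the query plan, and `regions` is pruned from `rpj_a` because the query plan does not refer to it.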

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1) in view of Al-Omari et al. (U.S. Pub. 2011/0029508 A1).
With respect to claim 27, Brown discloses a data system comprising:
a processor configured to modify a query plan to obtain query results by reading at least some of the plurality of optimized data structures in lieu of reading data stored in a plurality of data sources (i.e., “The present invention provides for a method that automatically discovers algebraic relationships between attributes and provides this information to, for example, an optimizer in the form of constraint predicates, along with an estimate of the predicates' selectivity… knowledge of the discovered predicates provides new access plans for the optimizer's consideration, wherein the new access paths lead to substantial speedups in query processing. Such predicates also allow the database administrator (DBA) to consider alternative physical organizations of the data, such as the creation of materialized views and/or indexes, or the use of alternative partitioning strategies” (0030) or “the present invention provides for a new data-driven mining technique for discovering fuzzy hidden relationships among the data in a RDBMS. BHUNT provides the discovered relationships in the form of constraint predicates that can be directly used by a query optimizer” (0153); an RDBMS is a relational database management system for relational databases, which corresponds to the plurality of data sources as claimed).  Brown does not explicitly disclose a cache memory configured to store a plurality of optimized data structures that each include a record-preserving join.  However, Al-Omari et al. discloses a cache memory configured to store a plurality of optimized data structures that each include a record-preserving join (i.e., “In order to provide a better optimized-query-plan caching mechanism, in certain embodiments of the present invention, the optimizer produces optimized but not fully specified, query plans, such as query plans 430 and 432, for storage in the cache, kin mg specification of the join operation as a post-optimization, or second-level, optimization task that can be carried out after an optimized query plan is retrieved from the optimized query-plan cache” (0031)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Al-Omari et al.’s features in order to improve optimized query-plan caching and thereby achieve greater efficiencies in query processing; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Al-Omari et al. (0002).
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. Pub. 2005/0097072 A1), Al-Omari et al. (U.S. Pub. 2011/0029508 A1) and further in view of Lee et al. (U.S. Pub. 20174/0147644 A1).
With respect to claim 28, Brown and Al-Omari et al. disclose all limitations recited in claim 27 except for wherein the processor is further configured to prune any dataset from each record-preserving join that is not referred to in the query plan.  However, Lee et al. discloses wherein the processor is further configured to prune any dataset from each record-preserving join that is not referred to in the query plan (i.e., “node 216-1 may not need to be joined to node 214-1 to produce a result of query 202 and optimizer 116 may prune both the join operation node, node 212-1, and the right child node, node 216-1.” (0042)). It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include Lee et al.’s features in order to reduce query processing time and complexity; doing so for the stated purpose was well known in the art, as evidenced by the teaching of Lee et al. (0001).


Reasons for Allowance
Claims 25-26 are allowed.
The following is an examiner’s statement of reasons for allowance:
None of the references of record teaches or suggests the claimed data system comprising: a processor; and memory containing instructions that, when executed by the processor, cause the data system to: automatically learn one or more relationships among a plurality of datasets based on one or more of a user query or an observation of a data flow of the data system, the plurality of datasets being stored at one or more data sources; generate an optimized data structure based on the learned relationships among the plurality of datasets; store the optimized data structure in a cache storing a plurality of optimized data structures; and modify a query plan to obtain query results that satisfy a query by reading the optimized data structure in lieu of reading the plurality of datasets stored at the one or more data sources.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG T VY whose telephone number is (571)272-1954.  The examiner can normally be reached on M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on (571)272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HUNG T VY/
Primary Examiner, Art Unit 2163
February 11, 2021