EXAMINER’S AMENDMENT

Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Verbal authorization for this examiner’s amendment was given via telephone conversation with attorney of record Harrison H. Schecter on 8/10/2022.


The application has been amended as follows: 

1. (Currently amended) A computer implemented method for execution of aggregation expressions on a distributed database system, the method comprising the acts of:
determining, by a computer system, an optimized plan for execution of an aggregation operation on data stored in a distributed database under an at least partially unstructured architecture, wherein the aggregation operation includes a plurality of data operations targeting the data stored under the at least partially unstructured architecture of the distributed database, 
wherein the data stored under the at least partially unstructured architecture of the distributed database includes at least one first collection of documents, the documents of the at least one first collection of documents store data based on attribute-value pairs comprising key-value pairs,
and at least one second collection of documents, the documents of the at least one second collection of documents store data based on attribute-value pairs comprising key-value pairs,
each of the at least one first and at least one second collections permitting storage therein of documents having different schemas specified by respective attribute-value pairs;
modifying, by the computer system, the plurality of data operations to optimize execution;
splitting the aggregation operation into a distributed aggregation operation and a merged aggregation operation;
executing data field dependency analysis on the distributed database to identify a plurality of distributed database nodes of the distributed database having the data targeted by the plurality of data operations, wherein the at least partially unstructured architecture of the distributed database enables storage, within at least a singular grouping of documents in the plurality of distributed database nodes, of a plurality of documents supporting values for at least one different data field with respect to one another, and wherein the act of executing the data field dependency analysis includes determining whether results of the aggregation operation are independent of at least one data field supported by at least one of the plurality of documents and, in response to determining that the results of the aggregation operation are independent of the at least one data field, identifying the at least one data field to be eliminated from the execution of the plurality of data operations;
instructing each of the plurality of distributed database nodes to perform the distributed aggregation operation;
aggregating, at a merging server, the results of the distributed aggregation operation from each of the plurality of distributed database nodes, wherein the results are stored under the at least partially unstructured architecture of the distributed database;
performing the merged aggregation operation on the aggregated results of the distributed aggregation operation from each of the plurality of distributed database nodes hosting the data stored under the at least partially unstructured architecture; and
generating a result of the merged aggregation operation, the results generated under the at least partially unstructured architecture of the distributed database.

2. (Original) The method according to claim 1, wherein splitting the aggregation operation into a distributed aggregation operation and a merged aggregation operation, includes identifying operations for execution on database shards or respective database nodes, and identifying operations that rely on merging data output from other operations.

3. (Original) The method according to claim 1, wherein the aggregation operation includes a sequence of execution for the plurality of data operations, and the act of determining includes identifying a sequence of execution wherein execution of an operation in the sequence permits optimization of a preceding operation or a subsequent operation.

4. (Canceled)

5. (Previously presented) The method according to claim 1, wherein the act of modifying includes analyzing dependencies defined in the at least one execution stage, and modifying a sequence of execution of the operations within the at least one execution stage.

6. (Original) The method according to claim 5, wherein the act of modifying includes recursively analyzing the sequence of execution for further optimization responsive to changes in determining the optimized plan.

7. (Previously presented) The method according to claim 1, wherein the operation in the sequence is a merge operation.

8. (Previously Presented) The method according to claim 7, wherein a preceding operation can be performed on a set of data stored on one of the plurality of database nodes.

9. (Previously presented) The method according to claim 1, wherein determining the optimized plan includes analyzing a query predicate to identify a subset of data to be processed by operations within an aggregation wrapper.

10. (Original) The method according to claim 1, further comprising the act of designating the merging shard server from among the plurality of database nodes according to a performance metric of the merging shard server.

11. (Original) The method according to claim 10, wherein the performance metric comprises a number of aggregation operations being performed on the merging shard server.

12. (Original) The method of claim 1, wherein the distributed aggregation operations is executed across a plurality of nodes in parallel.


13. (Currently amended) A distributed database system for execution of aggregation expressions on a distributed database system, the system comprising: 
at least one processor operatively connected to a memory; 
a plurality of distributed database nodes configured to perform a distributed aggregation operation; 
a router component, executed by the at least one processor, configured to instruct each of the plurality of distributed database nodes to perform the distributed aggregation operation; 
and an aggregation engine, executed by the at least one processor, configured to: 
determine an optimized plan for execution of an aggregation operation on data stored in the distributed database under an at least partially unstructured architecture, wherein the distributed aggregation operation includes a plurality of data operations targeting the data stored under the at least partially unstructured architecture of the distributed database, 
wherein the data stored under the at least partially unstructured architecture of the distributed database includes at least one first collection of documents, the documents of the at least one first collection of documents store data based on attribute-value pairs comprising key-value pairs,
and at least one second collection of documents, the documents of the at least one second collection of documents store data based on attribute-value pairs comprising key-value pairs,
each of the at least one first and at least one second collections permitting storage therein of documents having different schemas specified by respective attribute-value pairs; 
modify the plurality of data operations to optimize execution; 
split the aggregation operation into the distributed aggregation operation and a merged aggregation operation, based at least in part on data field dependency analysis on the distributed database to identify ones of the plurality of distributed database nodes having the data targeted by the plurality of data operations, wherein the at least partially unstructured architecture of the distributed database enables storage, within at least a singular grouping of documents in the plurality of distributed database nodes, of a plurality of documents supporting values for at least one different data field with respect to one another, and wherein the data field dependency analysis comprises determining whether results of the aggregation operation are independent of at least one data field supported by at least one of the plurality of documents and, in response to determining that the results of the aggregation operation are independent of the at least one data field, identifying the at least one data field to be eliminated from the execution of the plurality of data operations;
aggregate, at a merging shard server, the results of the distributed aggregation operation from each of the plurality of distributed database, wherein the results are stored under the at least partially unstructured architecture of the distributed database; 
and perform the merged aggregation operation on the aggregated results; 
and generate a result of the merged aggregation operation stored under the at least partially unstructured architecture of the distributed database.

14. (Original) The system according to claim 13, wherein the aggregation engine is configured to: identify operations for execution on database shards or respective database nodes; and identify operations that rely on merging data output from other operations, as part of splitting the aggregation operation.

15. (Original) The system according to claim 13, wherein the aggregation operation includes a sequence of execution for the plurality of data operations, and the aggregation engine is configured to identify a sequence of execution wherein execution of an operation in the sequence permits optimization of a preceding operation or a subsequent operation.

16. (Original) The system according to claim 13, wherein the aggregation engine is configured to determine the optimized plan based on at least one of: reordering operations or execution stages, merging operations or execution stages, and eliminating unnecessary operations or stages.

17. (Canceled)

18. (Original) The system according to claim 13, where the aggregation engine is configured to recursively analyze the sequence of execution for further optimization responsive to changes in determining the optimized plan.

19. (Original) The system according to claim 16, wherein the operation in the sequence is a merge operation.

20. (Original) The system according to claim 13, wherein the aggregation engine is configured to manage execution of the distributed aggregation operation across a plurality of nodes in parallel.

21. (Previously presented) The method according to claim 1, wherein identifying the at least one data field to be eliminated from the execution of the plurality of data operations comprises determining that the at least one data field does not need to be accessed to perform the plurality of data operations.

22. (Previously presented) The system according to claim 13, wherein the aggregation engine is configured to identify the at least one data field to be eliminated from the execution of the plurality of data operations by determining that the at least one data field does not need to be accessed to perform the plurality of data operations.

23. (Previously presented) The method according to claim 1, wherein identifying the at least one data field to be eliminated from the execution of the plurality of data operations further comprises passing data from at least one prior operation of the plurality of data operations to at least one subsequent operation of the plurality of data operations.

24. (Previously presented) The system according to claim 13, wherein the aggregation engine is further configured to pass data from at least one prior operation of the plurality of data operations to at least one subsequent operation of the plurality of data operations.

25. (Currently amended) The method according to claim 1, wherein the at least one first collection of documents and the at least one second collection of documents each permit[[ting]] storage of at least BSON data structures having different schemas.

26. (Currently amended) The system according to claim 13, wherein the 



27. (New) The method according to claim 1, wherein:
each of the at least one first and at least one second collections permit storage therein of documents having different schemas specified by respective attribute-value pairs comprising key-value pairs.

28. (New) The system according to claim 13, wherein:
each of the at least one first and at least one second collections permit storage therein of documents having different schemas specified by respective attribute-value pairs comprising key-value pairs.
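
By way of illustration only, the flow recited in independent claims 1 and 13 can be sketched in Python as follows. The sketch is not part of the claims or of the record. It assumes a hypothetical pipeline (filter on one field, group by a second, sum a third) and hypothetical node contents. Every name in it (PIPELINE, NODES, required_fields, eliminated_fields, distributed_stage, merged_stage, run) is an assumption introduced for the example and does not appear in the application.

```python
from collections import defaultdict

# Hypothetical aggregation: keep documents whose "status" is "A",
# group them by "cust", and sum "amount" per group.
PIPELINE = {"match": ("status", "A"), "group_by": "cust", "sum": "amount"}

# Hypothetical nodes; documents in one collection may have different fields.
NODES = {
    "node1": [
        {"cust": "x", "status": "A", "amount": 5, "note": "gift"},
        {"cust": "y", "status": "B", "amount": 7},
    ],
    "node2": [
        {"cust": "x", "status": "A", "amount": 3, "tags": ["t"]},
        {"cust": "y", "status": "A", "amount": 2},
    ],
}


def required_fields(pipeline):
    """Field dependency analysis: fields the aggregation result depends on."""
    return {pipeline["match"][0], pipeline["group_by"], pipeline["sum"]}


def eliminated_fields(docs, needed):
    """Fields present in some document that the result is independent of."""
    present = set()
    for doc in docs:
        present |= doc.keys()
    return present - needed


def distributed_stage(node_docs, pipeline, needed):
    """Runs on each node: drop unneeded fields, filter, partially aggregate."""
    field, value = pipeline["match"]
    group_key, sum_key = pipeline["group_by"], pipeline["sum"]
    partial = defaultdict(int)
    for doc in node_docs:
        slim = {k: v for k, v in doc.items() if k in needed}
        if slim.get(field) == value and group_key in slim:
            partial[slim[group_key]] += slim.get(sum_key, 0)
    return dict(partial)


def merged_stage(partials):
    """Runs on the merging server: combine the per-node partial sums."""
    merged = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            merged[key] += subtotal
    return dict(merged)


def run(nodes, pipeline):
    """Split the aggregation into a distributed stage and a merged stage."""
    needed = required_fields(pipeline)
    partials = [distributed_stage(docs, pipeline, needed) for docs in nodes.values()]
    return merged_stage(partials)
```

In this sketch the per-node partial sums stand in for the distributed aggregation operation, and combining them stands in for the merged aggregation operation. The "note" and "tags" fields are the ones identified for elimination, because the result does not depend on them.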






Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:
The most relevant prior art has been cited in the included form PTO-892, Notice of References Cited.
Regarding independent claim 1,
The most pertinent references are Bakalash et al. (US PGPUB No. 2002/0029207), Harvey et al. (US PGPUB No. 2010/0161492), Batra et al. (US PGPUB No. 2012/0054249) and Krebs (US PGPUB No. 2005/0216923).
Bakalash is directed to a query interface configured to optimize received SQL statements using both relational and non-relational architectures in order to process non-relational queries as efficiently as relational queries.
Harvey discloses a process of improving performance of aggregation query processing by dividing incoming queries into small portions and resolving the query portions in parallel in order to merge aggregation operation results.
Batra discloses a system for aggregating attributes of a data warehouse model using Dependency Analysis Graph techniques, which includes the steps of scanning fact tables of a data warehouse to identify a set of attributes on which aggregation operations are defined.
Krebs is directed to a system for improving retrieval of data via dependency-based optimizations which include removing superfluous dependencies that are not required to process incoming queries.
The references make clear that methods and systems for processing relational and non-relational data are known in the art. Additionally, dependency analysis within the field of aggregation operations is known in the art. However, the references, alone or in combination, do not disclose the claimed invention as currently amended and are therefore insufficient to establish a case of obviousness. Applicant’s amendments, together with the Examiner’s amendments, add features that are not present in the aforementioned most pertinent prior art.
	
Regarding claims 2, 3, 5-12, 21, 23, 25 and 27,
	Dependent claims 2, 3, 5-12, 21, 23, 25 and 27 depend upon independent claim 1 and are therefore considered to contain allowable subject matter as discussed above.

Regarding independent claim 13,
	Independent claim 13 is directed to a computer system analogous to the subject matter of independent claim 1 and therefore contains allowable subject matter as discussed above.

Regarding dependent claims 14-16, 18-20, 22, 24, 26 and 28,
	Dependent claims 14-16, 18-20, 22, 24, 26 and 28 depend upon independent claim 13 and are therefore considered to contain allowable subject matter as discussed above.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Fernando M Mari whose telephone number is (571) 272-2498. The examiner can normally be reached Monday-Friday 6am-3pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FMMV/Examiner, Art Unit 2159
/Mariela Reyes/Supervisory Patent Examiner, Art Unit 2159