DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is a Non-Final Office Action in response to the application filed on March 30, 2017, in which claims 1-20 are presented for examination.
Information Disclosure Statement
The references listed in the IDS filed on March 30, 2017 have been considered and entered into the record. A copy of the signed or initialed IDS is hereby attached.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10/572,479. Although the claims at issue are not identical, they are not patentably distinct from each other because they are directed toward the same subject matter.
All limitations and elements in claim 1 of the instant application are found in claim 1 of Viswanadha, except that “receive a request at a database system” has been omitted. The ‘020 invention has broader applications, where “the request to distributedly join a first data set and a second data set” must occur at the database. Although the claims at issue are not identical, they are not patentably distinct from each other because they are substantially similar in scope and use similar limitations, as shown in the Claims Comparison Table below.
Claims Comparison Table:
Instant application #16/780,020
Patent # 10/552,415
Claim 1. A computer system comprising: 

one or more processors; and one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to distributedly join two data sets, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: 

receive a request at a database system to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; 

generate and store a third data set that is both shuffled on the first key and includes data associated with the second key; distributedly join the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; 

shuffle the fourth data set on the second key; and 
distributedly join the second data set and the fourth data set on the second key to generate and store a fifth data set that can be used by the database system to generate a result for the received request. 

2. The computer system in accordance with claim 1, wherein the received request further includes a request to perform an aggregation with respect to data included within the first data set and the second data set. 

3. The computer system in accordance with claim 2, wherein the fifth data set is shuffled on the first key. 

4. The computer system in accordance with claim 3, wherein the requested aggregation is performed after the fifth data set is shuffled on the first key. 

5. The computer system in accordance with claim 4, wherein the requested aggregation comprises performing an average of particular data included within the second data set. 

6. The computer system in accordance with claim 1, wherein the first data set comprises a dimension data set. 

7. The computer system in accordance with claim 1, wherein the second data set comprises a fact data set. 

8. The computer system in accordance with claim 1, wherein the third data set is generated at a runtime of the received request. 

9. The computer system in accordance with claim 1, wherein the first data set includes at least two terabytes of data. 



11. A method, implemented at a computer system that includes one or more processors, for distributedly joining two data sets, comprising: receiving a request at a database system to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; generating and storing a third data set that is both shuffled on the first key and includes data associated with the second key; distributedly joining the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; shuffling the fourth data set on the second key; and distributedly joining the second data set and the fourth data set on the second key to generate and store a fifth data set that can be used by the database system to generate a result for the received request. 

12. The method in accordance with claim 11, wherein the received request further includes a request to perform an aggregation with respect to data included within the first data set and the second data set. 

13. The method in accordance with claim 12, wherein the fifth data set is shuffled on the first key. 

14. The method in accordance with claim 13, wherein the requested aggregation is performed after the fifth data set is shuffled on the first key. 

15. The method in accordance with claim 14, wherein the requested aggregation comprises performing an average of particular data included within the second data set. 



17. The method in accordance with claim 11, wherein the second data set comprises a fact data set. 

18. The method in accordance with claim 11, wherein the third data set is generated at a runtime of the received request. 

19. The method in accordance with claim 11, wherein the second data set includes at least 80 terabytes of data. 

20. A computer program product comprising one or more hardware storage devices having stored thereon computer-executable instructions that are executable by one or more processors of a computer system to distributedly join two data sets, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: receive a request at a database system to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; generate and store a third data set that is both shuffled on the first key and includes data associated with the second key; distributedly join the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; shuffle the fourth data set on the second key; and distributedly join the second data set and the fourth data set on the second key to generate and store a fifth data set that can be used by the database system to generate a result for the received request. 


Patent # 10/552,415
Claim 1. A computer system comprising:

one or more processors; and
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to distributedly join two data sets, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: 

receive a request to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; 


generate a third data set that is both shuffled on the first key and includes data associated with the second key; 
distributedly join the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; 

shuffle the fourth data set on the second key; and 
distributedly join the second data set and the fourth data set on the second key to generate a fifth data set that can be used to generate a result for the received request. 

 
2. The computer system in accordance with claim 1, wherein the received request further includes a request to perform an aggregation with respect to data included within the first data set and the second data set.

3. The computer system in accordance with claim 2, wherein the fifth data set is shuffled on the first key.

4. The computer system in accordance with claim 3, wherein the requested aggregation is performed after the fifth data set is shuffled on the first key.

5. The computer system in accordance with claim 4, wherein the requested aggregation comprises performing an average of particular data included within the second data set.

6. The computer system in accordance with claim 1, wherein the first data set comprises a dimension data set.

7. The computer system in accordance with claim 1, wherein the second data set comprises a fact data set.

8. The computer system in accordance with claim 1, wherein the third data set is generated at a runtime of the received request.

9. The computer system in accordance with claim 1, wherein the first data set includes at least two terabytes of data.



  11. A method, implemented at a computer system that includes one or more processors, for distributedly joining two data sets, comprising: receiving a request to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; generating a third data set that is both shuffled on the first key and includes data associated with the second key; distributedly joining the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; shuffling the fourth data set on the second key; and distributedly joining the second data set and the fourth data set on the second key to generate a fifth data set that can be used to generate a result for the received request. 

    


12. The method in accordance with claim 11, wherein the received request further includes a request to perform an aggregation with respect to data included within the first data set and the second data set.

13. The method in accordance with claim 12, wherein the fifth data set is shuffled on the first key.

14. The method in accordance with claim 13, wherein the requested aggregation is performed after the fifth data set is shuffled on the first key.

15. The method in accordance with claim 14, wherein the requested aggregation comprises performing an average of particular data included within the second data set.



17. The method in accordance with claim 11, wherein the second data set comprises a fact data set.

18. The method in accordance with claim 11, wherein the third data set is generated at a runtime of the received request.

19. The method in accordance with claim 11, wherein the second data set includes at least 80 terabytes of data.

   
 20. A computer program product comprising one or more hardware storage devices having stored thereon computer-executable instructions that are executable by one or more processors of a computer system to distributedly join two data sets, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following: receive a request to distributedly join a first data set and a second data set on a first key, wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key, the second data set comprising a data set that is larger than the first data set; generate a third data set that is both shuffled on the first key and includes data associated with the second key; distributedly join the first data set and the third data set on the first shuffle key to generate a fourth data set that is shuffled on the first key and includes data associated with both the first key and the second key; shuffle the fourth data set on the second key; and distributedly join the second data set and the fourth data set on the second key to generate a fifth data set that can be used to generate a result for the received request. 
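
For illustration only, the sequence of operations recited in independent claim 1 can be sketched in Python as follows. This is a minimal single-process sketch under stated assumptions and is not taken from the application, the reference patent, or any cited reference. The helper names (shuffle, join_partitions, two_step_join) and the sample field names are hypothetical. The sketch assumes that the third data set is built from the distinct (first key, second key) pairs of the second data set, and that the final join on the second key also checks equality of the first key; the claims do not specify either detail.

```python
from collections import defaultdict

def shuffle(rows, key, n):
    """Hash-partition rows (dicts) on `key` into n partitions."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_partitions(left_parts, right_parts, key, also_equal=None):
    """Join two co-partitioned data sets partition by partition on `key`.

    `also_equal` optionally names a second field that must match as well
    (an assumption of this sketch, not a claim limitation).
    """
    out = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)
        for l in left:
            index[l[key]].append(l)
        out.append([
            {**l, **r}
            for r in right
            for l in index.get(r[key], [])
            if also_equal is None or l[also_equal] == r[also_equal]
        ])
    return out

def flatten(parts):
    return [row for part in parts for row in part]

def two_step_join(first, second, k1, k2, n=4):
    first_parts = shuffle(first, k1, n)    # first data set, shuffled on first key
    second_parts = shuffle(second, k2, n)  # larger second data set, shuffled on second key
    # Third data set (assumed here): distinct (k1, k2) pairs, shuffled on k1.
    pairs = {(r[k1], r[k2]) for r in second}
    third_parts = shuffle([{k1: a, k2: b} for a, b in pairs], k1, n)
    # Fourth data set: first joined with third on k1, then shuffled on k2.
    fourth = flatten(join_partitions(first_parts, third_parts, k1))
    fourth_parts = shuffle(fourth, k2, n)
    # Fifth data set: second joined with fourth on k2.
    return flatten(join_partitions(fourth_parts, second_parts, k2, also_equal=k1))
```

In this sketch the larger second data set is never repartitioned on the first key; only the smaller third and fourth data sets are moved between partitions.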


Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 11-15, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu (US 20130124501 A1) in view of Kavulya et al. (US 20170185648 A1).
Regarding claim 1, and similarly claims 11 and 20, Xu discloses a computer system comprising: one or more processors; and one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to distributedly join two data sets (¶[0037], [0048]-[0049], Xu, i.e., joining two data sets), the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following:
receive a request to distributedly join a first data set and a second data set on a first key (¶[0037], [0048]-[0049], Xu, i.e., joining two data sets), wherein the first data set is shuffled on the first key and the second data set is shuffled on a second key (¶[0037], [0048]-[0049], Xu, i.e., each AMP generates a first data set by joining the contents of the AMP's spools Spool^R_redis and Spool^S_redis. Each AMP generates a second data set by joining the contents of the AMP's spools Spool^R_local and Spool^S_dup.), the second data set comprising a data set that is larger than the first data set (¶[0037], [0048]-[0050], Xu, i.e., each AMP generates a second data set by joining the contents of the AMP's spools Spool^R_local and Spool^S_dup. Each AMP then hash redistributes the second data set on the attribute Table1.a and places rows received from other AMPs resulting from redistribution of an AMP's second data set in a spool (designated Spool_set2), wherein skewed values of a join column of a larger table are involved in the join operation);
generate a third data set that is both on the first key and includes data associated with the second key (¶[0037], [0048]-[0050], Xu); distributedly join the first data set and the third data set on the first shuffle key to generate a fourth data set that is on the first key and includes data associated with both the first key and the second key (¶[0037], [0048]-[0050], Xu); and distributedly join the second data set and the fourth data set on the second key to generate a fifth data set that can be used to generate a result for the received request (¶[0037], [0048]-[0050], Xu). 
Xu, however, does not explicitly disclose that the first data set is shuffled on the first key and the second data set is shuffled on a second key, or that the second data set comprises a data set that is larger than the first data set.
Kavulya discloses joining in which the first data set is shuffled on the first key and the second data set is shuffled on a second key (¶[0024]-[0026], Kavulya), and the second data set comprises a data set that is larger than the first data set (¶[0018], [0024]-[0026], Kavulya, i.e., optimizing skewed joins in big data such that if the probability that one or more keys will not fit in memory is greater than a threshold percentage, e.g., 75%, a skewed join may be used).
It would have been obvious to a person having ordinary skill in the art at the time the invention was made to modify Xu with the teachings of Kavulya to join predicates on shuffled keys. The motivation is efficiency in collecting and processing large data sets, especially when the amount or size of data is larger than the memory capacity or storage capacity of a machine, e.g., a server or computer (¶[0002], Kavulya). Moreover, both references disclose features that are directed to analogous art and the same field of endeavor, such as performing join operations on data sets.
Regarding claims 2 and 12, Xu/Kavulya combination discloses wherein the received request further includes a request to perform an aggregation with respect to data included within the first data set and the second data set (¶[0024]-[0026], Kavulya). 
Regarding claims 3 and 13, Xu/Kavulya combination discloses wherein the fifth data set is shuffled on the first key (¶[0024]-[0026], Kavulya). 
Regarding claims 4 and 14, Xu/Kavulya combination discloses wherein the requested aggregation is performed after the fifth data set is shuffled on the first key (¶[0024]-[0026], Kavulya). 
Regarding claims 5 and 15, Xu/Kavulya combination discloses wherein the requested aggregation comprises performing an average of particular data included within the second data set (¶[0018], [0025]-[0026], Kavulya). 
Regarding claims 7 and 17, Xu/Kavulya combination discloses wherein the second data set comprises a fact data set (¶[0037], [0048]-[0050], Xu). 
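
For illustration only, the aggregation limitations addressed above for claims 2-5 and 12-15 (an average computed after the joined data is shuffled on the first key) can be sketched in Python as follows. This is a minimal single-process sketch and is not taken from the application, Xu, or Kavulya. The function names and the sample field names are hypothetical. It assumes that shuffling on the grouping key places all rows for a given key in one partition, so that each partition can compute its averages independently.

```python
from collections import defaultdict

def shuffle(rows, key, n):
    """Hash-partition rows (dicts) on `key` into n partitions."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def average_by_key(rows, group_key, value_field, n=4):
    """Shuffle rows on `group_key`, then average `value_field` per key."""
    result = {}
    for part in shuffle(rows, group_key, n):
        sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, row count]
        for row in part:
            acc = sums[row[group_key]]
            acc[0] += row[value_field]
            acc[1] += 1
        # Every row for a given key is in this partition, so the
        # per-partition average is already the final average.
        for k, (total, count) in sums.items():
            result[k] = total / count
    return result
```

Because no key spans two partitions after the shuffle, no cross-partition merge of partial sums is needed in this sketch.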
Claims 6, 8, 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Xu (US 20130124501 A1) in view of Kavulya et al. (US 20170185648 A1) and further in view of Lam.
Regarding claims 6 and 16, Xu/Kavulya combination discloses all of the claimed limitations as discussed above, except wherein the first data set comprises a dimension data set. 
Lam discloses wherein the first data set comprises a dimension data set (¶[0029], [0031], Lam, i.e., the data set from which the pivot tables can be generated may be a data set formed by joining two or more data sets. This is referred to as flattening the data set. By flattening the data set, a larger number of dimensions can be used to generate pivot tables from existing pivot tables).
It would have been obvious to a person having ordinary skill in the art at the time the invention was made to modify Xu with the teachings of Kavulya to join predicates on shuffled keys. The motivation is efficiency in collecting and processing large data sets, especially when the amount or size of data is larger than the memory capacity or storage capacity of a machine, e.g., a server or computer (¶[0002], Kavulya). Moreover, the references disclose features that are directed to analogous art and the same field of endeavor, such as performing join operations on data sets. This close relationship between Xu/Kavulya and Lam highly suggests an expectation of success.
Regarding claims 8 and 18, Xu/Kavulya combination discloses all of the claimed limitations as discussed above, except wherein the third data set is generated at a runtime of the received request. 
Lam discloses wherein the third data set is generated at a runtime of the received request (¶[0038] and [0074], Lam). It would have been obvious to a person having ordinary skill in the art at the time the invention was made to modify Xu with the teachings of Kavulya to join predicates on shuffled keys. The motivation is efficiency in collecting and processing large data sets, especially when the amount or size of data is larger than the memory capacity or storage capacity of a machine, e.g., a server or computer (¶[0002], Kavulya).
Claims 9-10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Xu (US 20130124501 A1) in view of Kavulya et al. (US 20170185648 A1) and further in view of Vemuri et al. (US 20150261804 A1).
Regarding claim 9, Xu/Kavulya combination discloses all of the claimed limitations as discussed above, except wherein the first data set includes at least two terabytes of data.
Vemuri discloses wherein the first data set includes at least two terabytes of data (¶[0041] and [0048], Vemuri).
It would have been obvious to a person having ordinary skill in the art at the time the invention was made to modify Xu with the teachings of Kavulya to join predicates on shuffled keys. The motivation is efficiency in collecting and processing large data sets, especially when the amount or size of data is larger than the memory capacity or storage capacity of a machine, e.g., a server or computer (¶[0002], Kavulya). Moreover, the references disclose features that are directed to analogous art and the same field of endeavor, such as performing join operations on data sets. This close relationship between Xu/Kavulya and Vemuri highly suggests an expectation of success.
Regarding claim 10, Xu/Kavulya combination discloses wherein the second data set includes at least 50 terabytes of data (¶[0041] and [0048], Vemuri). 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Klots et al. (US 20160253402 A1)/(US 10223437 B2) disclose ADAPTIVE DATA REPARTITIONING AND ADAPTIVE DATA REPLICATION.
Derby et al. (US 20160078031 A1)/(US 10042876 B2) disclose SORT-MERGE-JOIN ON A LARGE ARCHITECTED REGISTER FILE.
Kavulya et al. (US 20170185648 A1)/(US 10585889 B2) disclose Optimizing skewed joins in big data.
Ueda et al. (US 20140101213 A1)/(US 10095699 B2) disclose A slave computer reads a plurality of input files that have different formats and generates, for each of the input files, an intermediate file that has added thereto, as a join key, data in a type of column that is common to the input files.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HANH B THAI whose telephone number is (571) 272-4029. The examiner can normally be reached Monday-Friday, 7-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached at 571-272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/HANH B THAI/
Primary Examiner, Art Unit 2163

August 28, 2021