DETAILED ACTION
Claims 1-16 are pending in this office action.
Response to Arguments
Applicant’s arguments filed 8/29/2022 have been fully considered and are persuasive. The final rejection mailed on 4/5/2022 has been withdrawn.
Applicant’s arguments with respect to claim(s) have been considered but are moot in view of the new ground of rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 8-10, 13, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Kateb et al (hereinafter “Al”) (US 20110313977) in view of Menezes et al (hereinafter “Me”) (US 10303797) and Shi et al (hereinafter “Shi”) (US 20090259618).
As to claim 1, Al teaches a method for generating a random sample of data elements from multiple data sources (abstract, paragraph 87), the method comprising: 
“receiving, using a computer processor, from each of said multiple data sources, a sample of data elements” as receiving, using a processor (paragraph 17), from each stream of multiple streams as sources, tuples (paragraphs 15, 65).  A tuple of tuples is not a sample of data elements;
“for each one of the multiple data sources, establishing in a memory a respective intermediate sampling reservoir and populating, using said computer processor, each respective intermediate sampling reservoir with the sample of data elements received from said one of the multiple data sources” as for each stream of multiple streams as each one of the multiple data sources, establishing in a memory respective sampling reservoirs, e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and filling or adding, using a processor (paragraphs 19, 44), each sampling reservoir of sampling reservoirs, e.g., (R.sub.1, R.sub.2, . . . , Rv), with a tuple received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1).
The tuple is not the sample of data elements.  Each sampling reservoir is represented as each respective intermediate sampling reservoir.
In particular: Algorithm 3: input streams; at step 16: for each Ri belonging to (Lreduced = set of all Ri + Lenlarged = set of all Ri) do:
Select tuples from the incoming mi(t) to fill a reservoir Ri using algorithm 1 (paragraph 58), wherein mi(t) = number of tuples to be seen from Si, starting from time point (t) (paragraph 65).
Algorithm 1: for each tuple arriving from an input stream, add the tuple to the reservoir until the reservoir becomes full (paragraph 6).
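For illustration only, the reservoir-filling step summarized above can be sketched as follows. The cited passage describes only the fill phase (adding tuples until the reservoir is full); the replacement step shown after the reservoir fills is the conventional reservoir sampling technique (commonly called Algorithm R) and is an assumption, not a quotation of Al. The names fill_reservoir, capacity, and rng are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def fill_reservoir(stream: Iterable[T], capacity: int,
                   rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to `capacity` items from `stream`."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            # Fill phase: add each arriving tuple until the reservoir is full.
            reservoir.append(item)
        else:
            # Replacement phase (assumed, standard Algorithm R): the new item
            # replaces a random slot with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                reservoir[slot] = item
    return reservoir
```

Under this sketch, one reservoir would be filled per input stream Si, giving the reservoirs R1 through Rv referred to in the mapping.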
“establishing a final sampling reservoir” as establishing in a memory respective reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63).  Sampling reservoir Rv is represented as final sampling reservoir.
Al does not explicitly teach the claimed limitations:
a sample of data elements; the sample of data elements.
randomly selecting data elements by said computer processor from each one of said respective intermediate sampling reservoirs and populating said final sampling reservoir with said randomly selected data elements.
Me teaches the claimed limitations:
a sample of data elements; the sample of data elements (as receiving, by a processor of storage system 104, from each of multiple data streams, e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements);
 “receiving, using a computer processor, from each of said multiple data sources, a sample of data elements” as receiving, by a processor of storage system 104, from each of multiple clients 101, 102 as data sources, a file of files as a sample of data elements (col. 6, lines 60-65; col. 7, lines 1-10, fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67) or
receiving, by a processor of storage system 104, from each of multiple data streams, e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements;
“for each one of the multiple data sources, establishing in a memory a respective intermediate sampling reservoir and populating, using said computer processor, each respective intermediate sampling reservoir with the sample of data elements received from said one of the multiple data sources” as for each client of clients as data sources, establishing in a memory and storing, using a processor, a file of files received from each client of clients (col. 6, lines 60-65; col. 7, lines 1-10, fig. 1) into a respective bitmap of bitmaps, e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6). A respective bitmap is represented as each respective intermediate sampling reservoir; or
for each file of files as multiple data sources, establishing in a memory a respective particular bit location (col. 3, lines 1-15, fig. 7) and  storing using a processor with segment of segments received from each file of files  into a respective  particular bit location of  bitmap of bitmaps inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6).  
The files are presented as multiple data sources.  A respective particular bit location of a bitmap of bitmaps is represented as intermediate sampling reservoir.
In particular, the process 400 continues at 404 to generate the bitmaps B.sub.F representing the unique segments belonging to each of the files F1, . . . , FN (FIG. 5). Note that bitmap B.sub.F for a file F may have already been computed and stored on persistent storage. If so, the process 404 fetches B.sub.F from persistent storage instead of having to generate it again. Once the bitmaps B.sub.F are fetched or generated, the process 400 continues at 406 to find, for each file F1, . . . , FN, the smallest K offsets [O.sub.1, . . . , O.sub.K] in B.sub.F for which a segment is present, i.e., a segment that belongs to file F: B.sub.F[O.sub.i]=1, for all i, 1 ≤ i ≤ K. Once the offsets have been found, the process 400 assembles the found offsets into an individual list of K found offset numbers [O.sub.1, . . . , O.sub.K] for each file F1, . . . , FN: F[O.sub.1, . . . , O.sub.K], for each of N files F, F1, . . . , FN (col. 9, lines 10-20).
The process 500 consistently samples segments without being affected by the frequency with which the segment belongs to files in the storage system. For example, if segment s1 belongs to files F1, . . . , FN with a higher frequency than segment s2, content-based sampling avoids the possibility that segment s1 is sampled more often than segment s2 (col. 10, lines 55-67).
After a segment of file F is sampled, at 506 the process 500 inserts the sampled segment into bitmap B.sub.F using a corresponding bloom filter insertion hash function. The process 500 continues at 508, repeating traversing and sampling until traversal is complete. The process 500 can be performed either at the time the file F is written to the storage system or on demand, i.e., when the extent of similarity between the file F and N other files F1, . . . , FN needs to be estimated (process 400, FIG. 4) (col. 11, lines 1-10);
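As a rough illustration of the content-based sampling and bitmap insertion described in the passages quoted above, a minimal sketch follows. It is not Me's implementation: the use of SHA-1 as the fingerprint, the two-bit mask (which samples roughly one segment in four), the single hash position per segment, and the names fingerprint, is_sampled, and build_bitmap are all assumptions made for the example.

```python
import hashlib
from typing import Iterable, List


def fingerprint(segment: bytes) -> int:
    """Content-based fingerprint of a segment (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(segment).digest(), "big")


def is_sampled(fp: int, mask: int, pattern: int) -> bool:
    """Sample a segment when the masked fingerprint bits equal the pattern."""
    return (fp & mask) == pattern


def build_bitmap(segments: Iterable[bytes], bitmap_bits: int = 1024,
                 mask: int = 0b11, pattern: int = 0) -> List[int]:
    """Insert every sampled segment of a file into a fixed-size bitmap."""
    bitmap = [0] * bitmap_bits
    for seg in segments:
        fp = fingerprint(seg)
        if is_sampled(fp, mask, pattern):
            # Bloom-filter-style insertion with one hash position (assumed);
            # the bits already used for sampling are shifted out first.
            bitmap[(fp >> mask.bit_length()) % bitmap_bits] = 1
    return bitmap
```

Because the sampling decision depends only on segment content, a segment that occurs many times is either always sampled or never sampled, which matches the quoted statement that sampling is not affected by how frequently a segment appears.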
“establishing a final sampling reservoir” as generating cluster definition Fk as a final sampling reservoir in a memory (figs. 3-5, 9, col. 7, lines 60-67, col. 8, lines 1-10; col. 9, lines 10-67; col. 10, lines 60-67).
Al and Me disclose a method for storing data into a storage. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Me’s teaching to Al’s system in order to store files in clusters of a distributed system efficiently, to reduce representation space in the reservoir, to reduce the number of segments selected for insertion into reservoirs, and further to greatly reduce the required network traffic or bandwidth and the processing resources.
Shi teaches the claimed limitation:
“randomly selecting data elements by said computer processor from each one of said respective intermediate sampling reservoirs and populating said final sampling reservoir with said randomly selected data elements” as randomly selecting rows or percentages of rows, by processing device that includes a processor (fig. 1, paragraph 22)  from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43). 
In particular, when a loop exists, tables that are included in the loop may have more than the target iteration slicing percentage of rows copied to corresponding tables in the shadow database. In some cases, all data from tables included in the loop may be copied to the shadow database in a single round. Making the target iteration slicing percentage smaller may avoid having all the data from the tables included in the loop copied to the shadow database in a single round (paragraph 39).
The processing device may then find a first driving table of the relational database by referring to the representation of the created connected graph (act 408). Approximately, the target iteration slicing percentage of rows may be randomly selected from the driving table and stored in a corresponding table in the shadow database (act 410). The process previously described, with respect to FIG. 7, or another process, may be executed by the processing device to randomly select rows from the driving table and store the randomly selected rows in the corresponding table in the shadow database (paragraph 40).
If, during act 414 (FIG. 4), the processing device determines that a next related table is found, then the processing device may select rows of tables of the relational database, related to the selected and copied rows of the driving table, either directly or indirectly, and may copy the selected rows of the tables to corresponding tables of the shadow database (act 602; FIG. 6) (paragraph 43);
“establishing a final sampling reservoir” as creating a shadow database (abstract).
Al and Shi disclose a method for randomly selecting data and storing the selected data into a storage. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching to Al’s system in order to copy rows of the relational database to a shadow database while preserving referential integrity among tables of the created shadow database, and further to avoid copying all data from tables connected in a loop to the shadow database in a single iteration.
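For illustration only, the random selection of a percentage of rows from each table into a shadow database, as attributed to Shi above (paragraph 40), can be sketched as follows. The sketch deliberately omits Shi's handling of related tables and referential integrity; the names sample_rows and build_shadow, the dictionary-of-lists table representation, and the rounding rule are assumptions for the example.

```python
import random
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def sample_rows(table: Sequence[Row], percentage: float,
                rng: Optional[random.Random] = None) -> List[Row]:
    """Randomly select approximately `percentage` percent of the rows."""
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    rng = rng or random.Random()
    count = round(len(table) * percentage / 100.0)
    # random.sample selects `count` distinct rows without replacement.
    return rng.sample(list(table), count)


def build_shadow(tables: Dict[str, Sequence[Row]], percentage: float,
                 rng: Optional[random.Random] = None) -> Dict[str, List[Row]]:
    """Copy a random slice of every source table into a shadow store."""
    rng = rng or random.Random()
    return {name: sample_rows(rows, percentage, rng)
            for name, rows in tables.items()}
```

In the mapping applied in this action, each source table plays the role of an intermediate sampling reservoir and the returned shadow store plays the role of the final sampling reservoir.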

As to claims 2, 9, Al and Me teach the claimed limitation “wherein each of said respective intermediate and final reservoirs has an equivalent size” as respective particular bit locations in each bitmap of bitmaps having a uniform size (Me: col. 3, lines 15-30; col. 8, lines 35-67; Al: paragraph 65).

As to claims 3, 10, Al and Me teach the claimed limitation “wherein said multiple data sources comprise data storage devices within a distributed data processing system” as data sources comprise data storage devices within a distributed data processing system (Me: figs. 1-3).

As to claims 4, 11, Me and Shi teach the claimed limitation “wherein said distributed data processing system comprises a relational data processing system” as a distributed system that includes a relational data processing system (Me: figs. 8-9, col. 114, lines 40-67; Shi: abstract; paragraphs 5, 7).

Claim 6 recites the same subject matter as discussed for claim 1; thus claim 6 is rejected for the same reasons as discussed for claim 1. In addition,
Al teaches a method for generating a random sample of data elements from multiple data streams, the method comprising:
“receiving, using a computer processor, from each of said multiple data streams, a sample of data elements” as receiving, using a processor (paragraph 17), from each stream of multiple streams, tuples (paragraphs 15, 65). A tuple of tuples is not a sample of data elements;
“for each one of the multiple data streams, establishing in a memory a respective intermediate sampling reservoir of an equivalent size and populating using said computer processor the respective intermediate sampling reservoir with the sample of data elements received from said one of the multiple data streams” as for each stream of multiple streams as each one of the multiple data sources, establishing in a memory respective sampling reservoirs of a size e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and filling or adding using a processor (paragraphs, 19, 44), each sampling reservoir of sampling reservoirs  e.g., (R.sub.1, R.sub.2, . . . , Rv)  with a tuple of tuples received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1).
The size of any one or more of the plurality of sampling reservoirs (paragraph 15) indicates each sampling reservoir of the sampling reservoirs has a same size as equivalent size.
The tuple is not the sample of data elements.
In particular: Algorithm 3: input streams; at step 16: for each Ri belonging to (Lreduced = set of all Ri + Lenlarged = set of all Ri) do:
Select tuples from the incoming mi(t) to fill a reservoir Ri using algorithm 1 (paragraph 58), wherein mi(t) = number of tuples to be seen from Si, starting from time point (t) (paragraph 65).
Algorithm 1: for each tuple arriving from an input stream, add the tuple to the reservoir until the reservoir becomes full (paragraph 6).
“establishing in memory a final sampling reservoir of said equivalent size” as establishing in a memory respective sampling reservoirs of a size e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63).  The size of any one or more of the plurality of sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) (paragraph 15) indicates each sampling reservoir of the sampling reservoirs has a same size as equivalent size.
A sampling reservoir, e.g., Rv, of the same size is represented as a final sampling reservoir of equivalent size.
Al does not explicitly teach the claimed limitations:
a sample of data elements; the sample of data elements
randomly selecting by said computer processor data elements from each one of said respective intermediate sampling reservoirs and populating said final sampling reservoir with said randomly selected data elements.
Me teaches the claimed limitations:
a sample of data elements; the sample of data elements (as receiving, by a processor of storage system 104, from each of multiple data streams, e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements);
“for each one of the multiple data streams, establishing in a memory a respective intermediate sampling reservoir of an equivalent size and populating using said computer processor the respective intermediate sampling reservoir with the sample of data elements received from said one of the multiple data streams” as for each  group of files as multiple data streams, establishing in a memory a respective a  bitmap of a uniform size as  equivalent size (col. 8, lines 35-67) and  storing using a processor with segment of segments received from each bitmap of bitmaps e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6) or
for each file of files as multiple data streams, establishing in a memory a respective particular bit location (col. 3, lines 1-15, fig. 7) in a  bitmap of a uniform size as  equivalent size (col. 8, lines 35-67) and  storing using a processor with segment of segments received from each file of files  in bitmap of bitmaps e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6);
 “establishing in memory a final sampling reservoir of said equivalent size” as generating cluster definition Fk as a final sampling reservoir of a uniform size (figs. 3-5, 9, col. 7, lines 60-67, col. 8, lines 1-10) or generating cluster definition Fk as a final sampling reservoir of a uniform size (col. 3, lines 1-15, fig. 7);
“receiving, using a computer processor, from each of said multiple data streams, a sample of data elements” as receiving, by a processor of storage system 104, from each group of files as each of multiple data streams, a file of files (figs.1, 6, col. 3, lines 60-67; col. 7, lines 10; col. 11, lines 15-35); or 
receiving, by a processor of storage system 104, from each of multiple data streams, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements;
“generating a random sample of data elements from multiple data streams” as (paragraphs 2-3).
Al and Me disclose a method for randomly selecting a tuple in the reservoir. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Me’s teaching to Al’s system in order to store files in clusters of a distributed system efficiently, to reduce representation space in the reservoir, to reduce the number of segments selected for insertion into reservoirs, and further to greatly reduce the required network traffic or bandwidth and the processing resources.
Shi teaches the claimed limitation:
“randomly selecting by said computer processor data elements from each one of said respective intermediate sampling reservoirs and populating said final sampling reservoir with said randomly selected data elements” as randomly selecting rows or percentages of rows, by processing device that includes a processor (fig. 1, paragraph 22)  from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43). 
In particular, when a loop exists, tables that are included in the loop may have more than the target iteration slicing percentage of rows copied to corresponding tables in the shadow database. In some cases, all data from tables included in the loop may be copied to the shadow database in a single round. Making the target iteration slicing percentage smaller may avoid having all the data from the tables included in the loop copied to the shadow database in a single round (paragraph 39).
The processing device may then find a first driving table of the relational database by referring to the representation of the created connected graph (act 408). Approximately, the target iteration slicing percentage of rows may be randomly selected from the driving table and stored in a corresponding table in the shadow database (act 410). The process previously described, with respect to FIG. 7, or another process, may be executed by the processing device to randomly select rows from the driving table and store the randomly selected rows in the corresponding table in the shadow database (paragraph 40).
If, during act 414 (FIG. 4), the processing device determines that a next related table is found, then the processing device may select rows of tables of the relational database, related to the selected and copied rows of the driving table, either directly or indirectly, and may copy the selected rows of the tables to corresponding tables of the shadow database (act 602; FIG. 6) (paragraph 43).
Al and Shi disclose a method for randomly selecting data and storing the selected data into a storage. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching to Al’s system in order to copy rows of the relational database to a shadow database while preserving referential integrity among tables of the created shadow database, and further to avoid copying all data from tables connected in a loop to the shadow database in a single iteration.

As to claim 8, Al teaches a system for generating a random sample of data elements from multiple data sources (abstract, paragraph 87), the system comprising:
“a computer processor for receiving from each of said multiple data sources, a sample of data elements” as receiving, using a processor (paragraph 17), from each stream of a plurality of streams, tuples (paragraphs 15, 65).  A tuple of tuples is not a sample of data elements;
“a respective intermediate sampling reservoir established within a computer memory for each one of the multiple data sources, each one of said respective intermediate sampling reservoirs being populated by said computer processor with the sample of data elements received from said one of the multiple data sources” as for each stream of multiple streams as multiple data sources, establishing in a memory respective sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and fill or add using a processor (paragraphs, 19, 44), each sampling reservoir of sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv)  with a tuple of tuples received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1).
The tuple of the tuples is not the sample of data elements.
In particular: Algorithm 3: input streams; at step 16: for each Ri belonging to (Lreduced = set of all Ri + Lenlarged = set of all Ri) do:
Select tuples from the incoming mi(t) to fill a reservoir Ri using algorithm 1 (paragraph 58), wherein mi(t) = number of tuples to be seen from Si, starting from time point (t) (paragraph 65).
Algorithm 1: for each tuple arriving from an input stream, add the tuple to the reservoir until the reservoir becomes full (paragraph 6);
“a final sampling reservoir established within said computer memory” as establishing in a memory respective sampling reservoirs of a size, e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63). A sampling reservoir, e.g., Rv, is represented as a final sampling reservoir.
Al does not explicitly teach the claimed limitations:
a sample of data elements; the sample of data elements;
 said final sampling reservoir being populated by said computer processor with a random selection of data elements from each one of said intermediate sampling reservoirs.
Me teaches the claimed limitations:
a sample of data elements; the sample of data elements (as receiving, by a processor of storage system 104, from each of multiple data streams, e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements);
“a computer processor for receiving from each of said multiple data sources, a sample of data elements” as receiving, by a processor of storage system 104, from each of clients, multiple data streams, e.g., files, segments as data elements (col. 3, lines 1-15, fig. 7) and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (figs. 1, 4-5; col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements;
“a respective intermediate sampling reservoir established within a computer memory for each one of the multiple data sources, each one of said respective intermediate sampling reservoirs being populated by said computer processor with the sample of data elements received from said one of the multiple data sources” as for each client of clients as multiple data sources, establishing in a memory (col. 3, lines 1-15, fig. 7) in a bitmap and storing, using a processor, each file of files from each client of clients in a bitmap of bitmaps, e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6) or
for each file of files as multiple data sources, establishing in a memory a respective particular bit location (col. 3, lines 1-15, fig. 7) in a bitmap and  storing using a processor with segment of segments received from each file of files  in bitmap of bitmaps e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6);
“a final sampling reservoir established within said computer memory” as generating cluster definition Fk as a final sampling reservoir (figs. 3-5, 9, col. 7, lines 60-67, col. 8, lines 1-10) or generating cluster definition Fk as a final sampling reservoir (col. 3, lines 1-15, fig. 7);
“generating a random sample of data elements from multiple data sources” as (paragraphs 2-3).
Al and Me disclose a method for randomly selecting a tuple in the reservoir. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Me’s teaching to Al’s system in order to store files in clusters of a distributed system efficiently, to reduce representation space in the reservoir, to reduce the number of segments selected for insertion into reservoirs, and further to greatly reduce the required network traffic or bandwidth and the processing resources.
Shi teaches the claimed limitation
 “said final sampling reservoir being populated by said computer processor with a random selection of data elements from each one of said intermediate sampling reservoirs” as randomly selecting rows or percentages of rows, by processing device that includes a processor (fig. 1, paragraph 22)  from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43). 
In particular, when a loop exists, tables that are included in the loop may have more than the target iteration slicing percentage of rows copied to corresponding tables in the shadow database. In some cases, all data from tables included in the loop may be copied to the shadow database in a single round. Making the target iteration slicing percentage smaller may avoid having all the data from the tables included in the loop copied to the shadow database in a single round (paragraph 39).
The processing device may then find a first driving table of the relational database by referring to the representation of the created connected graph (act 408). Approximately, the target iteration slicing percentage of rows may be randomly selected from the driving table and stored in a corresponding table in the shadow database (act 410). The process previously described, with respect to FIG. 7, or another process, may be executed by the processing device to randomly select rows from the driving table and store the randomly selected rows in the corresponding table in the shadow database (paragraph 40).
If, during act 414 (FIG. 4), the processing device determines that a next related table is found, then the processing device may select rows of tables of the relational database, related to the selected and copied rows of the driving table, either directly or indirectly, and may copy the selected rows of the tables to corresponding tables of the shadow database (act 602; FIG. 6) (paragraph 43);
“a final sampling reservoir established within said computer memory” as (abstract).
Al and Shi disclose a method for randomly selecting data and storing the selected data into a storage. These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching to Al’s system in order to copy rows of the relational database to a shadow database while preserving referential integrity among tables of the created shadow database, and further to avoid copying all data from tables connected in a loop to the shadow database in a single iteration.

As to claim 13, Al teaches a system for generating a random sample of data elements from multiple data streams (abstract, paragraph 87), comprising:
“a computer processor for receiving a sample of data elements from each one of said multiple data streams” as receiving, using a processor (paragraph 17), from each stream of multiple streams, tuples (paragraphs 15, 65). A tuple of the tuples is not a sample of data elements;
 “a respective intermediate sampling reservoir established within a computer memory for each one of the multiple data streams, each one of said respective intermediate sampling reservoirs having an equivalent size and being populated by said computer processor with the sample of data elements received from said one of the multiple data streams” as for each stream of a multiple streams, establishing in a memory respective sampling reservoirs of a size e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and fill or add using a processor (paragraphs, 19, 44), each reservoir of sampling reservoirs  e.g., (R.sub.1, R.sub.2, . . . , Rv)  with a tuple of the tuples received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1).
The size of any one or more of the plurality of sampling reservoirs (paragraph 15) indicates each sampling reservoir of sampling reservoirs has a same size as equivalent size.
The tuple is not the sample of data elements.
In particularly:  Algorithm 3: Inputs streams; At step 16: for each Ri belong (Lreduced = set of all Ri +  Lenlarged =set of all Ri) do:
Select tuples from the incoming mi(t) to fill a reservoir Ri using algorithm 1  (paragraph 58), wherein mi(t) = number of tuples to be seen from Si, starting from time in point (t) (paragraph 65).
Algorithm 1:  for each tuple arriving from places all tuples from an input stream, add the tuple to the reservoir until the reservoir becomes full (paragraph 6);
“a final sampling reservoir established within said computer memory” as establishing in a memory respective sampling reservoirs of a size e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and filling or adding, using a processor (paragraphs 19, 44), each reservoir of sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) with a tuple received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1). Sampling reservoir Rv, of the same size, is represented as a final sampling reservoir of equivalent size;
“said final sampling reservoir having said equivalent size as said respective intermediate sampling reservoirs” as establishing in a memory respective sampling reservoirs of a size e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63).
Sampling reservoir Rv is represented as a final sampling reservoir. The size of any one or more of the plurality of sampling reservoirs (paragraph 15) indicates each sampling reservoir Rv of reservoirs (R.sub.1, R.sub.2, . . . , Rv) has a same size as equivalent size.
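For orientation only, the reservoir-filling step cited above (Algorithm 1, paragraph 6) can be pictured with the following minimal sketch. It is not code from the reference. The fill phase follows the cited description; the replacement phase after the reservoir is full is an assumption based on conventional reservoir sampling, and the names fill_reservoir, stream, size, and rng are hypothetical.

```python
import random

def fill_reservoir(stream, size, rng=None):
    """Keep a uniform random sample of at most `size` tuples from `stream`."""
    rng = rng or random.Random()
    reservoir = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < size:
            # Fill phase: add each arriving tuple until the reservoir is full.
            reservoir.append(item)
        else:
            # Assumed replacement phase (conventional reservoir sampling):
            # the new tuple replaces a stored one with probability size / seen.
            slot = rng.randrange(seen)
            if slot < size:
                reservoir[slot] = item
    return reservoir

# One equally sized reservoir per input stream (R1 ... Rv for S1 ... Sv).
streams = [range(0, 100), range(100, 150), range(150, 400)]
reservoirs = [
    fill_reservoir(s, size=10, rng=random.Random(i))
    for i, s in enumerate(streams)
]
```

Each reservoir in the sketch holds tuples from exactly one stream and all reservoirs share one size, which is the arrangement the mapping above relies on.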
AI does not explicitly teach the claimed limitation:
sample of data elements; the sample of data elements;
 said final sampling reservoir being populated by said computer processor with a random selection of data elements from each one of said respective intermediate sampling reservoirs.
Me teaches the claimed limitations: 
a sample of data elements; the sample of data elements (as receiving, by a processor of storage system 104, from each of multiple data streams e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements);
“a computer processor for receiving a sample of data elements from each one of said multiple data streams” as receiving, by a processor of storage system 104, from group of files as multiple data streams e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements; or
as receiving, by a processor of storage system 104, from each of multiple data streams e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements;
“a respective intermediate sampling reservoir established within a computer memory for each one of the multiple data streams, each one of said respective intermediate sampling reservoirs having an equivalent size and being populated by said computer processor with the sample of data elements received from said one of the multiple data streams” as for each file of files as multiple data streams, establishing in a memory a bitmap of a uniform size as equivalent size (fig. 7, col. 8, lines 35-67) and storing, using a processor, a segment of segments received from each file of files in a bitmap of bitmaps e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6); or
for each file of files as multiple data streams, establishing in a memory a respective particular bit location (col. 3, lines 1-15, fig. 7) in a bitmap of a uniform size as equivalent size (fig. 7, col. 8, lines 35-67) and storing, using a processor, a segment of segments received from each file of files in a bitmap of bitmaps e.g., inserting sampled segment S of a file into a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6);
“a final sampling reservoir established within said computer memory, said final sampling reservoir having said equivalent size as said respective intermediate sampling reservoirs” as generating cluster definition Fk as a final sampling reservoir in a memory, the Fk has a uniform size as bitmaps as intermediate sampling reservoirs (figs. 3-5, 9, col. 7, lines 60-67, col. 8, lines 1-10; col. 9, lines 10-67; col. 10, lines 60-67);
“generating a random sample of data elements from multiple data streams” as (paragraphs 2-3).
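The fingerprint-based sampling attributed to Me above (one segment sampled for every R segments by comparing a subset of fingerprint bits to a sampling pattern) can be pictured with the following minimal sketch. It is not code from the reference. The SHA-1 fingerprint, the use of the low-order bits, and the names fingerprint, is_sampled, mask_bits, and pattern are assumptions made only for illustration.

```python
import hashlib

def fingerprint(segment: bytes) -> int:
    # Hypothetical content-based fingerprint: SHA-1 digest read as an integer.
    return int.from_bytes(hashlib.sha1(segment).digest(), "big")

def is_sampled(segment: bytes, mask_bits: int = 3, pattern: int = 0) -> bool:
    # Compare the low `mask_bits` bits of the fingerprint with the pattern.
    # On average one segment in 2**mask_bits matches (R = 8 here).
    mask = (1 << mask_bits) - 1
    return (fingerprint(segment) & mask) == pattern

segments = [f"segment-{i}".encode() for i in range(1000)]
sampled = [s for s in segments if is_sampled(s)]
```

Because the decision depends only on segment content, the same segment is sampled or skipped consistently regardless of which file it arrives in.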
Al and Me disclose a method for randomly selecting a tuple in the reservoir.  These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Me’s teaching to Al’s system in order to store files in clusters of a distributed system efficiently, to reduce representation space in the reservoir, to reduce the number of segments selected for insertion into reservoirs, and further to greatly reduce required network traffic or bandwidth and processing resources.
Shi teaches the claimed limitation:
“said final sampling reservoir being populated by said computer processor with a random selection of data elements from each one of said respective intermediate sampling reservoirs” as randomly selecting rows or percentages of rows, by a processing device that includes a processor (fig. 1, paragraph 22), from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in a shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43).
In particular, when a loop exists, tables that are included in the loop may have more than the target iteration slicing percentage of rows copied to corresponding tables in the shadow database. In some cases, all data from tables included in the loop may be copied to the shadow database in a single round. Making the target iteration slicing percentage smaller may avoid having all the data from the tables included in the loop copied to the shadow database in a single round (paragraph 39).
The processing device may then find a first driving table of the relational database by referring to the representation of the created connected graph (act 408). Approximately, the target iteration slicing percentage of rows may be randomly selected from the driving table and stored in a corresponding table in the shadow database (act 410). The process previously described, with respect to FIG. 7, or another process, may be executed by the processing device to randomly select rows from the driving table and store the randomly selected rows in the corresponding table in the shadow database (paragraph 40).
If, during act 414 (FIG. 4), the processing device determines that a next related table is found, then the processing device may select rows of tables of the relational database, related to the selected and copied rows of the driving table, either directly or indirectly, and may copy the selected rows of the tables to corresponding tables of the shadow database (act 602; FIG. 6) (paragraph 43);
“a final sampling reservoir established within said computer memory” as (paragraphs 22, 28).
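The row-selection step attributed to Shi above (randomly selecting approximately a target percentage of rows from a table and storing them in a corresponding table of the shadow database) can be pictured with the following minimal sketch. It is not code from the reference and it deliberately omits the driving-table traversal and referential-integrity handling described in paragraphs 39-43. The names copy_random_rows, tables, and target_percentage are hypothetical.

```python
import random

def copy_random_rows(tables, target_percentage, rng=None):
    """Copy about `target_percentage` percent of each table's rows to a shadow copy."""
    rng = rng or random.Random()
    shadow = {}
    for name, rows in tables.items():
        # Number of rows to copy for this table, rounded to a whole row.
        count = round(len(rows) * target_percentage / 100)
        # Random selection without replacement; the source table is unchanged.
        shadow[name] = rng.sample(rows, count)
    return shadow

tables = {"orders": list(range(100)), "customers": list(range(40))}
shadow = copy_random_rows(tables, target_percentage=10, rng=random.Random(0))
```

In this picture the source tables play the role of the intermediate sampling reservoirs and the shadow copy plays the role of the final sampling reservoir.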
Al and Shi disclose a method for randomly selecting data and storing the selected data into a storage.  These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching to Al’s system in order to copy rows of the relational database to a shadow database while preserving referential integrity among tables of the created shadow database, and further to avoid copying all data from tables connected in a loop to the shadow database in a single iteration.

As to claim 15, AI teaches a system for generating a random sample of data elements from multiple data streams (abstract, paragraph 87), comprising:
“a computer processor for receiving a stream of data elements from a first data stream” as receiving, using a computer processor (paragraph 17), from each stream of multiple streams, tuples (paragraphs 15, 65).  A tuple of tuples is not a sample of data elements.  A first stream of the multiple streams is represented as a first data stream;
“a first sampling reservoir established within a computer memory and populated exclusively with a sample of data elements received from said first data stream; said computer processor receiving a stream of data elements from a second data stream; a second sampling reservoir established within said computer memory and populated exclusively with a sample of data elements received from said second data stream” as for each stream of multiple streams, establishing in a memory a respective sampling reservoir of the sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63) and filling or adding, using a processor (paragraphs 19, 44), each sampling reservoir of sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) with a tuple of the tuples received from one of the streams (paragraphs 65-66: algorithm 3: steps 16-21; paragraph 6: algorithm 1).
Sampling R.sub.1 is represented as a first sampling reservoir, Sampling R.sub.2 is represented as a second sampling reservoir. A first stream of the multiple streams is represented as a first data stream.  A second stream of the multiple streams is represented as a second data stream. A tuple of the tuples is not a sample of data elements.
In particular:  Algorithm 3: Input streams; at step 16: for each Ri belonging to (Lreduced = set of all Ri + Lenlarged = set of all Ri) do:
Select tuples from the incoming mi(t) to fill a reservoir Ri using algorithm 1 (paragraph 58), wherein mi(t) = number of tuples to be seen from Si, starting from time point (t) (paragraph 65).
Algorithm 1:  for each tuple arriving from an input stream, add the tuple to the reservoir until the reservoir becomes full (paragraph 6);
“a third sampling reservoir established with said computer memory” as establishing in a memory respective sampling reservoirs e.g., (R.sub.1, R.sub.2, . . . , Rv) of v input streams (S.sub.1, S.sub.2, . . . , Sv) (paragraphs 15, 60, 63).  Sampling Rv is represented as a third sampling reservoir.
AI does not explicitly teach the claimed limitations:
sample of data elements;
 populated with a random selection of data elements from said first and second sampling reservoirs.
Me teaches the claimed limitations:
a sample of data elements (as receiving, by a processor of storage system 104, from each of multiple data streams e.g., files, segments as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements);
“a computer processor for receiving a stream of data elements from a first data stream; a first sampling reservoir established within a computer memory and populated exclusively with a sample of data elements received from said first data stream; said computer processor receiving a stream of data elements from a second data stream; a second sampling reservoir established within said computer memory and populated exclusively with a sample of data elements received from said second data stream” as receiving, by a processor of storage system 104, from each stream of streams, e.g., files as data elements and sampling one segment S for every R segments in the file by comparing a subset of bits in a content-based segment fingerprint to a sampling pattern (fig. 1, 4-5, col. 9, lines 10-20; col. 10, lines 57-67). Each sampled segment of segments is represented as a sample of data elements.  For each file of files as multiple data sources, establishing in a memory a bitmap of a uniform size as equivalent size (col. 8, lines 35-67) and storing completely, as exclusively, using a processor, each segment of segments received from each file of files in each respective particular bit location of locations (col. 3, lines 1-15, fig. 7) in a bitmap of bitmaps e.g., inserting sampled segment S of a file into a location of a bitmap (figs. 4-5, 9, col. 9, lines 10-20; col. 10, lines 57-67; col. 11, lines 1-6).
For example, bitmap B.sub.F is stored as a bloom filter with a single hash function that uniformly distributes the set S of ones and zeroes across the bitmap vector B. The bitmap vector B is initially a set of all zeroes.  A unique segment s belonging to the file F is mapped to a particular bit location, i.e., the offset, in the bitmap vector B using an algorithm to insert(s, S) using the bloom filter with the single hash function and bitmap B: function insert(segment s): B[hash(s)] = 1. The bit value at the B[hash(s)] location in the bitmap vector B is flipped from zero to one in order to represent that an element was added to the set S of ones and zeroes, i.e., that the unique segment s is mapped to that bit's offset in bitmap vector B and, therefore, belongs to file F (col. 3, lines 5-22, fig. 7);
“a third sampling reservoir established with said computer memory” as generating cluster definition Fk as a final sampling reservoir of a uniform size (figs. 3-5, 9, col. 7, lines 60-67, col. 8, lines 1-10).
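The bitmap insertion quoted from Me above (a bloom filter with a single hash function, where insert(segment s) sets B[hash(s)] = 1) can be pictured with the following minimal sketch. It is not code from the reference. The SHA-1 based hash, the bitmap size of 1024, and the names bit_offset, insert, and might_contain are assumptions made only for illustration.

```python
import hashlib

BITMAP_SIZE = 1024  # assumed uniform bitmap size, chosen for illustration

def bit_offset(segment: bytes, size: int = BITMAP_SIZE) -> int:
    # Hypothetical single hash function mapping a segment to a bit offset.
    return int.from_bytes(hashlib.sha1(segment).digest(), "big") % size

def insert(bitmap: list, segment: bytes) -> None:
    # B[hash(s)] = 1: flip the bit at the segment's offset from zero to one.
    bitmap[bit_offset(segment, len(bitmap))] = 1

def might_contain(bitmap: list, segment: bytes) -> bool:
    # A set bit means the segment may belong to the file (false positives possible).
    return bitmap[bit_offset(segment, len(bitmap))] == 1

bitmap_f = [0] * BITMAP_SIZE  # bitmap vector B starts as all zeroes
for seg in (b"alpha", b"beta", b"gamma"):
    insert(bitmap_f, seg)
```

Every file's bitmap in this sketch has the same length, which corresponds to the uniform size that the mapping above treats as the equivalent size.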
Al and Me disclose a method for randomly selecting a tuple in the reservoir.  These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Me’s teaching to Al’s system in order to store files in clusters of a distributed system efficiently, to reduce representation space in the reservoir, to reduce the number of segments selected for insertion into reservoirs, and further to greatly reduce required network traffic or bandwidth and processing resources.
Shi teaches the claimed limitation:
 “a third sampling reservoir established with said computer memory and populated with a random selection of data elements from said first and second sampling reservoirs” as a shadow database created with computer memory (fig. 1, 3, paragraphs 22, 28) and randomly selecting rows or percentages of rows, by processing device that includes a processor (fig. 1, paragraph 22)  from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43). 
In particular, when a loop exists, tables that are included in the loop may have more than the target iteration slicing percentage of rows copied to corresponding tables in the shadow database. In some cases, all data from tables included in the loop may be copied to the shadow database in a single round. Making the target iteration slicing percentage smaller may avoid having all the data from the tables included in the loop copied to the shadow database in a single round (paragraph 39).
The processing device may then find a first driving table of the relational database by referring to the representation of the created connected graph (act 408). Approximately, the target iteration slicing percentage of rows may be randomly selected from the driving table and stored in a corresponding table in the shadow database (act 410). The process previously described, with respect to FIG. 7, or another process, may be executed by the processing device to randomly select rows from the driving table and store the randomly selected rows in the corresponding table in the shadow database (paragraph 40).
If, during act 414 (FIG. 4), the processing device determines that a next related table is found, then the processing device may select rows of tables of the relational database, related to the selected and copied rows of the driving table, either directly or indirectly, and may copy the selected rows of the tables to corresponding tables of the shadow database (act 602; FIG. 6) (paragraph 43).
Al and Shi disclose a method for randomly selecting data and storing the selected data into a storage.  These references are in the same field of endeavor as the application. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching to Al’s system in order to copy rows of the relational database to a shadow database while preserving referential integrity among tables of the created shadow database, and further to avoid copying all data from tables connected in a loop to the shadow database in a single iteration.

Claims 5, 12 are rejected under 35 U.S.C. 103 as being unpatentable over AI in view of Me and Shi and further in view of Soundararajan et al (or hereinafter “Sound”) (US 20130132967).
	As to claims 5, 12, AI does not explicitly teach the claimed limitation “wherein said distributed data processing system comprises a MapReduce system”. Sound teaches that a distributed data processing system comprises a MapReduce system (paragraphs 3, 11).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Sound’s teaching to AI’s system in order to allow a system operator to more easily manage the storage space associated with the system as well as add storage capacity or features using industry standard solutions and further to reduce estimation of resource usage for each query further specifies a predicted execution time, including map time and reduce time, for such query.

Claims 5, 12 are rejected under 35 U.S.C. 103 as being unpatentable over AI in view of Me and Shi and further in view of Dagli (US 20150356149).
	As to claims 5, 12, AI does not explicitly teach the claimed limitation “wherein said distributed data processing system comprises a MapReduce system”. Dagli teaches using the MapReduce framework to randomly assign the input data records from the plurality of data sources into the finalized number of base model partitions (abstract, paragraphs 63, 67, fig. 7).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Dagli’s teaching to AI’s system in order to allow a system operator to more easily manage the storage space associated with the system as well as add storage capacity or features using industry standard solutions and further to reduce estimation of resource usage for each query further specifies a predicted execution time, including map time and reduce time, for such query.

Claims 7, 14, 16 are rejected under 35 U.S.C. 103 as being unpatentable over AI in view of Me and Shi and further in view of Yoshinari et al (or hereinafter “Yoshi”) (US 20100318756).
	As to claims 7, 14, 16, AI does not explicitly teach the claimed limitations:
“wherein: said multiple data streams provide data elements at different rates; and said step of randomly selecting data elements from each one of said respective intermediate sampling reservoirs to populate said final sampling reservoir employs probabilistic techniques to weight said selection of data elements from said multiple data streams according to said different rates”; “wherein: said multiple data streams provide data elements at different rates; and data elements are selected from each one of said intermediate sampling reservoirs to populate said final sampling reservoir using probabilistic techniques to weight said selection of data elements from said multiple data streams according to said different rates” and “wherein: said multiple data streams provide data elements at different rates; and data elements are selected from said first and second sampling reservoirs to populate said third sampling reservoir using a probabilistic technique to weight said selection of data elements from said first and second sampling reservoirs according to said different rates”.
Shi teaches randomly selecting rows or percentages of rows, by processing device that includes a processor (fig. 1, paragraph 22)  from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43).  Yoshi teaches groups provide data at different migrate rates and selecting data from a volume to populate another volume employs percentage methods to weight selection of data  from group based on migrate rates (paragraphs 131-137).   It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching and Yoshi’s teaching to AI’s system in order to  reduce time populating data into another storage, to prevent network traffic, to  provide a fast writing data method from one storage to another storage efficiently, to allow a system operator to more easily manage the storage space associated with the system as well as add storage capacity or features using industry standard solutions and further to reduce estimation of resource usage for each query further specifies a predicted execution time, including map time and reduce time, for such query. 

Claims 7, 14, 16 are rejected under 35 U.S.C. 103 as being unpatentable over AI in view of Me and Shi and further in view of Shao et al (or hereinafter “Shao”) (US 20110184777).
	As to claims 7, 14, 16, AI does not explicitly teach the claimed limitations:
	“wherein: said multiple data streams provide data elements at different rates; and said step of randomly selecting data elements from each one of said respective intermediate sampling reservoirs to populate said final sampling reservoir employs probabilistic techniques to weight said selection of data elements from said multiple data streams according to said different rates”; “wherein: said multiple data streams provide data elements at different rates; and data elements are selected from each one of said intermediate sampling reservoirs to populate said final sampling reservoir using probabilistic techniques to weight said selection of data elements from said multiple data streams according to said different rates” and “wherein: said multiple data streams provide data elements at different rates; and data elements are selected from said first and second sampling reservoirs to populate said third sampling reservoir using a probabilistic technique to weight said selection of data elements from said first and second sampling reservoirs according to said different rates”.
	Shi teaches randomly selecting rows or percentages of rows, by a processing device that includes a processor (fig. 1, paragraph 22), from tables (fig. 4) as intermediate sampling reservoirs and storing the percentages of rows or the selected rows in a shadow database as said final sampling reservoir (figs. 4-7, paragraphs 40-43).  Shao teaches streams provide counts at different migrate rates and selecting account from a bucket to populate another bucket employs percentage methods to weight selection of accounts from stream based on migrate rates (paragraphs 60-66).   It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Shi’s teaching and Shao’s teaching to AI’s system in order to reduce time populating data into another storage, to prevent network traffic, to provide a fast writing data method from one storage to another storage efficiently, to allow a system operator to more easily manage the storage space associated with the system as well as add storage capacity or features using industry standard solutions and further to reduce estimation of resource usage for each query further specifies a predicted execution time, including map time and reduce time, for such query.
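For orientation only, the limitation quoted above for claims 7, 14, 16 (weighting the selection from the sampling reservoirs according to the different stream rates) can be pictured with the following minimal sketch. It is not code from Shi, Yoshi, Shao, or the application, and it is only one possible reading of a probabilistic weighting. The names merge_weighted, reservoirs, rates, and final_size are hypothetical, and the rates are assumed to be positive numbers.

```python
import random

def merge_weighted(reservoirs, rates, final_size, rng=None):
    """Fill a final reservoir by drawing from source reservoirs in proportion to rates."""
    rng = rng or random.Random()
    pools = [list(r) for r in reservoirs]  # copies, so the sources are not modified
    final = []
    while len(final) < final_size and any(pools):
        # Only reservoirs that still hold elements can be chosen.
        live = [i for i, pool in enumerate(pools) if pool]
        # Pick a source reservoir with probability proportional to its stream rate.
        chosen = rng.choices(live, weights=[rates[i] for i in live], k=1)[0]
        # Move one randomly chosen element into the final reservoir.
        final.append(pools[chosen].pop(rng.randrange(len(pools[chosen]))))
    return final

first = ["a1", "a2", "a3"]
second = ["b1", "b2", "b3"]
third = merge_weighted([first, second], rates=[1.0, 3.0], final_size=3,
                       rng=random.Random(0))
```

With rates of 1.0 and 3.0, elements from the second reservoir are three times as likely to be drawn at each step while both reservoirs still hold elements.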

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAM-Y T TRUONG whose telephone number is (571)272-4042.  The examiner can normally be reached on (571) 272 4042.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571) 272 4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CAM Y T TRUONG/          Primary Examiner, Art Unit 2169