DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 23, 2021 has been entered.
Response to Amendment
3.	This Office Action is issued in response to Applicant’s amendment filed on February 23, 2021, in which claims 1-8, 10-16, 18-19, and 21-23 are presented for examination. 
4.	Claims 1-8, 10-16, 18-19, and 21-23 are pending in this application, of which claims 1, 15, and 18 are in independent form. 
5.	Claims 1, 15, and 18 are amended.
6.	Claims 9, 17, and 20 are cancelled by the applicant.
7.	The examiner notes that the previous Office action did not include a rejection of dependent claim 11. This non-final Office action includes a rejection of dependent claim 11.
Information Disclosure Statement
8.	The information disclosure statements (IDS) submitted on February 08, 2021 and February 23, 2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
9.	Applicant's arguments filed on February 23, 2021 have been fully considered but they are not persuasive.
10. 	Applicant argues, “Applicant therefore submits that Harnik fails to disclose at least the newly-added feature of claim 1”. The newly added feature reads as follows: “wherein the designated scan criterion establishes a sampling ratio of the pages as part of the scan”.
 	Examiner respectfully disagrees. Harnik [0009] describes a method for estimating a data reduction ratio for a data set of elements, which consists of two phases: the sampling phase estimates a compression ratio [0023], while the scanning phase computes a deduplication ratio [0024]. The efficiency of applying the method to a target data set is calculated prior to performing data reduction on the entire data set [0020].
In Harnik’s sampling phase, a data set S is sampled to select M elements out of a total of N elements to create a base sample B. The sampling may be performed randomly or according to a sampling algorithm (i.e., scan criterion) [0021]. In the sampling phase, an identifier is generated for each sampled element, e.g., by applying a hash algorithm [0021]. The identifier is the hash signature of the element [0039]. Harnik’s sampling phase calculates a value count Base_i indicating the number of elements in B with the same identifier [0022]. The identifiers and the associated value counts are stored in a table. The compression rate for each element in base sample B is calculated after the sampling phase [0023] and may be used to estimate the compression ratio achievable in the entire data set [0031]. Because Harnik samples M elements from data set S containing N elements to form base sample B [0021], the sampling phase establishes a sampling ratio of size(B)/size(S) = M/N. Therefore, applicant’s argument that Harnik does not teach the newly added feature is not persuasive, and the rejection based on the reference is maintained.
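For illustration only, the sampling phase and the resulting sampling ratio discussed above can be sketched as follows. This is a minimal, hypothetical sketch and not code from Harnik; the function name `sample_base`, the use of SHA-256 as the hash algorithm, and the fixed random seed are assumptions made for this example.

```python
import hashlib
import random
from collections import Counter


def sample_base(data_set, m, seed=0):
    """Hypothetical sampling phase: choose m of the N elements at random and
    count how many sampled elements share each hash identifier (Base_i)."""
    rng = random.Random(seed)
    sampled = rng.sample(data_set, m)  # M elements chosen without replacement
    base_counts = Counter(hashlib.sha256(e).hexdigest() for e in sampled)
    sampling_ratio = m / len(data_set)  # size(B) / size(S) = M / N
    return base_counts, sampling_ratio


if __name__ == "__main__":
    # Toy data set S of N = 8 elements, two of which are duplicates.
    S = [b"A" * 8, b"B" * 8, b"A" * 8, b"C" * 8,
         b"D" * 8, b"E" * 8, b"F" * 8, b"G" * 8]
    base, ratio = sample_base(S, m=4)
    print(ratio)  # 0.5
    print(sum(base.values()))  # 4
```

With N = 8 and M = 4, the returned ratio is 0.5, i.e., size(B)/size(S) = M/N, regardless of which elements the random draw selects.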


Claim Rejections - 35 USC § 103
10.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

11.	Claims 1-3, 6, 13-15, 18, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Harnik et al. U.S. 2014/0052699 A1 (hereinafter Harnik) in view of Oberbreckling et al. U.S. 2018/0075104 A1 (hereinafter Oberbreckling) and Margalit et al. “Estimation of Deduplication Ratios in Large Data Sets” (hereinafter Margalit).

Regarding claim 1, Harnik discloses an apparatus comprising:
at least one processing device comprising a processor (Harnik [0069] and [Figure 4A, element 1101] e.g., “a processor”) coupled to a memory (Harnik [0069] and [Figure 4A, element 1102] e.g., “…comprise local memory 1102”); 
the processing device being configured:
to identify at least first and second datasets to be scanned to generate a data reduction estimate (Harnik [0009] describes a method that provides an estimate of data reduction (i.e., a data reduction estimate) for a prospective combination of the first and second datasets; see Harnik [Abstract] and [0009], where the method selects a plurality of data elements from a data set, and [0021], where a sample of data is identified for comparison and calculation; see also [0009], e.g., “…calculating a value count.sub.i that indicates the number of times an identifier h.sub.e matches an identifier h.sub.i; and estimating data reduction ratio for the plurality of N elements in the data set, based on number of m number elements selected from the data set and the value count.sub.i.”);
to designate a scan criterion to be utilized in the scan of each of the datasets (Harnik [0021], where a dataset is sampled, e.g., “…a data set (S) is sampled to select M elements out of a total of N elements in S to create a base sample (B) (S210). The sampling may be performed randomly or according to a sampling algorithm” (i.e., scan criterion));
for each of a plurality of pages of each of the datasets, to scan the page by: 
performing a computation on the page to obtain a page result (Harnik [0024] describes that once a data set is scanned, a count of each sampled element is calculated to reflect the outcome (i.e., result), e.g., “once the scanning of the data set S is completed, a counter for each element in B is calculated that reflects the number of times a particular element in B appears in S”. In Harnik, a computation is performed for each sampled element (i.e., page), e.g., by applying a hash algorithm ([0021]). The resulting identifier is the hash signature of the element (i.e., page result) ([0039]));
determining whether or not the page result satisfies the designated scan criterion (Harnik [0021]-[0022], where during the sampling phase the elements that satisfy the scan criterion are identified, e.g., “…a data set (S) is sampled to select M elements out of a total of N elements in S to create a base sample (B)” (i.e., satisfying the designated scan criterion)); and
responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a data reduction estimate table for the dataset (Harnik [0006], where a data reduction technique is applied to the entire data set to determine a data reduction rate (i.e., estimate) and the result is stored in storage; see also [0021]-[0022] and [0062], where the identifiers used to select the sample elements are stored in a table, e.g., “The identifier is stored in a data structure (e.g., a hash table)”);
determining if there is a related entry having a same page identifier (Harnik [0039], e.g., “From the entire data set (S), choose m elements randomly where m is a parameter chosen in advance. For each element, calculate its hash value and add it to a set that we call the base sample (B)”, where, from the data set S, the Harnik system selects m elements randomly (i.e., designated subset inclusion characteristic) to be included in base sample B, and, to determine whether an element in S is also in B, compares its hash signature to the hash signatures of the elements in B) in another one of the data reduction estimate tables (Harnik [0022], e.g., “The identifier is stored in a data structure (e.g., a hash table). It is noteworthy that two or more elements out of the N elements in S may be duplicate copies and thus applying a uniform method to generate an identifier for one sampled element optionally would lead to generating the same identifier for another element, when the two elements are duplicates.”); and
responsive to an affirmative determination, combining counter values of the related entries (Harnik [0040], e.g., “For each element e.epsilon.S its hash signature h.sub.e is computed. If this signature matches h.sub.i for some i.epsilon. B then count.sub.i is incremented by 1.”);
wherein the designated scan criterion establishes a sampling ratio of the pages as part of the scan (Harnik [0021], where the system samples M elements from data set S containing N elements to form base sample B; after the sampling phase, Harnik establishes a sampling ratio of size(B)/size(S) = M/N).
Harnik discloses that a hash value of each element of multiple datasets is calculated and merged, but does not explicitly discuss first and second datasets, or wherein merging contents of the data reduction estimate tables comprises operations performed for each of a plurality of entries of at least one of the data reduction estimate tables.
However, Oberbreckling discloses merging contents of the data reduction estimate tables for the respective first and second datasets (Oberbreckling [0012] describes various dataset types (i.e., first dataset and second dataset); see also [0141], which describes that the first and second datasets are different datasets, e.g., “The second dataset can be different from the first dataset”. The two datasets are merged; see [0012], e.g., “merging, according to the type of join, the first dataset at a first column within the first column pair with the second dataset at a second column in the first column pair”); and wherein merging contents of the data reduction estimate tables comprises operations performed for each of a plurality of entries of at least one of the data reduction estimate tables (Oberbreckling [0009] and [0114], where the system merges two different datasets, e.g., “The system can provide an intuitive way to enable provide options for merging or joining data from different datasets”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the techniques for relationship discovery between datasets of Oberbreckling with the estimation of data reduction rate in a data storage system of Harnik. One having ordinary skill in the art would have been motivated to make this combination because new and improved techniques for comparing datasets are desired to enable quick and efficient processing of data for enrichment.
The combination of Harnik and Oberbreckling does not specifically disclose:
to generate the data reduction estimate for the prospective combination of the first and second datasets based at least in part on the merged contents of the data reduction estimate tables; and
wherein a given one of the data reduction estimate tables for a particular one of the datasets comprises a plurality of entries for respective ones of the pages of that dataset.
However, Margalit discloses to generate the data reduction estimate for the prospective combination of the first and second datasets based at least in part on the merged contents of the data reduction estimate tables (Margalit [Abstract]; [Section I. INTRODUCTION, a-c]; [A. The Challenges of Efficient Estimation]; [B. Our Results, Sample phase and Scan phase]; see also [Section II. PRELIMINARIES] and [Section III. OUR ESTIMATION TECHNIQUE], wherein a data reduction estimate is generated using the sampling phase and the scanning phase); and wherein a given one of the data reduction estimate tables for a particular one of the datasets comprises a plurality of entries for respective ones of the pages of that dataset (Margalit [Section III. Our Estimation Technique], where the system uses two phases, a sampling phase and a scanning phase, to produce a data reduction estimate).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of estimation of deduplication ratios in large data sets of Margalit with the combined teachings of Harnik and Oberbreckling. One having ordinary skill in the art would have been motivated to make this combination because such techniques can give accurate estimations of the change rate and in-volume compression, as a means for calculating the overall capacity required for a backup mechanism.
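For illustration only, the merging of data reduction estimate tables recited in claim 1 (determining whether another table holds a related entry with the same page identifier and, if so, combining counter values) and the generation of an estimate from the merged contents can be sketched as follows. This is a hypothetical sketch, not code from Harnik, Oberbreckling, or Margalit. The dictionary representation, the function names, and the estimator (total sampled pages divided by distinct sampled page identifiers, which assumes identical pages are always sampled together, as with hash-based sampling) are assumptions.

```python
def merge_estimate_tables(table_a, table_b):
    """Merge two tables mapping page identifier -> counter value.

    For each entry of table_a, determine whether table_b has a related entry
    with the same page identifier; if so, combine (add) the counter values,
    otherwise carry the entry over unchanged.
    """
    merged = dict(table_b)  # copy so the input tables are not modified
    for page_id, count in table_a.items():
        if page_id in merged:
            merged[page_id] += count  # related entry found: combine counters
        else:
            merged[page_id] = count
    return merged


def estimate_reduction_ratio(table):
    """Estimated data reduction ratio: sampled pages / distinct sampled pages."""
    if not table:
        return 1.0
    return sum(table.values()) / len(table)


if __name__ == "__main__":
    first = {"p1": 2, "p2": 1}   # hypothetical table for the first dataset
    second = {"p2": 3, "p3": 1}  # hypothetical table for the second dataset
    merged = merge_estimate_tables(first, second)
    print(merged)  # {'p2': 4, 'p3': 1, 'p1': 2}
    print(round(estimate_reduction_ratio(merged), 3))  # 2.333
```

In this toy example, page identifier p2 appears in both tables, so its counters are combined (1 + 3 = 4); the prospective combination then has 7 sampled pages but only 3 distinct identifiers.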
Regarding claim 2, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the processing device is implemented in one of: a host device configured to communicate over a network with at least one storage system that stores at least one of the first and second datasets; and said at least one storage system that stores at least one of the first and second datasets (Harnik [Figure 1] and [0019], e.g., “…a multiprocessing networked environment in which computing system 110 is connected to one or more computing system(s) 120 and shared storage device 140 over network 130”).

Regarding claim 3, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the first and second datasets comprise respective sets of one or more logical storage volumes of at least one storage system (Harnik [Figure 1] and [0019], where the data reduction system is connected to shared storage).

Regarding claim 6, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein updating a corresponding entry of the data reduction estimate table for a given one of the pages of a given one of the datasets comprises one of the following operations (i) and (ii): (i) responsive to a page identifier of the given page not already being present in the data reduction estimate table, inserting the page identifier into the data reduction estimate table and setting an associated counter to an initial value; and (ii) responsive to the page identifier already being present in the data reduction estimate table, incrementing its associated counter (Harnik [0021]-[0022], [0024], and [0039]-[0040]. In Harnik, an identifier (i.e., page identifier) is generated (i.e., computed) for each sampled element, e.g., by applying a hash algorithm ([0021]). The identifier is the hash signature of the element ([0039]). Harnik takes a two-phase approach. In the first, sampling phase (i.e., operation (i)) ([0021]), all sampled elements in the base sample B are inserted into a hash table (i.e., data reduction estimate table) ([0022]). In the second, scanning phase (i.e., operation (ii)) ([0024]), for each element in data set S, if its hash signature (i.e., identifier) h_e matches h_i for some element in the base sample B, then Count_i is incremented by 1; otherwise, the element is not in the base sample B (i.e., does not satisfy the scan criterion) and can be ignored ([0040])).
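For illustration only, operations (i) and (ii) of claim 6, and the scanning phase of Harnik as characterized above, can be sketched as follows. The sketch is hypothetical; the function names, the initial counter value of 1, and the use of SHA-256 as the hash algorithm are assumptions and are not details taken from the claims or from Harnik.

```python
import hashlib


def update_entry(table, page_id, initial_value=1):
    """Claim 6 operations: (i) insert a new page identifier with an initial
    counter value, or (ii) increment the counter of an existing identifier."""
    if page_id not in table:
        table[page_id] = initial_value  # (i) insert and set initial value
    else:
        table[page_id] += 1  # (ii) increment the associated counter


def scan_phase(data_set, base_ids):
    """Sketch of a scanning phase: count occurrences in S only for identifiers
    already present in the base sample B; other elements are ignored."""
    counts = {h: 0 for h in base_ids}
    for element in data_set:
        h_e = hashlib.sha256(element).hexdigest()  # hash signature of element
        if h_e in counts:  # h_e matches some h_i in B
            counts[h_e] += 1  # Count_i is incremented by 1
    return counts


if __name__ == "__main__":
    table = {}
    for pid in ["x", "x", "y"]:
        update_entry(table, pid)
    print(table)  # {'x': 2, 'y': 1}
```

The contrast is that `update_entry` inserts any identifier it has not seen, whereas `scan_phase` only increments counters for identifiers that were already placed in the table during sampling.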

Regarding claim 13, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the processing device is configured to adjust one or more characteristics of a storage configuration of the first and second datasets based at least in part on the data reduction estimate generated for the prospective combination of the first and second datasets (Harnik [0025] and [0031], where the data reduction rate (deduplication + compression rate) for elements in the base sample is calculated ([0025]), which may be used to determine the deduplication and compression ratio achievable in the entire data set ([0031])).

Regarding claim 14, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the processing device is configured:
to generate one or more additional data reduction estimates for respective additional groups of two or more datasets (Harnik [Abstract] and [0009], where the method selects a plurality of data elements from a data set; see also [0021], where a sample of data is identified for comparison and calculation, e.g., “…calculating a value count.sub.i that indicates the number of times an identifier h.sub.e matches an identifier h.sub.i; and estimating data reduction ratio for the plurality of N elements in the data set, based on number of m number elements selected from the data set and the value count.sub.i.”);
to select a particular one of the groups of two or more datasets for actual combination based at least in part on their respective data reduction estimates (Harnik [0009] e.g., “…a method for estimating data reduction ratio for a data set is provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements”); and 
to combine the two or more datasets of the selected group (Oberbreckling [0009] e.g., “system can provide an intuitive way to enable provide options for merging or joining data from different datasets. Such techniques may be used to combine or join datasets identified as having a relationship”). 
Claims 15 and 18 recite, respectively, a method and a computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, which perform the steps of claim 1. They are rejected for substantially the same reasons as presented above for claim 1 and based on the references’ disclosure of the necessary supporting hardware and software.
	
Regarding claim 21, the rejection of claim 18 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses a computer program product wherein the program code when executed by said at least one processing device further causes said at least one processing device:
to generate one or more additional data reduction estimates for respective additional groups of two or more datasets (in Harnik, the data reduction rate (deduplication + compression rate) for elements in the base sample is calculated ([0025]), which may be used to determine the deduplication and compression ratio achievable in the entire data set ([0031]));
to select a particular one of the groups of two or more datasets for actual combination based at least in part on their respective data reduction estimates (Harnik [0009], e.g., “…a method for estimating data reduction ratio for a data set is provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements”); and
to combine the two or more datasets of the selected group (Oberbreckling [0009], e.g., “system can provide an intuitive way to enable provide options for merging or joining data from different datasets. Such techniques may be used to combine or join datasets identified as having a relationship”).
Regarding claim 22, the rejection of claim 15 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses a method wherein updating a corresponding entry of the data reduction estimate table for a given one of the pages of a given one of the datasets comprises one of the following operations (i) and (ii):
(i) responsive to a page identifier of the given page not already being present in the data reduction estimate table, inserting the page identifier into the data reduction estimate table and setting an associated counter to an initial value (in Harnik, for each of the M elements in the base sample B (i.e., satisfying the designated scan criterion), two value counts are calculated: Base_i indicates the number of elements in B with the same identifier ([0022]), while Count_i indicates the number of elements in data set S with the same identifier ([0024]). The identifiers and the associated value counts are stored in a table (i.e., data reduction estimate table)); and
(ii) responsive to the page identifier already being present in the data reduction estimate table, incrementing its associated counter (Harnik [0021]-[0022], [0024], and [0039]-[0040]. In Harnik, an identifier (i.e., page identifier) is generated (i.e., computed) for each sampled element, e.g., by applying a hash algorithm ([0021]). The identifier is the hash signature of the element ([0039]). Harnik takes a two-phase approach. In the first, sampling phase (i.e., operation (i)) ([0021]), all sampled elements in the base sample B are inserted into a hash table (i.e., data reduction estimate table) ([0022]). In the second, scanning phase (i.e., operation (ii)) ([0024]), for each element in data set S, if its hash signature (i.e., identifier) h_e matches h_i for some element in the base sample B, then Count_i is incremented by 1; otherwise, the element is not in the base sample B (i.e., does not satisfy the scan criterion) and can be ignored ([0040])).
Regarding claim 23, the rejection of claim 15 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses a method further comprising:
generating one or more additional data reduction estimates for respective additional groups of two or more datasets (in Harnik, the data reduction rate (deduplication + compression rate) for elements in the base sample is calculated ([0025]), which may be used to determine the deduplication and compression ratio achievable in the entire data set ([0031]));
selecting a particular one of the groups of two or more datasets for actual combination based at least in part on their respective data reduction estimates (Harnik [0009], e.g., “…a method for estimating data reduction ratio for a data set is provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements”); and
combining the two or more datasets of the selected group (Oberbreckling [0009], e.g., “system can provide an intuitive way to enable provide options for merging or joining data from different datasets. Such techniques may be used to combine or join datasets identified as having a relationship”).

12.	Claims 4, 7-8, 10, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Harnik et al. U.S. 2014/0052699 A1 (hereinafter Harnik) in view of Oberbreckling et al. U.S. 2018/0075104 A1 (hereinafter Oberbreckling) and Margalit et al. “Estimation of Deduplication Ratios in Large Data Sets” (hereinafter Margalit) as applied to claims 1-3, 6, 13-15, 18, and 21-23 above, and further in view of Zhou et al. “Counting YouTube Videos via Random Prefix Sampling” (hereinafter Zhou).

Regarding claim 4, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the designated scan criterion comprises a designated content-based signature prefix and scanning the page comprises:
computing a content-based signature for the page (Harnik [0040] e.g., “For each element e.epsilon.S its hash signature h.sub.e is computed”); 
The combination of Harnik, Oberbreckling, and Margalit does not explicitly disclose comparing an initial portion of the content-based signature to the designated content-based signature prefix. However, Zhou teaches comparing an initial portion of the content-based signature to the designated content-based signature prefix (Zhou [page 378, column 1, 6th paragraph], e.g., “…they compare Random Prefix Sampling with other biased methods that again use BFS”); and
responsive to a match between the initial portion and the designated content-based signature prefix, updating a corresponding entry of a data reduction estimate table for the dataset (Zhou [page 371, column 2, 2nd paragraph] and [page 373, column 2], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes).
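For illustration only, the prefix-based scan criterion of claim 4, together with the truncated page identifiers recited in claims 7, 8, and 10, can be sketched as follows. This is a hypothetical sketch and does not reproduce Zhou’s YouTube sampling procedure; SHA-256 as the content-based signature, the parameter `id_len`, and the function name are assumptions.

```python
import hashlib


def scan_with_prefix(pages, prefix, id_len=8):
    """Sample a page only if the initial bytes of its content-based signature
    equal the designated prefix; key the table by a truncated identifier."""
    table = {}
    for page in pages:
        signature = hashlib.sha256(page).digest()  # content-based signature
        if signature[:len(prefix)] == prefix:  # compare initial portion to prefix
            page_id = signature[:id_len]  # identifier: first id_len bytes only
            table[page_id] = table.get(page_id, 0) + 1
    return table


if __name__ == "__main__":
    pages = [b"page-a", b"page-a", b"page-b"]
    # An empty prefix matches every signature, so all three pages are sampled.
    table = scan_with_prefix(pages, prefix=b"")
    print(sum(table.values()), len(table))  # 3 2
```

A longer prefix samples fewer pages: with a one-byte prefix, roughly 1 in 256 distinct pages would be expected to match, and duplicate pages always match together because they share the same signature.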
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the random prefix sampling teachings of Zhou with the combination of Harnik, Oberbreckling, and Margalit. One having ordinary skill in the art would have been motivated to do so to provide unbiased estimation of the total number of YouTube videos and total view counts, which reveals a high inherent bias in the results obtained by existing biased sampling methods.
Regarding claim 7, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Zhou discloses an apparatus wherein the corresponding entry is configured to include a page identifier and further wherein the page identifier comprises a specified number of initial bytes of a content-based signature of that page (Zhou [page 371, column 2, 2nd paragraph] and [page 373, column 2], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes).
Regarding claim 8, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Zhou discloses an apparatus wherein each of one or more of the entries is configured to include a page identifier that comprises less than an entire content-based signature of its corresponding page (Zhou [page 371, column 2, 2nd paragraph], [page 373, column 2], and [page 378, column 1, 6th paragraph], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes; e.g., “…they compare Random Prefix Sampling with other biased methods that again use BFS”).

Regarding claim 10, the rejection of claim 4 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Zhou discloses an apparatus wherein the designated content-based signature prefix comprises a specified number of initial content-based signature bytes with the initial bytes each having a designated value (Zhou [page 371, column 2, 2nd paragraph] and [page 373, column 2], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes).

Regarding claim 16, the rejection of claim 15 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Zhou discloses a method wherein each of one or more of the entries is configured to include a page identifier that comprises less than an entire content-based signature of its corresponding page (Zhou [page 371, column 2, 2nd paragraph], [page 373, column 2], and [page 378, column 1, 6th paragraph], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes; e.g., “…they compare Random Prefix Sampling with other biased methods that again use BFS”).

Regarding claim 19, the rejection of claim 18 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Zhou discloses a computer program product wherein each of one or more of the entries is configured to include a page identifier that comprises less than an entire content-based signature of its corresponding page (Zhou [page 371, column 2, 2nd paragraph], [page 373, column 2], and [page 378, column 1, 6th paragraph], where Zhou randomly generates m prefixes with length L and queries YouTube for videos whose id contains one of the prefixes; e.g., “…they compare Random Prefix Sampling with other biased methods that again use BFS”).
 

13.	Claims 5, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Harnik et al. U.S. 2014/0052699 A1 (hereinafter Harnik) in view of Oberbreckling et al. U.S. 2018/0075104 A1 (hereinafter Oberbreckling) and Margalit et al. “Estimation of Deduplication Ratios in Large Data Sets” (hereinafter Margalit) as applied to claims 1-3, 6, 13-15, 18, and 21-23 above, and further in view of Kucherov et al. US 2019/0370355 A1 (hereinafter Kucherov).

Regarding claim 5, the rejection of claim 1 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, and Margalit discloses an apparatus wherein the designated scan criterion comprises a designated subset inclusion characteristic and scanning the page comprises:
The combination of Harnik, Oberbreckling, and Margalit does not explicitly disclose computing a polynomial-based signature for the page; determining whether or not the polynomial-based signature satisfies the designated subset inclusion characteristic; and responsive to the polynomial-based signature satisfying the designated subset inclusion characteristic, computing a content-based signature for the page and updating a corresponding entry of a data reduction estimate table for the dataset based at least in part on the content-based signature.
However, Kucherov discloses:
computing a polynomial-based signature for the page (Kucherov [0004] and [0037], e.g., “for each of a plurality of pages of the dataset, to scan the page by computing a polynomial-based signature for the page”; see also Harnik [0040], where the Harnik system computes a hash signature, e.g., “For each element e.epsilon.S its hash signature h.sub.e is computed”);
determining whether or not the polynomial-based signature satisfies the designated subset inclusion characteristic (Kucherov [0037], e.g., “Scanning the page illustratively comprises computing a polynomial-based signature for the page, determining whether or not the polynomial-based signature satisfies the designated subset inclusion characteristic”); and
responsive to the polynomial-based signature satisfying the designated subset inclusion characteristic, computing a content-based signature for the page and updating a corresponding entry of a data reduction estimate table for the dataset based at least in part on the content-based signature (Kucherov [0037], e.g., “responsive to the polynomial-based signature satisfying the designated subset inclusion characteristic, computing a content-based signature for the page in the content-based signature computation module 114 and updating a corresponding entry of a deduplication estimate table for the dataset based at least in part on the content-based signature”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a processing device utilizing a polynomial-based signature of Kucherov with the combination of Harnik, Oberbreckling, and Margalit. One having ordinary skill in the art would have been motivated to make this combination in order to reduce the computational and memory resources required to generate deduplication estimates, thus improving deduplication decisions and dataset sizing.

Regarding claim 11, the rejection of claim 5 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Kucherov discloses an apparatus wherein the designated subset inclusion characteristic specifies that application of a designated function to the polynomial-based signature produces a particular result (Kucherov [0041], e.g., “The designated subset inclusion characteristic illustratively specifies that application of a designated function to a polynomial-based signature computed for a given page of the dataset produces a particular result”).

Regarding claim 12, the rejection of claim 5 is hereby incorporated by reference. The combination of Harnik, Oberbreckling, Margalit, and Kucherov discloses an apparatus wherein the polynomial-based signature comprises an n-bit cyclic redundancy check (CRC) value (Kucherov [0009], e.g., “In some embodiments, the polynomial-based signature comprises an n-bit cyclic redundancy check (CRC) value, such as a 32-bit CRC value”).
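For illustration only, the two-stage scan of claims 5, 11, and 12 can be sketched as follows: a cheap polynomial-based signature (here a 32-bit CRC) decides subset inclusion, and only included pages receive a content-based signature that updates the table. This is a hypothetical sketch; the designated function (CRC-32 value modulo `modulus` equal to `residue`), the default parameter values, and the use of SHA-256 as the content-based signature are assumptions and are not taken from Kucherov. Python’s standard `zlib.crc32` supplies the 32-bit CRC.

```python
import hashlib
import zlib


def scan_with_crc_filter(pages, modulus=4, residue=0):
    """Two-stage scan: a CRC-32 (polynomial-based) signature decides subset
    inclusion; only included pages get a SHA-256 (content-based) signature,
    which is used to update the estimate table."""
    table = {}
    for page in pages:
        crc = zlib.crc32(page)  # 32-bit CRC value (unsigned in Python 3)
        if crc % modulus == residue:  # designated function yields particular result
            signature = hashlib.sha256(page).hexdigest()  # content-based signature
            table[signature] = table.get(signature, 0) + 1
    return table
```

Because `zlib.crc32(b"")` is 0, an empty page always satisfies the default criterion (0 % 4 == 0); with `modulus=1`, every page is included and the function reduces to a full scan.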

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BERHANU MITIKU whose telephone number is (571)270-1983. The examiner can normally be reached Monday – Friday 8:30 am – 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara T Kyle can be reached on 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BERHANU MITIKU/Examiner, Art Unit 2156
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2156