Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
DETAILED ACTION
Response to Amendment
This action is responsive to remarks and amendment filed on 7/29/2021.
Rejections and/or objections not reiterated from previous office actions are hereby withdrawn.
Claims 1-20 are pending in this Office Action. Claims 1 and 17-18 are independent claims.

Remarks
The claims, and only the claims, form the metes and bounds of the invention and will be addressed.  “Office personnel are to give claims their broadest reasonable interpretation in light of the supporting disclosure. In re Morris, 127 F.3d 1048, 1054-55, 44 USPQ2d 1023, 1027-28 (Fed. Cir. 1997).  Limitations appearing in the specification but not recited in the claim are not read into the claim.  In re Prater, 415 F.2d 1393, 1404-05, 162 USPQ 541, 550-551 (CCPA 1969)” (MPEP p 2100-8, c 2, l 45-48; p 2100-9, c 1, l 1-4).  The Examiner has full latitude to interpret each claim in the broadest reasonable interpretation in light of the specification.  See MPEP 2111 [R-1].  The Examiner will reference prior art using terminology familiar to one of ordinary skill in the art.  Such an approach is broad in concept and can be either explicit or implicit in meaning.

Response to Arguments
Applicant's arguments with respect to claims 1-20 have been fully considered but they are not persuasive.
Regarding the amended claims 1 and 17-18, the applicant added new limitations and argued that the prior art does not teach the amended claim.
In response to the amendment and the argument, the examiner respectfully submits that Harnik in view of YOSHII and De Landstheer explicitly teaches the features of amended claims 1 and 17-18, per the rejection under 35 U.S.C. 103.  Please see the mapping below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-13 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Harnik et al. (US Pub. No. 2017/0199895 A1), hereinafter “Harnik” in view of YOSHII et al. (US Pub. No. 2018/0039423 A1), hereinafter “YOSHII” and De Landstheer et al. (US Patent No 7,827,146 B1), hereinafter “De Landstheer”.
Regarding claim 1, Harnik teaches a method of processing data comprising: 
receiving a plurality of data chunks for a data set (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention. In a definition step 60, a chunk size, a super-chunk size and a sampling ratio are defined, and in an partition step 62, processor 24 partitions dataset 56 into a number (also referred to herein as a first number of chunks 34); 
performing data deduplication processing for the plurality of data chunks (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention), wherein said data deduplication processing includes: 
	determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set (Harnik, See [0049], In embodiments of the present invention, a dataset is comprises collection of items. In reality, the data is a stream of bytes, that for the purposes of deduplication is broken into data chunks 34 (this could be fixed or variable sized chunks, e.g. of size 4K) and a given hash value 44 (i.e., a digital fingerprint) is computed for each chunk 34. In some embodiments, the collection of these fingerprints is considered to be the items in the dataset, where duplication of two items means that the corresponding chunks had identical fingerprints. Each of the items may also hold a compression ratio (or estimated compression ratio) for the corresponding data chunk); and 
	updating a frequency histogram for the data set in accordance with the plurality of digests (Harnik, See [0045], In a second computation step 70, processor 24 uses observed hash value duplication histogram 38 to compute observed duplication frequency histogram 40. Duplication counts 48 and number of observations 50 in observed frequency histogram 40 comprises a histogram of how many chunks 34 are duplicated one time, how many chunks 34 are duplicated two times, how many chunks 34 are duplicated three times etc.) and does not explicitly disclose determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform, wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set.
However, YOSHII teaches determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform (YOSHII, See [0103], The duplicated length range 271 is the range for aggregating the duplicated lengths in a frequency distribution, and is generally called "order of frequency distribution". The duplicated length can be determined from the result of execution of the deduplication process. In the drawing, the value of the duplicated length range 271 increases exponentially, but may increase linearly. Note that the "[0 KB, 4 KB)" as the value of the duplicated length range 271 means the range of 0 KB or more and less than 4 KB), wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (YOSHII, See [0056]-[0057], The deduplication process is performed between the logical address space 31 and the physical address space 32. The logical address space 31 is managed by being divided in units of regions called "chunks". The size of chunks 33 may be a fixed length or a variable length. Data having a chunk size is sometimes referred to as "chunk data". Data to be divided into a plurality of chunk data (in other words, a set of a plurality of chunk data) is sometimes referred to as "data set". The data set may be one or more data blocks or a part thereof. The "data block" is data accompanying an I/O command.  For example, the deduplication process may be performed in units of chunks 33. In order to detect duplicated chunk data, calculation using a hash function is performed for each chunk data to calculate a representative value such as a hash value, and each of chunk data other than one chunk data among a plurality of chunk data having the same representative value is specified as "duplicated chunk data". Thus, the duplicated chunk data can be deleted. 
The representative value of chunk data is referred to as "fingerprint" in Embodiment 1. In the drawing, an alphabet illustrated in a chunk 33 represents a fingerprint 34 of the chunk 33. In this case, among the chunks 33, a chunk satisfying particular conditions is referred to as "characteristic chunk", and chunk data corresponding to the characteristic chunk is referred to as "characteristic chunk data". A fingerprint of the characteristic chunk data is referred to as "characteristic fingerprint" in order to distinguish from normal fingerprints 34. In the drawing, fingerprints of characteristic chunk data are underlined in order to distinguish between the characteristic chunks 35 and the other chunks 33); and 
responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set (YOSHII, See [0104], The reduction amount 272 is a total value (cumulative reduction amount) of P individual reduction amounts corresponding to P duplicated data. The individual reduction amount is the difference between the data length before deduplication and the data length of duplicated data. The predicted reduction amount 273 is a predicted value calculated on the basis of the reduction amount 272 and the changed sampling period (predicted value of reduction amount in duplicated length range). In FIG. 9, the predicted reduction amount 273 is a predicted value obtained when the sampling period is changed from "4" to "8". The predicted maximum reduction amount 274 is a predicted value of reduction effect obtained when a sampling period ("1" in this embodiment) is selected to obtain the maximum reduction effect. The predicted maximum reduction amount 274 can be calculated by the same method as that for the predicted reduction amount 273). 
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik and YOSHII because YOSHII provides a storage system designed to: divide data into a plurality of chunk data (pieces of data) in a deduplication process; select one or more chunk data from among the plurality of chunk data in accordance with a sampling period which indicates that, on average, one chunk data be selected from among each N chunk data; and calculate a fingerprint, such as a hash value, for each of one or more characteristic chunk data, which are the selected one or more chunk data, and determine whether data including the one or more characteristic chunk data is a duplication. The storage system changes the sampling period on the basis of the results of past deduplication processes (YOSHII, See ABSTRACT), which can be utilized by Harnik to effectively de-duplicate the data set.
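Furthermore, De Landstheer teaches wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (De Landstheer, See Page 5, lines 21-48,  The fingerprint determined by the agent uniquely identifies the file or file segment. Thus no two non-identical files or segments can have the same fingerprint, and identical files or segments always have the same fingerprint. In the present example, the fingerprint is calculated using a hash function. Hash functions are mathematical functions which can be used to determine a fixed length message digest or fingerprint from a data item of any almost size. A hash function is a one way function--it is not possible to reverse the process to recreate the original data from the fingerprint. Hash functions are relatively slow and expensive in terms of processing power required compared to other checksum techniques such as CRC (Cyclic Redundancy Check) methods. However hash functions have the advantage of producing a unique fingerprint for each unique data set, in contrast to CRC methods which can produce the same result from multiple different data sets. Examples of hash functions which can be used to calculate the fingerprint in the present example include MD5, SHA1 and the so-called SHA2 "family" (including SHA224, SHA256, SHA 384 and SHA 512). Such hash functions produce a fingerprint (sometimes termed a "digest") which may typically be of between 128 and 1024 bits in length. Thus, as will become apparent, using only this very small representation of a much larger file or file segment, the file or segment can be tested for inclusion in a backup process with only minimal network traffic being required to carry this small signature between entities in the backup system).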
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik and YOSHII and De Landstheer because De Landstheer provides a directory system which is designed dynamically to adapt based upon the caching memory available for searching directories. Received files can be stored in a current directory until a predetermined limit is reached. In parallel, a database can be created to record which files are stored in which directory. This database can be designed to be kept in physical memory to minimize file access latency. This arrangement provides that a data storage system can store data in a simple order of receipt manner while also managing the storage structure to limit the number of data objects in any given container, thus preventing a search function analysing any given container from needing to access an excessive number of data objects and thus slow down the search to an unacceptable level (De Landstheer, See ABSTRACT), which can be utilized by Harnik and YOSHII to effectively create a digest byte value histogram and to de-duplicate the data set.
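For illustration only, the duplication frequency histogram described in the cited passage of Harnik ([0045]) can be sketched as follows. This is a minimal reading of the quoted text, not Harnik's implementation: the function name is hypothetical, SHA-256 is an assumed fingerprint function, and the sketch counts how many distinct fingerprints were observed once, twice, three times, and so on.

```python
import hashlib
from collections import Counter


def duplication_frequency_histogram(chunks):
    """Return a mapping: occurrence count -> number of distinct fingerprints."""
    # Fingerprint every chunk. SHA-256 is an assumption made for this sketch.
    per_fingerprint = Counter(hashlib.sha256(c).hexdigest() for c in chunks)
    # Count how many fingerprints were seen once, twice, three times, ...
    return Counter(per_fingerprint.values())


# Hypothetical sample data: one chunk seen 3 times, one twice, one once.
chunks = [b"aaaa", b"bbbb", b"aaaa", b"cccc", b"aaaa", b"bbbb"]
hist = duplication_frequency_histogram(chunks)
```

Such a histogram is keyed by duplication count, whereas the claimed digest byte value histogram is keyed by byte position and byte value.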
Regarding claim 2, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 1, wherein the data deduplication settings include the current hash algorithm and a current digest size (YOSHII, See [0096]). 
Regarding claim 3, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 2, wherein the processing to update the data deduplication settings for the data set includes modifying any of the current hash algorithm and the current digest size (YOSHII, See [0108]-[0111]). 
Regarding claim 4, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 3, wherein modifying the current hash algorithm includes selecting a new hash algorithm to be used in connection with generating digests for data deduplication processing performed for subsequent data chunks of the data set (YOSHII, See [0108]-[0111]). 
Regarding claim 5, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 4, wherein the new hash algorithm is a stronger hash algorithm than the current hash algorithm and the new hash algorithm is expected to generate a second distribution of frequencies with respect to byte values for bytes of generated digests whereby the second distribution of frequencies is expected to be more uniform than the frequency distribution of the frequency histogram generated using the current hash algorithm (YOSHII, See [0108]-[0111]). 
Regarding claim 6, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 5, wherein the current hash algorithm is a non-cryptographic hash algorithm that is replaced with the new hash algorithm that is a cryptographic hash algorithm (YOSHII, See [0108]-[0111]). 
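For illustration only, the limitation of claims 4-6 (replacing a non-cryptographic hash algorithm with a stronger, cryptographic one for subsequent chunks) can be sketched as follows. The sketch is not drawn from Harnik, YOSHII, or De Landstheer; CRC-32 and SHA-256 are assumed stand-ins for the two algorithm classes, and all function and variable names are hypothetical.

```python
import hashlib
import zlib


def crc32_digest(chunk):
    # Non-cryptographic, 4-byte digest (assumed "current" algorithm).
    return zlib.crc32(chunk).to_bytes(4, "big")


def sha256_digest(chunk):
    # Cryptographic, 32-byte digest (assumed "new", stronger algorithm).
    return hashlib.sha256(chunk).digest()


def select_hash(current_hash, sufficiently_uniform):
    """Keep the current hash if the histogram is uniform, else upgrade."""
    if sufficiently_uniform:
        return current_hash
    return sha256_digest


# Deduplication settings for one data set (hypothetical structure).
settings = {"hash": crc32_digest}
# A non-uniform distribution was detected, so subsequent chunks use SHA-256.
settings["hash"] = select_hash(settings["hash"], sufficiently_uniform=False)
```

In this sketch the digest size changes together with the algorithm (4 bytes to 32 bytes), so both settings named in claim 2 are updated by the one selection.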
Regarding claim 7, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 3, wherein modifying the current digest size includes selecting a new digest size to be used in connection with generating digests for data deduplication processing performed for subsequent data chunks of the data set (YOSHII, See [0108]-[0111]). 
Regarding claim 8, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 1, wherein the current digest size is a specified number of bytes, and the frequency histogram has a plurality of dimensions including a first dimension denoting the specified number of bytes (YOSHII, See [0101]-[0105]). 
Regarding claim 9, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 8, wherein the plurality of dimensions of the frequency histogram further includes a second dimension denoting a number of allowable byte values for each byte of a digest having the current digest size (YOSHII, See [0101]-[0105]). 
Regarding claim 10, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 9, wherein the plurality of dimensions of the frequency histogram further includes a third dimension of counter values or frequencies for each different allowable byte value of each byte for a digest having the current digest size (YOSHII, See [0101]-[0105]). 
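For illustration only, the three dimensions recited in claims 8-10 (byte position within the digest, allowable byte value, and counter value) can be sketched as a two-level table of counters. The sketch is not drawn from any applied reference; the 32-byte digest size, the use of SHA-256, and all names are assumptions.

```python
import hashlib

DIGEST_SIZE = 32   # first dimension: bytes per digest (assumed, SHA-256)
BYTE_VALUES = 256  # second dimension: allowable values of one byte


def new_histogram(digest_size=DIGEST_SIZE):
    # histogram[position][byte_value] holds the counter (third dimension).
    return [[0] * BYTE_VALUES for _ in range(digest_size)]


def update_histogram(histogram, digest):
    # Increment the counter for the byte value observed at each position.
    for position, value in enumerate(digest):
        histogram[position][value] += 1


histogram = new_histogram()
for chunk in (b"chunk-0", b"chunk-1", b"chunk-0"):
    update_histogram(histogram, hashlib.sha256(chunk).digest())
```

After n digests have been processed, the counters for each byte position sum to n, which gives a simple consistency check on the table.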
Regarding claim 11, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 1, wherein the one or more criteria indicates that, for the frequency distribution to be sufficiently uniform, at least one statistical metric for the frequency distribution is less than a specified maximum threshold (YOSHII, See [0101]-[0105]). 
Regarding claim 12, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 11, wherein the at least one statistical metric includes any of variance and standard deviation (YOSHII, See [0101]-[0105]). 
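For illustration only, the criterion of claims 11-12 (a statistical metric such as standard deviation being less than a specified maximum threshold) can be sketched as follows. The sketch is not drawn from any applied reference; the function name, the threshold value, and the sample counters are hypothetical, and the counters stand for the frequencies observed at a single byte position.

```python
import statistics


def is_sufficiently_uniform(counters, max_stdev):
    """True when the population standard deviation is below the threshold."""
    return statistics.pstdev(counters) < max_stdev


# Hypothetical counters for four byte values at one byte position.
flat = [10, 11, 9, 10]    # nearly equal frequencies
skewed = [37, 1, 1, 1]    # one value dominates

flat_ok = is_sufficiently_uniform(flat, max_stdev=2.0)
skewed_ok = is_sufficiently_uniform(skewed, max_stdev=2.0)
```

Variance could be substituted by calling `statistics.pvariance` and comparing against a correspondingly scaled threshold.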
Regarding claim 13, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 1, wherein the data set includes any of a logical device, a database, one or more selected portions of a database, data used by a particular application stored on one or more logical devices, selected portions of one or more logical devices, one or more files, one or more directories, one or more file systems, particular portions of one or more directories, and particular portions of one or more file systems (YOSHII, See [0075]-[0078]). 
Regarding claim 16, Harnik in view of YOSHII and De Landstheer further teaches the method of claim 1, further comprising: determining, in accordance with the one or more criteria, whether the frequency distribution of the frequency histogram has maintained a specified level of uniformity for a specified time period; and responsive to determining the frequency distribution of the frequency histogram has maintained a specified level of uniformity for a specified time period, performing first processing to update the data deduplication settings for the data set, wherein said first processing includes performing any of: updating the current hash algorithm to a new hash algorithm that is computationally less intensive than the current hash algorithm whereby the new hash algorithm is expected to take less processor time than the current hash algorithm to generate a same digest; and reducing a current digest size of digests generated using the current hash algorithm (YOSHII, See [0095]-[0100]). 

Regarding claim 17, Harnik teaches a system comprising: 
at least one processor (Harnik, See Figure 1 element 24 processor); and 
a memory (Harnik, See Figure 1 element 26 memory) comprising code stored thereon that, when executed, performs a method of processing data (Harnik, See [0008], the computer readable program code) comprising: 
receiving a plurality of data chunks for a data set (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention. In a definition step 60, a chunk size, a super-chunk size and a sampling ratio are defined, and in an partition step 62, processor 24 partitions dataset 56 into a number (also referred to herein as a first number of chunks 34); 
performing data deduplication processing for the plurality of data chunks (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention), wherein said data deduplication processing includes: 
	determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set (Harnik, See [0049], In embodiments of the present invention, a dataset is comprises collection of items. In reality, the data is a stream of bytes, that for the purposes of deduplication is broken into data chunks 34 (this could be fixed or variable sized chunks, e.g. of size 4K) and a given hash value 44 (i.e., a digital fingerprint) is computed for each chunk 34. In some embodiments, the collection of these fingerprints is considered to be the items in the dataset, where duplication of two items means that the corresponding chunks had identical fingerprints. Each of the items may also hold a compression ratio (or estimated compression ratio) for the corresponding data chunk); and 
	updating a frequency histogram for the data set in accordance with the plurality of digests (Harnik, See [0045], In a second computation step 70, processor 24 uses observed hash value duplication histogram 38 to compute observed duplication frequency histogram 40. Duplication counts 48 and number of observations 50 in observed frequency histogram 40 comprises a histogram of how many chunks 34 are duplicated one time, how many chunks 34 are duplicated two times, how many chunks 34 are duplicated three times etc.) and does not explicitly disclose determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform, wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set. 
However, YOSHII teaches determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform (YOSHII, See [0103], The duplicated length range 271 is the range for aggregating the duplicated lengths in a frequency distribution, and is generally called "order of frequency distribution". The duplicated length can be determined from the result of execution of the deduplication process. In the drawing, the value of the duplicated length range 271 increases exponentially, but may increase linearly. Note that the "[0 KB, 4 KB)" as the value of the duplicated length range 271 means the range of 0 KB or more and less than 4 KB), wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (YOSHII, See [0056]-[0057], The deduplication process is performed between the logical address space 31 and the physical address space 32. The logical address space 31 is managed by being divided in units of regions called "chunks". The size of chunks 33 may be a fixed length or a variable length. Data having a chunk size is sometimes referred to as "chunk data". Data to be divided into a plurality of chunk data (in other words, a set of a plurality of chunk data) is sometimes referred to as "data set". The data set may be one or more data blocks or a part thereof. The "data block" is data accompanying an I/O command.  For example, the deduplication process may be performed in units of chunks 33. In order to detect duplicated chunk data, calculation using a hash function is performed for each chunk data to calculate a representative value such as a hash value, and each of chunk data other than one chunk data among a plurality of chunk data having the same representative value is specified as "duplicated chunk data". 
Thus, the duplicated chunk data can be deleted. The representative value of chunk data is referred to as "fingerprint" in Embodiment 1. In the drawing, an alphabet illustrated in a chunk 33 represents a fingerprint 34 of the chunk 33. In this case, among the chunks 33, a chunk satisfying particular conditions is referred to as "characteristic chunk", and chunk data corresponding to the characteristic chunk is referred to as "characteristic chunk data". A fingerprint of the characteristic chunk data is referred to as "characteristic fingerprint" in order to distinguish from normal fingerprints 34. In the drawing, fingerprints of characteristic chunk data are underlined in order to distinguish between the characteristic chunks 35 and the other chunks 33); and 
responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set (YOSHII, See [0104], The reduction amount 272 is a total value (cumulative reduction amount) of P individual reduction amounts corresponding to P duplicated data. The individual reduction amount is the difference between the data length before deduplication and the data length of duplicated data. The predicted reduction amount 273 is a predicted value calculated on the basis of the reduction amount 272 and the changed sampling period (predicted value of reduction amount in duplicated length range). In FIG. 9, the predicted reduction amount 273 is a predicted value obtained when the sampling period is changed from "4" to "8". The predicted maximum reduction amount 274 is a predicted value of reduction effect obtained when a sampling period ("1" in this embodiment) is selected to obtain the maximum reduction effect. The predicted maximum reduction amount 274 can be calculated by the same method as that for the predicted reduction amount 273). 
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik and YOSHII because YOSHII provides a storage system designed to: divide data into a plurality of chunk data (pieces of data) in a deduplication process; select one or more chunk data from among the plurality of chunk data in accordance with a sampling period which indicates that, on average, one chunk data be selected from among each N chunk data; and calculate a fingerprint, such as a hash value, for each of one or more characteristic chunk data, which are the selected one or more chunk data, and determine whether data including the one or more characteristic chunk data is a duplication. The storage system changes the sampling period on the basis of the results of past deduplication processes (YOSHII, See ABSTRACT), which can be utilized by Harnik to effectively de-duplicate the data set.
De Landstheer teaches wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (De Landstheer, See Page 5, lines 21-48,  The fingerprint determined by the agent uniquely identifies the file or file segment. Thus no two non-identical files or segments can have the same fingerprint, and identical files or segments always have the same fingerprint. In the present example, the fingerprint is calculated using a hash function. Hash functions are mathematical functions which can be used to determine a fixed length message digest or fingerprint from a data item of any almost size. A hash function is a one way function--it is not possible to reverse the process to recreate the original data from the fingerprint. Hash functions are relatively slow and expensive in terms of processing power required compared to other checksum techniques such as CRC (Cyclic Redundancy Check) methods. However hash functions have the advantage of producing a unique fingerprint for each unique data set, in contrast to CRC methods which can produce the same result from multiple different data sets. Examples of hash functions which can be used to calculate the fingerprint in the present example include MD5, SHA1 and the so-called SHA2 "family" (including SHA224, SHA256, SHA 384 and SHA 512). Such hash functions produce a fingerprint (sometimes termed a "digest") which may typically be of between 128 and 1024 bits in length. Thus, as will become apparent, using only this very small representation of a much larger file or file segment, the file or segment can be tested for inclusion in a backup process with only minimal network traffic being required to carry this small signature between entities in the backup system).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik and YOSHII and De Landstheer because De Landstheer provides a directory system which is designed dynamically to adapt based upon the caching memory available for searching directories. Received files can be stored in a current directory until a predetermined limit is reached. In parallel, a database can be created to record which files are stored in which directory. This database can be designed to be kept in physical memory to minimize file access latency. This arrangement provides that a data storage system can store data in a simple order of receipt manner while also managing the storage structure to limit the number of data objects in any given container, thus preventing a search function analysing any given container from needing to access an excessive number of data objects and thus slow down the search to an unacceptable level (De Landstheer, See ABSTRACT), which can be utilized by Harnik and YOSHII to effectively create a digest byte value histogram and to de-duplicate the data set.

Regarding claim 18, Harnik teaches a non-transitory computer readable medium comprising code stored thereon that, when executed, performs a method of processing data (Harnik, See [0008], the computer readable program code) comprising: 
receiving a plurality of data chunks for a data set (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention. In a definition step 60, a chunk size, a super-chunk size and a sampling ratio are defined, and in an partition step 62, processor 24 partitions dataset 56 into a number (also referred to herein as a first number of chunks 34); 
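performing data deduplication processing for the plurality of data chunks (Harnik, See [0041], FIG. 2 is a flow diagram that schematically illustrates a method for estimating a deduplication ratio in dataset 56, in accordance with an embodiment of the present invention), wherein said data deduplication processing includes: 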
	determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set (Harnik, See [0049], In embodiments of the present invention, a dataset is comprises collection of items. In reality, the data is a stream of bytes, that for the purposes of deduplication is broken into data chunks 34 (this could be fixed or variable sized chunks, e.g. of size 4K) and a given hash value 44 (i.e., a digital fingerprint) is computed for each chunk 34. In some embodiments, the collection of these fingerprints is considered to be the items in the dataset, where duplication of two items means that the corresponding chunks had identical fingerprints. Each of the items may also hold a compression ratio (or estimated compression ratio) for the corresponding data chunk); and 
	updating a frequency histogram for the data set in accordance with the plurality of digests (Harnik, See [0045], In a second computation step 70, processor 24 uses observed hash value duplication histogram 38 to compute observed duplication frequency histogram 40. Duplication counts 48 and number of observations 50 in observed frequency histogram 40 comprises a histogram of how many chunks 34 are duplicated one time, how many chunks 34 are duplicated two times, how many chunks 34 are duplicated three times etc.) and does not explicitly disclose determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform, wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set.
However, YOSHII teaches determining, in accordance with one or more criteria, whether a frequency distribution of the frequency histogram is sufficiently uniform (YOSHII, See [0103], The duplicated length range 271 is the range for aggregating the duplicated lengths in a frequency distribution, and is generally called "order of frequency distribution". The duplicated length can be determined from the result of execution of the deduplication process. In the drawing, the value of the duplicated length range 271 increases exponentially, but may increase linearly. Note that the "[0 KB, 4 KB)" as the value of the duplicated length range 271 means the range of 0 KB or more and less than 4 KB), wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (YOSHII, See [0056]-[0057], The deduplication process is performed between the logical address space 31 and the physical address space 32. The logical address space 31 is managed by being divided in units of regions called "chunks". The size of chunks 33 may be a fixed length or a variable length. Data having a chunk size is sometimes referred to as "chunk data". Data to be divided into a plurality of chunk data (in other words, a set of a plurality of chunk data) is sometimes referred to as "data set". The data set may be one or more data blocks or a part thereof. The "data block" is data accompanying an I/O command.  For example, the deduplication process may be performed in units of chunks 33. 
In order to detect duplicated chunk data, calculation using a hash function is performed for each chunk data to calculate a representative value such as a hash value, and each of chunk data other than one chunk data among a plurality of chunk data having the same representative value is specified as characteristic fingerprint" in order to distinguish from normal fingerprints 34. In the drawing, fingerprints of characteristic chunk data are underlined in order to distinguish between the characteristic chunks 35 and the other chunks 33); and 
responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set (YOSHII, See [0104], The reduction amount 272 is a total value (cumulative reduction amount) of P individual reduction amounts corresponding to P duplicated data. The individual reduction amount is the difference between the data length before deduplication and the data length of duplicated data. The predicted reduction amount 273 is a predicted value calculated on the basis of the reduction amount 272 and the changed sampling period (predicted value of reduction amount in duplicated length range). In FIG. 9, the predicted reduction amount 273 is a predicted value obtained when the sampling period is changed from "4" to "8". The predicted maximum reduction amount 274 is a predicted value of reduction effect obtained when a sampling period ("1" in this embodiment) is selected to obtain the maximum reduction effect. The predicted maximum reduction amount 274 can be calculated by the same method as that for the predicted reduction amount 273). 
YOSHII provides a storage system that is designed to: divide data into a plurality of chunk data (pieces of data) in a deduplication process; select one or more chunk data from among the plurality of chunk data in accordance with a sampling period which indicates that, on average, one chunk data be selected from among each N chunk data; and calculate a fingerprint, such as a hash value, for each of one or more characteristic chunk data, which are the selected one or more chunk data, and determine whether data including the one or more characteristic chunk data is a duplication. The storage system changes the sampling period on the basis of the results of past deduplication processes (YOSHII, See ABSTRACT). This storage system can be utilized by Harnik to effectively de-duplicate the data set.
Furthermore, De Landstheer teaches wherein the frequency histogram is a digest byte value histogram that tracks frequencies or counter values of different observed byte values for each individual byte position in digests computed for chunks of the data set (De Landstheer, See Page 5, lines 21-48,  The fingerprint determined by the agent uniquely identifies the file or file segment. Thus no two non-identical files or segments can have the same fingerprint, and identical files or segments always have the same fingerprint. In the present example, the fingerprint is calculated using a hash function. Hash functions are mathematical functions which can be used to determine a fixed length message digest or fingerprint from a data item of any almost size. A hash function is a one way function--it is not possible to reverse the process to recreate the original data from the fingerprint. Hash functions are relatively slow and expensive in terms of processing power required compared to other checksum techniques such as CRC (Cyclic Redundancy Check) methods. However hash functions have the advantage of producing a unique fingerprint for each unique data set, in contrast to CRC methods which 
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik, YOSHII and De Landstheer because De Landstheer provides a directory system which is designed to adapt dynamically based upon the caching memory available for searching directories. Received files can be stored in a current directory until a predetermined limit is reached. In parallel, a database can be created to record which files are stored in which directory. This database can be designed to be kept in physical memory to minimize file access latency. This arrangement provides that a data storage system can store data in a simple order of receipt manner while also managing the storage structure to limit the number of data objects in any given container, thus preventing a search function analysing any given container from needing to access an excessive number of data objects and thus slow down the search to an unacceptable level (De Landstheer, See ABSTRACT). This directory system can be utilized by Harnik and YOSHII to effectively create a digest byte value histogram and to de-duplicate the data set.
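For illustration only, the claimed limitation of a digest byte value histogram with a uniformity determination can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the claims, the specification, or any cited reference: the function names, the 32-byte SHA-256 digest length, the chi-square criterion, and the threshold of 400 are all hypothetical choices made for the example.

```python
import hashlib

DIGEST_LEN = 32    # assumed digest length in bytes (SHA-256)
BYTE_VALUES = 256  # possible values of a single byte

def new_byte_histogram() -> list[list[int]]:
    """One row of 256 counters for each byte position in a digest."""
    return [[0] * BYTE_VALUES for _ in range(DIGEST_LEN)]

def update_byte_histogram(hist: list[list[int]], digest: bytes) -> None:
    """Increment the counter for the byte value observed at each position."""
    for position, value in enumerate(digest):
        hist[position][value] += 1

def is_sufficiently_uniform(hist: list[list[int]], max_chi2: float = 400.0) -> bool:
    """Assumed criterion: chi-square statistic per position stays below max_chi2."""
    for counts in hist:
        total = sum(counts)
        if total == 0:
            continue  # nothing observed at this position yet
        expected = total / BYTE_VALUES
        chi2 = sum((c - expected) ** 2 / expected for c in counts)
        if chi2 > max_chi2:
            return False
    return True

# Digests of many distinct inputs should spread evenly over byte values.
uniform_hist = new_byte_histogram()
for i in range(20000):
    update_byte_histogram(uniform_hist, hashlib.sha256(str(i).encode()).digest())

# A degenerate digest repeated many times concentrates every count in one bin.
skewed_hist = new_byte_histogram()
for _ in range(1000):
    update_byte_histogram(skewed_hist, bytes(DIGEST_LEN))
```

Under these assumptions, a result of `False` from `is_sufficiently_uniform` would be the point at which processing to update the deduplication settings is triggered; other criteria, such as a maximum counter deviation, would fit the same structure.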
Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Harnik in view of YOSHII and De Landstheer as applied to claim 1 above, and further in view of Kano et al. (US Pub. No. 2006/0224844 A1), hereinafter “Kano”.
Regarding claim 14, Harnik in view of YOSHII and De Landstheer does not explicitly disclose the method of claim 1, wherein the method is performed as part of inline processing of the plurality of data chunks in connection with an I/O path or data path when servicing I/Os accessing the plurality of data chunks.
However, Kano teaches the method of claim 1, wherein the method is performed as part of inline processing of the plurality of data chunks in connection with an I/O path or data path when servicing I/Os accessing the plurality of data chunks (Kano, See [0073]). 
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Harnik, YOSHII, De Landstheer and Kano because Kano provides data migration that includes copying between normal volumes and thin provisioned volumes. Data in a normal volume can be copied to a thin provisioned volume. Alternatively, data structures can be provided to facilitate converting a normal volume into a thin provisioned volume without actual copying of data. Copying from a thin provisioned volume to a normal volume is also disclosed (Kano, See ABSTRACT). This data migration can be utilized by Harnik and YOSHII to effectively access the data set.

Regarding claim 15, Harnik in view of YOSHII and De Landstheer and Kano further teaches the method of claim 1, wherein the method is performed offline and not as part of inline processing of the plurality of data chunks in connection with an I/O path or data path when servicing I/Os accessing the plurality of data chunks (Kano, See [0073]). 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Examiner’s note: Examiner has cited particular columns/paragraph and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested from the applicant in preparing responses, to fully consider the references 
In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution.  MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.”  Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive.  Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
					Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIOW-JY FAN whose telephone number is (571)270-7846 and whose email address is shiow-jy.fan@uspto.gov.  The examiner can normally be reached on Monday-Friday 9AM to 5PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya can be reached on 571-272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHIOW-JY FAN/Primary Examiner, Art Unit 2168