DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Remarks

	In response to communications sent February 22, 2021, claims 1-20 are pending in this application; of these, claims 1, 9, and 17 are in independent form.  Claims 1-4, 6-9, 15, 17, and 18 are previously presented; claims 5, 10-14, 16, and 19 are original.

Response to Arguments
Applicant's arguments filed February 22, 2021 have been fully considered but they are not persuasive.
1. Applicant argues that “…storing the first single current file and corresponding metadata on a file basis without performing a deduplication operation on the first single current file…” was not taught in the reference because the reference teaches file-level deduplication.
The Examiner believes that the rationale for the Examiner’s rejection can be better understood by clarifying what the Examiner understands the difference between file-level and block-level deduplication to be.  In file-level deduplication, redundant files may be removed, but individual files remain intact without being subdivided into blocks or chunks to decrease the file size of the individual files.  On the other hand, in block-level deduplication, a file might decrease in size due to the removal of redundant blocks within the file.
Looking at the claim language, the claims recite “…storing the first single current file and corresponding metadata on a file basis…”  The Examiner had mapped this to a file-level deduplication operation.  This is because during file-level deduplication, the file will not be further reduced in size or manipulated to remove redundant blocks.  Instead, the files should remain intact and unaltered.  In this sense, the files that are subject to a deduplication operation are stored on a “file basis” as claimed.  Regarding the second part of the clause, “… without performing a deduplication operation on the first single current file…,” the Examiner argues that during file-level deduplication, no de-duplication is performed on the first single current file because that file is not subdivided or manipulated, but is maintained intact without further file size reduction.  It would only be during block-level deduplication that the file’s integrity as a unit would be changed into a smaller representation of the file based on redundant blocks.  However, because only file-level deduplication is performed on the first single current file, that file is stored on a “file basis” without any reduction in size of the file.  Therefore the Examiner respectfully maintains the rejection.
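For illustration only, the distinction drawn above between file-level and block-level deduplication can be sketched in Python. The sketch is not taken from Efstathopoulos or from the application; the function names, the use of SHA-256, and the 4-byte block size are assumptions chosen for brevity.

```python
import hashlib


def file_level_store(store, data):
    """File-level: keep each distinct file whole; a duplicate file is stored once."""
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # the file itself is never subdivided
    return key


def block_level_store(blocks, data, block_size=4):
    """Block-level: split the file into fixed-size blocks; keep each distinct block once."""
    recipe = []  # ordered block keys needed to rebuild the file
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        key = hashlib.sha256(chunk).hexdigest()
        blocks.setdefault(key, chunk)  # a redundant block is not stored again
        recipe.append(key)
    return recipe
```

Under these assumptions, a file stored by `file_level_store` remains intact as a unit, whereas `block_level_store` represents the same file as a list of references to a possibly smaller set of unique blocks.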

	2. Applicant further argues that the file-level deduplication is selected based on the similarity of data objects, rather than the non-similarity.  The Examiner respectfully disagrees with the argument based on the scope of the claims and trivial relationship between similarity and dissimilarity.  Similarity measurements, .

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
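Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.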
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
"an obtaining module configured to obtain..." in claim 9.
"a similarity determining module configured to determine..." in claim 9.
"a first storage module configured to...store..." in claim 9.
"a second storage module configured to... apply..." in claim 9.
"a module configured to determine a hash value..." in claim 10.
"a module configured to retrieve..." in claim 10.
"a module configured to determine a SimHash value..." in claim 11.
"a module configured to determine a common hash value..." in claim 12.
"a module configured to determine... that a similar file does not exist..." in claim 13.
"a module configured to determine a hash value..." in claim 14.
"a module configured to map the hash value..." in claim 14.
"a module configured to determine... that a similar historical file does not exist... update…" in claim 14.
"a module configured to... retrieve... determine..." in claim 14.
"a module configured to obtain..." in claim 15.
"a module configured to retrieve..." in claim 15.
"a module configured to... replace..." in claim 15.
"a module configured to... store..." in claim 15.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by U.S. 8,898,120 (“Efstathopoulos”).

As to claim 1, Efstathopoulos teaches a method for data deduplication, the method comprising:
obtaining a first single current file in the data (Efstathopoulos col 7 lines 13-33: obtaining a data object);
determining whether a similar historical file exists for the first single current file (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object) based on a first sampled data block (Efstathopoulos col 7 lines 13-33: based on a sample of a portion of the data object) from at least one predetermined location in the first single current file (Efstathopoulos col 7 lines 13-33: the sampling based on a pre-defined property defining the first point in the data object);
in response to determining non-existence of the similar historical file for the first single current file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object that quantifies some degree of dissimilarity), storing the first single current file and corresponding metadata on a file basis (Efstathopoulos col 10 lines 26-47: applying file level-deduplication) without performing a deduplication operation on the first single current file (Efstathopoulos col 10 lines 26-47: without performing a block-level deduplication operation on the data object; the examiner maps “a deduplication operation” to “a block-level deduplication operation”);
obtaining a second single current file in the data (Efstathopoulos col 7 lines 13-33: obtaining various data objects);
determining whether a similar historical file exists for the second single current file (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object) based on a second sampled data block (Efstathopoulos col 7 lines 13-33: based on a sample of a portion of the data object) from at least one predetermined location in the second single current file (Efstathopoulos col 7 lines 13-33: the sampling based on a pre-defined property defining the first point in the data object); and
in response to determining existence of the similar historical file for the second single current file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object), applying the deduplication (Efstathopoulos col 10 lines 26-47: applying block level-deduplication).
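For illustration only, the conditional flow recited in claim 1 (sample a block at a predetermined location, test for a similar historical file, then choose file-basis storage or block-basis deduplication) can be sketched as follows. This is a simplified sketch and not a description of Efstathopoulos; `SAMPLE_OFFSET`, `SAMPLE_SIZE`, and the function names are hypothetical, and an exact-match lookup stands in for whatever similarity test is actually used.

```python
import hashlib

SAMPLE_OFFSET = 0  # hypothetical predetermined location within the file
SAMPLE_SIZE = 8    # hypothetical sample length in bytes


def sample_hash(data):
    """Hash only the sampled block taken from the predetermined location."""
    block = data[SAMPLE_OFFSET:SAMPLE_OFFSET + SAMPLE_SIZE]
    return hashlib.sha256(block).hexdigest()


def choose_storage_path(data, historical_sample_hashes):
    """Return "block" when a similar historical file appears to exist, else "file"."""
    h = sample_hash(data)
    if h in historical_sample_hashes:
        return "block"  # similar file found: deduplicate on a block basis
    historical_sample_hashes.add(h)  # remember this file's sample for later files
    return "file"       # no similar file: store whole, with no block-level pass
```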

As to claim 2, Efstathopoulos teaches the method according to claim 1, wherein the determining whether the similar historical file exists for the first single current file (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object) based on the sampled data block (Efstathopoulos col 7 lines 13-33: based on a sample of a portion of the data object) from at least one predetermined location in the first single current file (Efstathopoulos col 7 lines 13-33: the sampling based on a pre-defined property defining the first point in the data object) comprises:
determining a hash value of the first sampled data block (Efstathopoulos col 10 lines 26-47: determining a similarity hash of the sampling based on the pre-defined property defining the first point in the data object); and
retrieving a first hash value table for a hash value matching the hash value of the first sampled data block (Efstathopoulos Col 8 lines 17-44: retrieving matches for hash values using a bloom filter) and determining whether a similar historical file exists based on a statistical result of the matching (Efstathopoulos Col 8 lines 17-44: determining similarity using the bloom filters), the first hash value table recording hash values of sampled data blocks of historical files at the predetermined location (Efstathopoulos Col 8 lines 17-44: the bloom filter recording similarity hashes of all data objects stored).

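As to claim 3, Efstathopoulos teaches the method according to claim 2, wherein the determining the hash value of the first sampled data block comprises determining a SimHash value of the first sampled data block (Efstathopoulos col 7 lines 1-12: determining a “similarity hash”), and wherein the matching comprises retrieving a hash value whose similarity degree to the SimHash value of the first sampled data block exceeds a first set threshold (Efstathopoulos col 1 line – col 2 line 15: comparing similarity hashes to a threshold value).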

As to claim 4, Efstathopoulos teaches the method according to claim 2, wherein the determining the hash value of the first sampled data block comprises determining a common hash value of the first sampled data block, and wherein the matching comprises retrieving a hash value identical to the common hash value of the first sampled data block (Efstathopoulos Col 8 lines 17-44: the bloom filter of similarity hashes entails finding common hash values by comparison to all data objects stored).

As to claim 5, Efstathopoulos teaches the method according to claim 2, wherein the determining whether the similar historical file exists based on the statistical result of the matching (Efstathopoulos Col 8 lines 17-44: determining similarity using the bloom filters) comprises:
determining, in response to retrieving that a number of matched hash values does not exceed a second threshold, that a similar historical file does not exist (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes); and 
determining, in response to retrieving that the number of matched hash values exceeds the second threshold, that a similar historical file exists (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does exist within certain similar nodes).

As to claim 6, Efstathopoulos teaches the method according to claim 1, further comprising:
determining a hash value of the first sampled data block (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object);
mapping the hash value of the sampled data block to a preset corresponding location of a Bloom filter, and determining whether corresponding locations are all set (Efstathopoulos Col 8 lines 17-44: retrieving matches for hash values using a bloom filter; bloom filters generally entail determining whether locations of hash values are set within the bloom filter);
in response to determining that the corresponding locations are not all set (Efstathopoulos Col 8 lines 17-44: based on the bloom filter analysis): 
determining that a similar historical file does not exist (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes), and
updating the Bloom filter based on the hash value of the sampled data block (Efstathopoulos Col 8 lines 17-44: populating the bloom filter); and
in response to determining that the corresponding locations are all set (Efstathopoulos Col 8 lines 17-44: based on the bloom filter analysis): 
retrieving a first hash value table for a hash value matching the hash value of the first sampled data block (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object), and
determining whether a similar historical file exists based on a statistical result of the matching (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes), the first hash value table recording hash values of sampled data blocks of historical files at the predetermined location (Efstathopoulos Col 8 lines 17-44: the bloom filter populated according to similarity hashes).
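For illustration only, the Bloom filter check discussed in the mapping of claim 6 (map a hash value to several bit positions, test whether all are set, and update the filter when they are not) can be sketched as follows. The class, its parameters, and the helper `maybe_similar` are hypothetical and are not drawn from Efstathopoulos.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size and num_hashes are illustrative defaults."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def all_set(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True


def maybe_similar(bloom, sample_hash):
    """False means definitely unseen (filter is updated); True means possibly seen."""
    if not bloom.all_set(sample_hash):
        bloom.add(sample_hash)
        return False
    return True
```

Because a Bloom filter can report false positives but not false negatives, a `True` result in this sketch would still call for a confirming lookup, such as the hash value table retrieval recited in the claim.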

As to claim 7, Efstathopoulos teaches the method according to claim 1, wherein the in response to determining existence of the similar historical file for the second single current file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object), applying a deduplication operation on the second single current file on a block basis (Efstathopoulos col 10 lines 26-47: applying block level-deduplication) comprises:
(Efstathopoulos col 1 lines 52 – col 2 line 15: using a partitioned plurality of nodes, partitioned according to a hash space);
retrieving a second hash value table for the existence of the hash value of the current partitioned block, the second hash value table recording hash values of all data blocks in a historical file (Efstathopoulos col 1 lines 52 – col 2 line 15: comparing hash values to a partition of the hash space corresponding to particular nodes);
in response to existence, when the current partitioned block is stored, replacing it by a reference to the retrieved corresponding historical partitioned block (Efstathopoulos col 1 lines 52 – col 2 line 15: performing block-level deduplication using the partitioned nodes); and
in response to non-existence, storing the current partitioned block and metadata of the current partitioned block (Efstathopoulos col 1 lines 52 – col 2 line 15: deduplication generally entails storing files when deduplication is not possible).

As to claim 8, Efstathopoulos teaches the method according to claim 7, wherein partitioned blocks of the file are determined by a block partitioning manner selected from the group consisting of fixed partitioned block (this element is claimed in the alternative and does not need to be mapped), variable-length partitioned block (this element is claimed in the alternative and does not need to be mapped), and content-based partitioned block (Efstathopoulos Col 7 lines 13-33: the partitioning is content-driven because the content of the data object are hashed to produce a content signature, which then determines the partitioning to nodes).

As to claim 9, Efstathopoulos teaches an apparatus for data deduplication (Efstathopoulos Figure 5), comprising:
one or more hardware processors (Efstathopoulos Figure 5);
an obtaining module configured to obtain a current file in the data (Efstathopoulos col 7 lines 13-33: obtaining a data object);
a similarity determining module configured to determine whether a similar historical file exists (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object) based on a sampled data block (Efstathopoulos col 7 lines 13-33: based on a sample of a portion of the data object) from at least one predetermined location in the single current file (Efstathopoulos col 7 lines 13-33: the sampling based on a pre-defined property defining the first point in the data object);
a first storage module configured to, in response to determining non-existence of the similar historical file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object that quantifies some degree of dissimilarity), store the single current file and corresponding metadata on a file basis (Efstathopoulos col 10 lines 26-47: applying file level-deduplication) without performing a deduplication operation on the single current file (Efstathopoulos col 10 lines 26-47: without performing a block-level deduplication operation on the data object; the examiner maps “a deduplication operation” to “a block-level deduplication operation”); and
a second storage module configured to, in response to determining existence of the similar historical file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object), apply the deduplication operation on the single current file on a block basis (Efstathopoulos col 10 lines 26-47: applying block level-deduplication).

As to claim 10, Efstathopoulos teaches the apparatus according to claim 9, wherein the similarity determining module comprises:
a module configured to determine a hash value of the sampled data block (Efstathopoulos col 10 lines 26-47: determining a similarity hash of the sampling based on the pre-defined property defining the first point in the data object); and
a module configured to retrieve a first hash value table for a hash value matching the hash value of the sampled data block (Efstathopoulos Col 8 lines 17-44: retrieving matches for hash values using a bloom filter) and determine whether a similar historical file exists based on a statistical result of the matching (Efstathopoulos Col 8 lines 17-44: determining similarity using the bloom filters), the first hash value table recording hash values of sampled data blocks of historical files at the predetermined location (Efstathopoulos Col 8 lines 17-44: the bloom filter recording similarity hashes of all data objects stored).

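As to claim 11, Efstathopoulos teaches the apparatus according to claim 10, wherein the module configured to determine the hash value of the sampled data block comprises a module configured to determine a SimHash value of the sampled data block (Efstathopoulos col 7 lines 1-12: determining a “similarity hash”), and wherein the matching comprises retrieving a hash value whose similarity degree to the SimHash value of the sampled data block exceeds a first set threshold (Efstathopoulos col 1 line – col 2 line 15: comparing similarity hashes to a threshold value).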

As to claim 12, Efstathopoulos teaches the apparatus according to claim 10, wherein the module configured to determine the hash value of the sampled data block comprises a module configured to determine a common hash value of the sampled data block and wherein the matching comprises retrieving a hash value exactly identical to the common hash value of the sampled data block (Efstathopoulos Col 8 lines 17-44: the bloom filter of similarity hashes entails finding common hash values by comparison to all data objects stored).

As to claim 13, Efstathopoulos teaches the apparatus according to claim 11, wherein the module configured to retrieve the first hash value table for the hash value matching the hash value of the sampled data block and determine whether the similar historical file exists based on the statistical result of the matching comprises:
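a module configured to determine, in response to retrieving that a number of matched hash values does not exceed a second threshold, that a similar historical file does not exist (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes); and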
a module configured to determine, in response to retrieving that the number of matched hash values exceeds the second threshold, that a similar historical file exists (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does exist within certain similar nodes).

As to claim 14, Efstathopoulos teaches the apparatus according to claim 9, further comprising:
a module configured to determine a hash value of the sampled data block (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object);
a module configured to map the hash value of the sampled data block to a preset corresponding location of a Bloom filter, and determine whether corresponding locations are all set (Efstathopoulos Col 8 lines 17-44: retrieving matches for hash values using a bloom filter; bloom filters generally entail determining whether locations of hash values are set within the bloom filter);
a module configured to, in response to the corresponding locations are not all set (Efstathopoulos Col 8 lines 17-44: based on the bloom filter analysis):
determine that a similar historical file does not exist (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes), and
update the Bloom filter based on the Hash value of the sampled data block (Efstathopoulos Col 8 lines 17-44: populating the bloom filter); and
a module configured to, in response to determining that the corresponding locations are all set (Efstathopoulos Col 8 lines 17-44: based on the bloom filter analysis):
retrieve a first hash value table for a hash value matching the hash value of the sampled data block (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object), and
determine whether a similar historical file exists based on a statistical result of the matching (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes), the first hash value table recording hash values of sampled data blocks of historical files at the predetermined location (Efstathopoulos Col 8 lines 17-44: the bloom filter populated according to similarity hashes).

As to claim 15, Efstathopoulos teaches the apparatus according to claim 9, wherein the second storage module comprises:
(Efstathopoulos col 1 lines 52 – col 2 line 15: using a partitioned plurality of nodes, partitioned according to a hash space);
a module configured to retrieve a second hash value table for the existence of the hash value of the current partitioned block, the second hash value table recording hash values of all data blocks in a historical file (Efstathopoulos col 1 lines 52 – col 2 line 15: comparing hash values to a partition of the hash space corresponding to particular nodes);
a module configured to, in response to existence, when the current partitioned block is stored, replace it by a reference to the retrieved corresponding historical partitioned block (Efstathopoulos col 1 lines 52 – col 2 line 15: performing block-level deduplication using the partitioned nodes); and
a module configured to, in response to non-existence, store the current partitioned block and metadata of the current partitioned block (Efstathopoulos col 1 lines 52 – col 2 line 15: deduplication generally entails storing files when deduplication is not possible).

As to claim 16, Efstathopoulos teaches the apparatus according to claim 9, wherein partitioned blocks of the file are determined by a block partitioning manner selected from the group consisting of fixed partitioned block (this element is claimed in the alternative and does not need to be mapped), variable-length partitioned block (this element is claimed in the alternative and does not need to be mapped), and content-based partitioned block (Efstathopoulos Col 7 lines 13-33: the partitioning is content-driven because the content of the data object are hashed to produce a content signature, which then determines the partitioning to nodes).

As to claim 17, Efstathopoulos teaches a computer program product for data deduplication (Efstathopoulos Figure 5), the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by one or more processors to perform a method (Efstathopoulos Figure 5) comprising:
obtaining a single current file in the data (Efstathopoulos col 7 lines 13-33: obtaining a data object);
determining whether a similar historical file exists (Efstathopoulos col 7 lines 13-33: determining a similarity hash for the data object) based on a sampled data block (Efstathopoulos col 7 lines 13-33: based on a sample of a portion of the data object) from at least one predetermined location in the single current file (Efstathopoulos col 7 lines 13-33: the sampling based on a pre-defined property defining the first point in the data object);
in response to determining non-existence of the similar historical file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object that quantifies some degree of dissimilarity), storing the single current file (Efstathopoulos col 10 lines 26-47: applying file level-deduplication) and corresponding metadata on a file basis without performing a deduplication operation on the single current file (Efstathopoulos col 10 lines 26-47: without performing a block-level deduplication operation on the data object; the examiner maps “a deduplication operation” to “a block-level deduplication operation”); and
in response to determining existence of the similar historical file (Efstathopoulos col 10 lines 26-47: after determining a similarity hash for a data object), applying the deduplication operation on the single current file on a block basis (Efstathopoulos col 10 lines 26-47: applying block level-deduplication).

As to claim 18, Efstathopoulos teaches the computer program product according to claim 17, wherein the determining whether the similar historical file exists based on the sampled data block from at least one predetermined location in the single current file comprises:
determining a hash value of the sampled data block (Efstathopoulos col 10 lines 26-47: determining a similarity hash of the sampling based on the pre-defined property defining the first point in the data object); and
retrieving a first hash value table for a hash value matching the hash value of the sampled data block (Efstathopoulos Col 8 lines 17-44: retrieving matches for hash values using a bloom filter) and determining whether a similar historical file exists based on a statistical result of the matching (Efstathopoulos Col 8 lines 17-44: determining similarity using the bloom filters), the first hash value table recording a hash value of the sampled data block of the historical file at the predetermined location (Efstathopoulos Col 8 lines 17-44: the bloom filter recording similarity hashes of all data objects stored).

As to claim 19, Efstathopoulos teaches the computer program product according to claim 18, wherein the determining the hash value of the sampled data block comprises determining a SimHash value of the sampled data block (Efstathopoulos col 7 lines 1-12: determining a “similarity hash”), and wherein the matching comprises retrieving a hash value whose similarity degree to the SimHash value of the sampled data block exceeds a first set threshold (Efstathopoulos col 1 line – col 2 line 15: comparing similarity hashes to a threshold value).
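For illustration only, a SimHash-style comparison against a similarity threshold, of the general kind recited in the claim, can be sketched as follows. This is a generic textbook formulation and is not asserted to be the "similarity hash" of Efstathopoulos; the token-based input, the 64-bit width, and the 0.9 threshold are assumptions.

```python
import hashlib


def simhash(tokens, bits=64):
    """Combine per-token hashes into one fingerprint by per-bit majority vote."""
    weights = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def similarity(a, b, bits=64):
    """Fraction of matching bits (1 minus the normalized Hamming distance)."""
    return 1 - bin(a ^ b).count("1") / bits


def is_similar(a, b, threshold=0.9):
    """True when the similarity degree exceeds the (hypothetical) threshold."""
    return similarity(a, b) > threshold
```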

As to claim 20, Efstathopoulos teaches the computer program product according to claim 18, wherein the determining whether the similar historical file exists based on the statistical result of the matching comprises:
determining, in response to retrieving that a number of matched hash values does not exceed a second threshold, that a similar historical file does not exist (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does not exist within certain dissimilar nodes); and 
determining, in response to retrieving that the number of matched hash values exceeds the second threshold, that a similar historical file exists (Efstathopoulos col 1 line – col 2 line 15: determining, based on the similarity comparisons to a threshold, that a potentially similar file does exist within certain similar nodes).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 10,877,945:  A recently published patent that involves conditional deduplication of a file.
The following art has previously been made of record:
Yao et al., "Simdedup: A New Deduplication Scheme Based on Simhash", WAIM 2013 Workshops, LNCS 7901, pp: 79-88, 2013. Springer-Verlag Berlin Heidelberg 2013.
However, Yao et al. does not teach using a bloom filter for the deduplication step at the block level, focusing instead on finding similar file blocks using the SimHash algorithm.
US 2014/0344229 A1 teaches a similarity threshold for hashed data chunks for deduplication, focusing on selection of a most similar back-end node as a deduplication target.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse P Frumkin whose telephone number is (571)270-1849.  The examiner can normally be reached on Monday - Saturday, 10-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz R Skowronek can be reached on (571) 272-9047.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JESSE P FRUMKIN/
Primary Examiner, Art Unit 1631
March 10, 2021