DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Applicant’s Application filed on 6/10/2019.
Claims 1-21 are pending. Claims 1, 8, and 15 are independent.

Drawings
The drawings, filed 6/10/2019, are considered in compliance with 37 CFR 1.81 and accepted.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. 

Claims 1-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or joint inventor, or for pre-AIA applicant, regards as the invention.

Regarding claim 1, the claim recites on line 4 selecting seed blocks that are “similar to each other.” The claim then further recites on line 6 newly received blocks are stored near “similar seed blocks.” The term "similar" in these claim limitations is a relative term, which renders the claim indefinite.  The term "similar" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably 
Further, regarding claim 1, the claim recites on line 6 “newly received blocks”; however, the scope of such “newly received” data is unclear and indefinite. Particularly, it is unclear in what aspect the data is “new,” in that there is no indication within the claims of any prior “receiving” or “received” data or blocks, and no indication of any existing “stored” data or blocks. Thus it is unclear whether “newly received” is meant to refer to data new to the entire computing system or deduplication engine, or whether “newly received” encompasses data that is simply new to the sedimentation phase, such as another stored data set selected for processing. “New” is essentially a relative term, as data can be new to a system, new to a stage, or new to consideration, without being a first appearance of the data entirely. Thus, the scope of the claim is unclear and indefinite. Examiner would suggest clarifying how, and when, the data blocks are received and stored into the computing system and/or deduplication engine to clarify what data blocks are being referred to in the sedimentation phase. This would also likely clarify the distinction between the claimed phases as it relates to offline/post-processing phases and inline/streaming data phases, to the extent that such distinction is important to the claimed invention.
Further regarding claim 1, the claim recites on line 6 that the received data blocks are “processed to be stored” near similar seed blocks. It is unclear and indefinite what “processed to be stored” means in the scope of a deduplication engine. Specifically, “processed” indicates there is more 
Further regarding claim 1, the claim recites on line 6 that the newly received blocks are stored “near” similar seed blocks. The term "near" in the limitation is a relative term which renders the claim indefinite. The term "near" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Further, it is unclear what would constitute “near” in storing the block, in that this could encompass anything from storing the block in the same server down to a granularity of storing the block in the same contiguous region of physical memory abutting the stored seed block. Thus, it is unclear and indefinite what “stored near” would encompass within the scope of the claim. Examiner would suggest clarifying how the block is stored near the seed blocks, which likely would indicate both how the relative term “near” is being used and clarify the storage of seed blocks. As the claim stands, there is no introduction or mention that the “seed blocks” are stored at all. The claim simply makes reference to selecting a set of similar seed blocks, but such are not even recited as stored.

Dependent claims 2-7 inherit all the indefiniteness rejections of their parent claim 1 and therefore are indefinite for all the same reasons.
Further, regarding dependent claim 2, the claim recites on lines 2-3 “every block that is not de-duplicated.” However, it is unclear and indefinite what “block[s]” are being referred to in this limitation, as there are no generic “block[s]” introduced in the claim or its parent. Claim 1 introduces “a set of seed blocks” and “newly received blocks.” Claim 2 in the preamble refers to “the set of seed blocks” but then describes “every block that is not de-duplicated,” and it is unclear whether this refers only to the “set of seed blocks” already selected, or to some other preexisting set of blocks, ostensibly stored somewhere. Similarly, line 9 refers to “each block,” but again there is no introduction of any blocks beyond the post-selection set of seed blocks. Thus it is unclear and indefinite what “block[s]” are being referred to in the claim, as there is no introduction of any deduplicated or not deduplicated blocks at all. Therefore, the scope of the claim is indefinite.
Dependent claims 3-4 inherit the same indefiniteness as in parent claim 2.
Further regarding dependent claim 3, the claim refers to a “first number” being configured based on certain criteria. However, claim 2, from which claim 3 depends, recites “a first number of the hash components of the block have an element count.” It is unclear how the first number can be configured, as it is already defined as a count. Thus, the scope is indefinite. Examiner would suggest clarifying whether Applicant is referring to the threshold rather than the first number.
Further, regarding claim 6, the claim recites “a current operating phase.” However, it is unclear and indefinite what such refers to. The parent claims introduce a “coalescing” phase and a “sedimentation” phase that are cycled, but there is no introduction of what constitutes a “current operating phase.” Specifically, because the phases are cycled, a current operating phase, and the sequencing of such, could refer to a full cycle, that is, the 1st coalescing phase together with the 1st sedimentation phase being considered a first operating phase, such that a second operating phase would occur only upon cycling to the 2nd coalescing phase. Alternatively, the 1st coalescing phase alone could be considered the current operating phase, and moving to the 1st sedimentation phase could be considered the second operating phase


Independent claims 8 and 15 recite substantially the same limitations as in claim 1. Therefore, without repeating the explanations given in detail for claim 1, claims 8 and 15 are indefinite for the same reasons as given above with respect to claim 1.
Dependent claims 9-14 and 16-21 inherit all the indefiniteness rejections of their parent claim 8 or 15 and therefore are indefinite for all the same reasons.

Further, dependent claims 9 and 16, recite substantially the same limitations as in claim 2. Therefore, without repeating the explanations given in detail for claim 2, claims 9 and 16 are indefinite for the same reasons as given above with respect to claim 2.
Dependent claims 10-11 and 17-18 inherit the same indefiniteness as in parent claim 9 or 16.

Further, dependent claims 13 and 20 recite substantially the same limitations as in claim 6. Therefore, without repeating the explanations given in detail for claim 6, claims 13 and 20 are indefinite for the same reasons as given above with respect to claim 6.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 5, 7-8, 12, 14-15, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosh et al., U.S. Patent No. 9,946,724 (hereinafter Ghosh) in view of Singhai et al., U.S. Patent Pub. No. 2018/0025046 (hereinafter Singhai).

Regarding claim 1, Ghosh in the analogous art of scalable deduplication with phase rotation teaches:
A method for building content for a de-duplication engine, the method comprising: (See Ghosh Abstract, and at least Fig. 7, Fig. 8, and claim 10 wherein the invention is embodied as a method for deduplication including generating data sets or content for a deduplication engine).
periodically receiving instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine; (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication including a sharing/filesystem and index update (i.e. coalescing for seeding phase) and enumeration for commonality (i.e. sedimentation phase). See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine).
during a first coalescing phase, selecting a set of seed blocks that are similar to each other; (See Ghosh col. 5:28-29 wherein a first dataset (i.e. seed data set) is processed through deduplication phases including as described in col. 6:8-65 and col. 7:4-35 wherein the dataset is processed to identify similar blocks and create and populate an index table of seed/initial dataset blocks. See further col. 8:17-67 and col. 9:8-40 wherein the index updating is a seeding phase that stores/seeds similar/shared blocks for future consideration in processing in the next phase rotation. See also col. 3:39-56).
when an instruction for proceeding to a next sedimentation phase is received, entering the sedimentation phase during which newly received blocks are processed [with regard to] similar seed blocks; and (See Ghosh col. 9:34-40 wherein the ending of the indexing/coalescing phase for the first dataset provides an instruction to cycle/rotate phases and ingests the second data set, which is a newly received data set of data blocks. Then as in col. 8:6-67 and col. 9:1-31 the new/next data set of blocks is ingested and processed to identify commonality as being similar to the indexed/seed blocks based on being potential matching candidates by high and low level hashing, which identifies a set of candidate blocks that are newly received (i.e. second dataset) and similar to seed/index blocks. The candidates are stored in the candidate table before the next updating/coalescing phase. See also col. 3:39-56).
when an instruction to proceed to a next coalescing phase is received, entering the coalescing phase to update the set of seed blocks. (See Ghosh col. 9:32-col. 10:17 wherein after enumeration/commonality phase (i.e. sedimentation) the candidate blocks (i.e. newly received similar blocks) are processed in a sharing/index update phase that coalesces and stores the candidate blocks as processed into an index update of seed blocks, and updates the indexed seed blocks. Also note phase rotation as described above. See also col. 3:39-56).
Ghosh does not explicitly teach 
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (But note Ghosh as cited above relating to a sedimentation phase in which similar/matching blocks identified are stored so as to update the file system, but not explicitly based on storing ‘near’ existing seed blocks. See also Ghosh col. 10:43-65 wherein different blocks are stored in different disk pool locations, but not explicitly storing the candidates in such).
However, Singhai in the analogous art of reference/seed set construction and deduplication processing teaches:
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (See Singhai [0038], [0060] wherein an incoming dataset is identified for similar aspects/scope to seed sets and determined to fit into a specific seed set; then as in [0063], [0066], [0074], [0076] determines a similar seed/reference set and packages/stores the new blocks with the seed/reference set blocks. See also [0081], [0094]. See further [0097]-[0098]).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Singhai with the teachings of Ghosh. One having ordinary skill in the art would have been motivated to combine the storing of newly processed data blocks near similar seed blocks for deduplication as in Singhai with the phase rotation of a phase for seeding/coalescing data blocks to update an index of stored blocks and a phase for determining similarity of newly ingested blocks to seed/index blocks in order to exploit

Regarding claim 5, Ghosh in view of Singhai as applied above to claim 1 further teaches:
The method of claim 1, wherein instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine are received from a monitor of the de-duplication engine. (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication based on monitoring completion of previous data set or timing, etc. See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine. Note also Singhai [0068] periodically or based on trigger updating seed sets).

Regarding claim 7, Ghosh in view of Singhai as applied above to claim 1 further teaches:
The method of claim 1, wherein the cycling further includes a cleaning phase in which data and objects no longer needed are removed from the de-duplication engine. (See Singhai [0057] wherein reference/seed blocks and sets are removed/retired through garbage collection cleaning phase as part of cycle for reference set creation, reference set application, and reference set cleaning. See also 

Regarding claim 8, Ghosh in the analogous art of scalable deduplication with phase rotation teaches:
A system for building content for a de-duplication engine, comprising: (See Ghosh Abstract, and at least Fig. 10 wherein the invention is embodied as a system for deduplication including generating data sets or content for a deduplication engine).
at least one processor of a server, the processor configured to: (See Ghosh Fig. 10 with processor 1002. See also Fig. 9, blade server with CPU cores).
periodically receive instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine; (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication including a sharing/filesystem and index update (i.e. coalescing for seeding phase) and enumeration for commonality (i.e. sedimentation phase). See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine).
during a first coalescing phase, select a set of seed blocks that are similar to each other; (See Ghosh col. 5:28-29 wherein a first dataset (i.e. seed data set) is processed through deduplication phases including as described in col. 6:8-65 and col. 7:4-35 wherein the dataset is processed to identify similar blocks and create and populate an index table of seed/initial dataset blocks. See further col. 
when an instruction for proceeding to a next sedimentation phase is received, enter the sedimentation phase during which newly received blocks are processed [with regard to] similar seed blocks; and (See Ghosh col. 9:34-40 wherein the ending of the indexing/coalescing phase for the first dataset provides an instruction to cycle/rotate phases and ingests the second data set, which is a newly received data set of data blocks. Then as in col. 8:6-67 and col. 9:1-31 the new/next data set of blocks is ingested and processed to identify commonality as being similar to the indexed/seed blocks based on being potential matching candidates by high and low level hashing, which identifies a set of candidate blocks that are newly received (i.e. second dataset) and similar to seed/index blocks. The candidates are stored in the candidate table before the next updating/coalescing phase. See also col. 3:39-56).
when an instruction to proceed to a next coalescing phase is received, enter the coalescing phase to update the set of seed blocks. (See Ghosh col. 9:32-col. 10:17 wherein after enumeration/commonality phase (i.e. sedimentation) the candidate blocks (i.e. newly received similar blocks) are processed in a sharing/index update phase that coalesces and stores the candidate blocks as processed into an index update of seed blocks, and updates the indexed seed blocks. Also note phase rotation as described above. See also col. 3:39-56).
Ghosh does not explicitly teach 
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (But note Ghosh as cited above relating to a sedimentation phase in which similar/matching blocks identified are stored so as to update the file system, but not explicitly based on storing ‘near’ existing seed blocks. See also Ghosh col. 10:43-65 wherein different blocks are stored in different disk pool locations, but not explicitly storing the candidates in such).
However, Singhai in the analogous art of reference/seed set construction and deduplication processing teaches:
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (See Singhai [0038], [0060] wherein an incoming dataset is identified for similar aspects/scope to seed sets and determined to fit into a specific seed set; then as in [0063], [0066], [0074], [0076] determines a similar seed/reference set and packages/stores the new blocks with the seed/reference set blocks. See also [0081], [0094]. See further [0097]-[0098]).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Singhai with the teachings of Ghosh. One having ordinary skill in the art would have been motivated to combine the storing of newly processed data blocks near similar seed blocks for deduplication as in Singhai with the phase rotation of a phase for seeding/coalescing data blocks to update an index of stored blocks and a phase for determining similarity of newly ingested blocks to seed/index blocks in order to exploit locality in the reference/seed data set, which provides an advantage in temporal locality of incoming data where collocated data blocks arrive in the same time interval, thus reducing the number of seeds/references needed to compare for deduplication, and to provide a better deduplication ratio as the blocks heading to storage have the same characteristics prior to deduplication/coalescing. See Singhai [0033], [0077], [0084], and [0114]. Note that Ghosh as above already also identifies such similarity of new/next dataset blocks as commonality for candidate blocks to be incorporated into an index/update of seeds in a next phase, and Singhai simply adds that similar candidates and seeds are stored in a local/near area before processing to create local reference/seed sets/indexes.
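For illustration only, the claimed cycle as interpreted above (coalescing to establish or update seed blocks, then sedimentation in which newly received blocks are placed with a similar seed block) can be sketched as follows. This sketch is not taken from the application, Ghosh, or Singhai. The class name, the component-overlap similarity measure, the 0.5 cutoff, and the modeling of "near" as membership in the same in-memory region are all assumptions chosen for brevity, and the coalescing step simply promotes every pending block to a seed rather than selecting mutually similar seeds.

```python
from enum import Enum


class Phase(Enum):
    COALESCING = "coalescing"
    SEDIMENTATION = "sedimentation"


def components(block: bytes, size: int = 4) -> set:
    """Split a block into fixed-size components (hypothetical granularity)."""
    return {block[i:i + size] for i in range(0, len(block), size)}


def similarity(a: bytes, b: bytes) -> float:
    """Assumed similarity measure: overlap of components (Jaccard index)."""
    ca, cb = components(a), components(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


class DedupEngineSketch:
    """Hypothetical two-phase engine; not the method of any cited reference."""

    def __init__(self, min_similarity: float = 0.5):
        self.phase = Phase.COALESCING
        self.min_similarity = min_similarity
        self.regions = {}   # seed block -> blocks stored alongside that seed
        self.pending = []   # blocks waiting for the next coalescing phase

    def receive_instruction(self, next_phase: Phase) -> None:
        """Switch phases on an external instruction (e.g. from a monitor)."""
        self.phase = next_phase
        if next_phase is Phase.COALESCING:
            # Simplification: every pending block becomes a seed.
            for block in self.pending:
                self.regions.setdefault(block, [])
            self.pending.clear()

    def ingest(self, block: bytes):
        """Place a block with its most similar seed, or defer it."""
        if self.phase is Phase.SEDIMENTATION and self.regions:
            best = max(self.regions, key=lambda s: similarity(s, block))
            if similarity(best, block) >= self.min_similarity:
                self.regions[best].append(block)
                return best
        self.pending.append(block)
        return None
```

Under these assumptions, a block ingested during sedimentation either lands in the region of an existing seed or waits to be folded into the seed set at the next coalescing instruction, which mirrors the candidate-then-index-update sequence mapped to Ghosh above.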


Regarding claim 12, Ghosh in view of Singhai as applied above to claim 8 further teaches:
The system of claim 8, wherein instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine are received from a monitor of the de-duplication engine. (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication based on monitoring completion of previous data set or timing, etc. See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine. Note also Singhai [0068] periodically or based on trigger updating seed sets).

Regarding claim 14, Ghosh in view of Singhai as applied above to claim 8 further teaches:
The system of claim 8, wherein the cycling further includes a cleaning phase in which data and objects no longer needed are removed from the de-duplication engine. (See Singhai [0057] wherein reference/seed blocks and sets are removed/retired through garbage collection cleaning phase as part of cycle for reference set creation, reference set application, and reference set cleaning. See also [0068], [0096], and [0107]-[0108]. See also Ghosh col. 9:28-31 wherein data is deleted from the file system and entries/objects in the index table are no longer shared and removed. Note that the claim recites “objects no longer needed,” which leaves open to interpretation whether such object is a block, a page, a hash, an index entry, or some other object. The claimed object has no required relation to the blocks recited in the claims).

Regarding claim 15, Ghosh in the analogous art of scalable deduplication with phase rotation teaches:
A non-transitory computer readable medium storing thereon computer executable instructions for building content for a de-duplication engine, including instructions for: (See Ghosh col. 
periodically receiving instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine; (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication including a sharing/filesystem and index update (i.e. coalescing for seeding phase) and enumeration for commonality (i.e. sedimentation phase). See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine).
during a first coalescing phase, selecting a set of seed blocks that are similar to each other; (See Ghosh col. 5:28-29 wherein a first dataset (i.e. seed data set) is processed through deduplication phases including as described in col. 6:8-65 and col. 7:4-35 wherein the dataset is processed to identify similar blocks and create and populate an index table of seed/initial dataset blocks. See further col. 8:17-67 and col. 9:8-40 wherein the index updating is a seeding phase that stores/seeds similar/shared blocks for future consideration in processing in the next phase rotation. See also col. 3:39-56).
when an instruction for proceeding to a next sedimentation phase is received, entering the sedimentation phase during which newly received blocks are processed [with regard to] similar seed blocks; and (See Ghosh col. 9:34-40 wherein the ending of the indexing/coalescing phase for the first dataset provides an instruction to cycle/rotate phases and ingests the second data set, which is a newly received data set of data blocks. Then as in col. 8:6-67 and col. 9:1-31 the new/next data set of blocks is ingested and processed to identify commonality as being similar to the indexed/seed blocks based on being potential matching candidates by high and low level hashing, which identifies a set of candidate 
when an instruction to proceed to a next coalescing phase is received, entering the coalescing phase to update the set of seed blocks. (See Ghosh col. 9:32-col. 10:17 wherein after enumeration/commonality phase (i.e. sedimentation) the candidate blocks (i.e. newly received similar blocks) are processed in a sharing/index update phase that coalesces and stores the candidate blocks as processed into an index update of seed blocks, and updates the indexed seed blocks. Also note phase rotation as described above. See also col. 3:39-56).
Ghosh does not explicitly teach 
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (But note Ghosh as cited above relating to a sedimentation phase in which similar/matching blocks identified are stored so as to update the file system, but not explicitly based on storing ‘near’ existing seed blocks. See also Ghosh col. 10:43-65 wherein different blocks are stored in different disk pool locations, but not explicitly storing the candidates in such).
However, Singhai in the analogous art of reference/seed set construction and deduplication processing teaches:
[a sedimentation phase during which newly received blocks are processed] to be stored near similar seed blocks; (See Singhai [0038], [0060] wherein an incoming dataset is identified for similar aspects/scope to seed sets and determined to fit into a specific seed set; then as in [0063], [0066], [0074], [0076] determines a similar seed/reference set and packages/stores the new blocks with the seed/reference set blocks. See also [0081], [0094]. See further [0097]-[0098]).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Singhai with the teachings of Ghosh. One having ordinary skill in the art would have been motivated to combine the storing of newly processed data blocks near similar seed blocks for deduplication as in Singhai with 


Regarding claim 19, Ghosh in view of Singhai as applied above to claim 15 further teaches:
The non-transitory computer readable medium of claim 15, wherein instructions for cycling through a coalescing phase and a sedimentation phase of the de-duplication engine are received from a monitor of the de-duplication engine. (See Ghosh col. 4:19-32 and 54-56 describing phase rotation (i.e. cycling) between various phases of deduplication based on monitoring completion of previous data set or timing, etc. See further col. 5:28-46 wherein phases are cycled based on completion of previous cycle or other schedule trigger. See also col. 8:4-16 and col. 9:34-40 wherein phase rotation component determines to cycle through phases of deduplication engine. Note also Singhai [0068] periodically or based on trigger updating seed sets).

Regarding claim 21, Ghosh in view of Singhai as applied above to claim 15 further teaches:
The non-transitory computer readable medium of claim 15, wherein the cycling further includes a cleaning phase in which data and objects no longer needed are removed from the de-duplication engine. (See Singhai [0057] wherein reference/seed blocks and sets are removed/retired through garbage collection cleaning phase as part of cycle for reference set creation, reference set application, and reference set cleaning. See also [0068], [0096], and [0107]-[0108]. See also Ghosh col. 9:28-31 wherein data is deleted from the file system and entries/objects in the index table are no longer shared and removed. Note that the claim recites “objects no longer needed,” which leaves open to interpretation whether such object is a block, a page, a hash, an index entry, or some other object. The claimed object has no required relation to the blocks recited in the claims).

Claims 2-4, 9-11, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosh in view of Singhai and further in view of Dor et al., U.S. Patent Pub. No. 2016/0054930 (hereinafter Dor).

Regarding claim 2, Ghosh in view of Singhai as applied above to claim 1 further teaches:
The method of claim 1, wherein the selection of the set of seed blocks comprises:
Ghosh in view of Singhai does not explicitly teach a counting hash set as:
creating a counting hash set for storing every hash component of every block that is not de-duplicated; (But note Singhai generally as in [0064] updating/incrementing a reference count for use of a reference/seed block and as in [0095] as part of the hashing, but not explicitly component based).
defining a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance;
iteratively, for all blocks that are not de-duplicated, looking up element counts for each hash component of the block; and 
for each block, determining that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold.
However, Dor in the analogous art of deduplication processing with block component hashing and counting teaches:
creating a counting hash set for storing every hash component of every block that is not de-duplicated; (See Dor [0106]-[0126] wherein as in [0106] every data unit/block is assigned a unique index and processed to generate chunks/components of the block. Since this is done for every block it is done for every block not deduplicated. Then as in [0108] the hash generator hashes each component/chunk and stores the hashed indexes in a table of hashes with counters. See also generally Fig. 4 and Fig. 6).
defining a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance; (See Dor [0106]-[0126] and particularly as in [0108]-[0119] a count of each element/chunk hashed is maintained to track identified data units with the same/identical hashed chunks/elements).
iteratively, for all blocks that are not de-duplicated, looking up element counts for each hash component of the block; and (See Dor [0106]-[0126] and particularly [0113]-[0119] wherein the counts for each chunk/element are determined and used to identify such as similar among a number of data blocks/units).
for each block, determining that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold. (See Dor [0106]-[0126] and particularly [0113]-[0119] and [0121]-[0126] wherein based on a threshold, a determination is made of similar blocks to serve as a seed block for deduplication, and the page unit size is used to determine pairs of seed blocks with both threshold count similar chunks/pages and highest similarity to determine such as a reference/seed block).
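For illustration only, one reading of the four steps recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Dor: the fixed component size, the use of SHA-256, the function names, and the parameter names `count_threshold` and `first_number` are hypothetical, and a standard multiset (`collections.Counter`) stands in for both the claimed "counting hash set" and "counting set."

```python
import hashlib
from collections import Counter


def hash_components(block: bytes, component_size: int = 4) -> list:
    """Split a block into fixed-size components and hash each component."""
    return [
        hashlib.sha256(block[i:i + component_size]).hexdigest()
        for i in range(0, len(block), component_size)
    ]


def select_seed_blocks(blocks, count_threshold, first_number, component_size=4):
    """Return the blocks that qualify as seed blocks under the assumed reading."""
    # Steps 1-2: store every hash component of every block and keep a
    # count of how many identical components have been seen overall.
    counts = Counter()
    per_block = []
    for block in blocks:
        comps = hash_components(block, component_size)
        per_block.append(comps)
        counts.update(comps)

    # Steps 3-4: for each block, look up the count of each of its hash
    # components; the block is a seed block when at least `first_number`
    # of its components have a count reaching `count_threshold`.
    seeds = []
    for block, comps in zip(blocks, per_block):
        frequent = sum(1 for c in comps if counts[c] >= count_threshold)
        if frequent >= first_number:
            seeds.append(block)
    return seeds
```

In this reading, `first_number` and `count_threshold` are separate parameters, which is the distinction raised in the rejection of claim 3 under 35 U.S.C. 112(b) above.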


Regarding claim 3, Ghosh in view of Singhai and Dor as applied above to claim 2 further teaches:
The method of claim 2, wherein the first number is configured based on at least one of: a length of time since the set of seed blocks has been updated, and a performance measurement of the set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).

Regarding claim 4, Ghosh in view of Singhai and Dor as applied above to claim 3 further teaches:
The method of claim 3, wherein the performance measurement is made for determination of a need to expand the set of seed blocks, the measurement being based on a similarity of a sample of newly received blocks to a current set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).
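For illustration only, a performance measurement of the kind recited above (similarity of a sample of newly received blocks to the current seed set, used to decide whether to expand that set) might be sketched as follows. This is a hedged sketch and not the method of Singhai, Dor, or the application: the similarity metric (fraction of a block's hash components already present in the seed set), the fixed-size chunking, and the names `component_hashes`, `sample_similarity`, `needs_expansion`, and `min_similarity` are assumptions.

```python
import hashlib


def component_hashes(block: bytes, chunk_size: int = 4) -> set:
    """Hash each fixed-size chunk of a block (assumed chunking scheme)."""
    return {
        hashlib.sha256(block[i:i + chunk_size]).hexdigest()
        for i in range(0, len(block), chunk_size)
    }


def sample_similarity(sample_blocks, seed_blocks, chunk_size=4):
    """Mean fraction of each sampled block's components found in the seed set."""
    seed_components = set()
    for seed in seed_blocks:
        seed_components |= component_hashes(seed, chunk_size)

    if not sample_blocks:
        # Nothing was sampled, so there is no evidence the seed set is stale.
        return 1.0

    scores = []
    for block in sample_blocks:
        comps = component_hashes(block, chunk_size)
        scores.append(len(comps & seed_components) / len(comps) if comps else 0.0)
    return sum(scores) / len(scores)


def needs_expansion(sample_blocks, seed_blocks, min_similarity=0.5, chunk_size=4):
    """Flag the seed set for expansion when sampled blocks resemble it poorly."""
    return sample_similarity(sample_blocks, seed_blocks, chunk_size) < min_similarity


seeds = [b"AAAABBBB"]
sample = [b"AAAACCCC", b"DDDDEEEE"]
similarity = sample_similarity(sample, seeds)
expand = needs_expansion(sample, seeds)
```

Here one sampled block shares half of its components with the seed set and the other shares none, so the mean similarity falls below the assumed minimum and expansion is indicated.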

Regarding claim 9, Ghosh in view of Singhai as applied above to claim 8 further teaches:
The system of claim 8, wherein the configuration to select the set of seed blocks includes configurations to: 
Ghosh in view of Singhai does not explicitly teach a counting hash set as:
create a counting hash set for storing every hash component of every block that is not de-duplicated; (But note Singhai generally as in [0064] updating/incrementing a reference count for use of a reference/seed block and as in [0095] as part of the hashing, but not explicitly component based).
define a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance; 
iteratively, for all blocks that are not de-duplicated, look up element counts for each hash component of the block; and 
for each block, determine that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold.
However, Dor in the analogous art of deduplication processing with block component hashing and counting teaches:
create a counting hash set for storing every hash component of every block that is not de-duplicated; (See Dor [0106]-[0126] wherein as in [0106] every data unit/block is assigned a unique index and processed to generate chunks/components of the block. Since this is done for every block it is done for every block not deduplicated. Then as in [0108] the hash generator hashes each component/chunk and stores the hashed indexes in a table of hashes with counters. See also generally Fig. 4 and Fig. 6).
define a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance; (See Dor [0106]-[0126] and particularly as in [0108]-[0119] a count of each element/chunk hashed is maintained to track identified data units with the same/identical hashed chunks/elements).
iteratively, for all blocks that are not de-duplicated, look up element counts for each hash component of the block; and (See Dor [0106]-[0126] and particularly [0113]-[0119] wherein the counts for each chunk/element are determined and used to identify such as similar among a number of data blocks/units).
for each block, determine that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold. (See Dor [0106]-[0126] and particularly [0113]-[0119] and [0121]-[0126] wherein, based on a threshold, a determination is made of similar blocks to serve as a seed block for deduplication, and the page unit size is used to determine pairs of seed blocks with both a threshold count of similar chunks/pages and highest similarity to determine such as a reference/seed block).


Regarding claim 10, Ghosh in view of Singhai and Dor as applied above to claim 9 further teaches:
The system of claim 9, wherein the first number is configured based on at least one of: a length of time since the set of seed blocks has been updated, and a performance measurement of the set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).

Regarding claim 11, Ghosh in view of Singhai and Dor as applied above to claim 10 further teaches:
The system of claim 10, wherein the performance measurement is made for determination of a need to expand the set of seed blocks, the measurement being based on a similarity of a sample of newly received blocks to a current set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).

Regarding claim 16, Ghosh in view of Singhai as applied above to claim 15 further teaches:
The non-transitory computer readable medium of claim 15, wherein the instructions for selecting the set of seed blocks include instructions for: 
Ghosh in view of Singhai does not explicitly teach a counting hash set as:
creating a counting hash set for storing every hash component of every block that is not de-duplicated; (But note Singhai generally as in [0064] updating/incrementing a reference count for use of a reference/seed block and as in [0095] as part of the hashing, but not explicitly component based).
defining a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance;
iteratively, for all blocks that are not de-duplicated, looking up element counts for each hash component of the block; and 
for each block, determining that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold.
However, Dor in the analogous art of deduplication processing with block component hashing and counting teaches:
creating a counting hash set for storing every hash component of every block that is not de-duplicated; (See Dor [0106]-[0126] wherein as in [0106] every data unit/block is assigned a unique index and processed to generate chunks/components of the block. Since this is done for every block it is done for every block not deduplicated. Then as in [0108] the hash generator hashes each component/chunk and stores the hashed indexes in a table of hashes with counters. See also generally Fig. 4 and Fig. 6).
defining a counting set for hash components, wherein an element of the counting set is used for keeping track of a number of identical elements of the counting hash set contained in a counting set instance; (See Dor [0106]-[0126] and particularly as in [0108]-[0119] a count of each element/chunk hashed is maintained to track identified data units with the same/identical hashed chunks/elements).
iteratively, for all blocks that are not de-duplicated, looking up element counts for each hash component of the block; and (See Dor [0106]-[0126] and particularly [0113]-[0119] wherein the counts for each chunk/element are determined and used to identify such as similar among a number of data blocks/units).
for each block, determining that the block is a seed block when a first number of the hash components of the block have an element count that reaches a predetermined threshold. (See Dor [0106]-[0126] and particularly [0113]-[0119] and [0121]-[0126] wherein, based on a threshold, a determination is made of similar blocks to serve as a seed block for deduplication, and the page unit size is used to determine pairs of seed blocks with both a threshold count of similar chunks/pages and highest similarity to determine such as a reference/seed block).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Dor with the teachings of Singhai and Ghosh. One having ordinary skill in the art would have been motivated to combine the counting hash set of components of blocks to identify reference/seed blocks as in Dor 

Regarding claim 17, Ghosh in view of Singhai and Dor as applied above to claim 16 further teaches:
The non-transitory computer readable medium of claim 16, wherein the first number is configured based on at least one of: a length of time since the set of seed blocks has been updated, and a performance measurement of the set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).

Regarding claim 18, Ghosh in view of Singhai and Dor as applied above to claim 17 further teaches:
The non-transitory computer readable medium of claim 17, wherein the performance measurement is made for determination of a need to expand the set of seed blocks, the measurement being based on a similarity of a sample of newly received blocks to a current set of seed blocks. (See Singhai [0009], [0084], [0090], [0113]-[0115], [0123]-[0124], and [0132]-[0133] wherein a performance evaluation module evaluates the performance of the seed/reference set based on criteria (i.e. performance measurement) including deduplication ratio and determines to proceed with/expand the reference/seed set based on the deduplication ratio comparing a sample of input blocks to seed blocks to set a predetermined threshold as in [0009]. See also Dor [0007], [0025], [0110], [0113], and [0114] generally wherein the threshold for the first number is based on a performance measurement of the seed blocks as a hash-based performance measurement threshold).

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosh in view of Singhai and further in view of Battaje et al., U.S. Patent Pub. No. 2018/0349053 (hereinafter Battaje).

Regarding claim 6, Ghosh in view of Singhai as applied above to claim 5 further teaches:
The method of claim 5, 
Ghosh in view of Singhai does not explicitly teach numbering operating phases as:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (But note Ghosh as in Fig. 1 and col. 5:20-32 wherein each dataset is numbered sequentially and processed in phases sequentially according to such numbering).
However, Battaje in the analogous art of deduplication monitoring and management teaches:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (See Battaje [0137]-[0140] and particularly [0139] wherein each reference set as a current/active set operating phase is assigned a sequence number, such that an order in time of operating phases and seed/reference sets is determinable).
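For illustration only, the assignment of an increasing sequence number to a current operating phase can be sketched as follows. This is a minimal sketch and is not taken from Battaje, Ghosh, or the application: the class name `PhaseMonitor`, the method `enter_phase`, and the phase names used in the example are hypothetical.

```python
from itertools import count


class PhaseMonitor:
    """Tracks the current operating phase and stamps it with a sequence number."""

    def __init__(self):
        # Monotonically increasing source of sequence numbers, starting at 1.
        self._sequence = count(1)
        self.current = None   # (sequence number, phase name) of the active phase
        self.history = []     # every phase entered so far, in order of entry

    def enter_phase(self, name):
        """Make `name` the current phase and assign it the next sequence number."""
        number = next(self._sequence)
        self.current = (number, name)
        self.history.append(self.current)
        return number


monitor = PhaseMonitor()
first = monitor.enter_phase("seeding")        # hypothetical phase name
second = monitor.enter_phase("sedimentation")
```

Because each number is strictly greater than the one before it, comparing two sequence numbers is enough to determine which operating phase came earlier in time.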


Regarding claim 13, Ghosh in view of Singhai as applied above to claim 12 further teaches:
The system of claim 12, 
Ghosh in view of Singhai does not explicitly teach numbering operating phases as:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (But note Ghosh as in Fig. 1 and col. 5:20-32 wherein each dataset is numbered sequentially and processed in phases sequentially according to such numbering).
However, Battaje in the analogous art of deduplication monitoring and management teaches:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (See Battaje [0137]-[0140] and particularly [0139] wherein each reference set as a current/active set operating phase is assigned a sequence number, such that an order in time of operating phases and seed/reference sets is determinable).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Battaje with the teachings of Singhai and Ghosh. One having ordinary skill in the art would have been motivated 


Regarding claim 20, Ghosh in view of Singhai as applied above to claim 19 further teaches:
The non-transitory computer readable medium of claim 19, 
Ghosh in view of Singhai does not explicitly teach numbering operating phases as:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (But note Ghosh as in Fig. 1 and col. 5:20-32 wherein each dataset is numbered sequentially and processed in phases sequentially according to such numbering).
However, Battaje in the analogous art of deduplication monitoring and management teaches:
wherein the monitor of the de-duplication engine assigns an increasing sequence number to a current operating phase. (See Battaje [0137]-[0140] and particularly [0139] wherein each reference set as a current/active set operating phase is assigned a sequence number, such that an order in time of operating phases and seed/reference sets is determinable).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Battaje with the teachings of Singhai and Ghosh. One having ordinary skill in the art would have been motivated to combine the sequence number of operating phases of reference/seed sets as in Battaje with the 

Conclusion
Examiner has cited particular columns, line numbers, references, or figures in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety, as potentially teaching all or part of the claimed invention. See MPEP §§ 2141.02 and 2123.

The prior art made of record:
US 9,946,724
US2018/0025046
US2016/0054930
US2018/0349053

The pertinent prior art made of record but not relied upon for the rejections:
US 2011/0099351 (See Abstract and [0017] determine similar blocks to select node to store incoming block for processing and deduplication).
US2015/0019501 (See Abstract and [0037]-[0038] finding similar data position and reference digest for deduplication).
US2016/0019232 (See Abstract and [0030] group similar matching blocks in same extent. See also [0043]-[0048]).
US2014/0250077 (See Abstract and [0041] deduplication seeding including cycling ongoing basis to create common blocks).
US2018/0253255 (See Abstract and Fig. 2 various deduplication phases including merging phase coalescing data blocks and storing phase).
US 10,706,082 (See Abstract hash database for deduplication management including col. 3:45-67 periodic trigger for operations cycle and aging and cleaning various reference data).
US2012/0233135 (See Abstract sampling data for deduplication by hashing and selection of portion of hashes for similarity matching).
US 9,430,164 (See Abstract fingerprint for each file stored with list of chunks fingerprints and deduplicating).
US2013/0179407 (See Abstract seeding a deduplication processing repository/index).
US 10,795,860 (See Abstract and Fig. 2A similarity group mapping for sending chunked hashed data to group for processing).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T BROOKS whose telephone number is (571)272-3334.  The examiner can normally be reached on Monday - Friday 5:30AM to 2:00PM Eastern Time. Examiner email address is DAVID.BROOKS@USPTO.GOV.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. Applicant may also email examiner at DAVID.BROOKS@USPTO.GOV for scheduling purposes.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at 571-272-4241.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/David T. Brooks/
Primary Examiner, Art Unit 2156
3/11/2021