DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/17/2020 has been entered.


Response to Remarks
Applicant's amendments and remarks filed on 11/02/2021 have been fully considered but were not found to be persuasive. Applicant has amended Claims 1, 6-8, 11, 12, and 17-19. Claims 9, 10, 20, and 21 were previously canceled, and Claims 5 and 16 are newly canceled. Accordingly, Claims 1-4, 6-8, 11-15, and 17-19 are currently pending.

With respect to Applicant’s argument on page 7, which recites: “Singhai does not teach: “selecting according to a selection criterion, for at least one set of the sets of similar blocks, a reference block from among similar blocks of the at least one set” - as recited in claim 1. Claim 1 of the present disclosure further recites that the similar block that ”

In response to the amended claim limitation above, Examiner relies on portions of the references that go beyond those previously relied upon; therefore, this Office action is based on a new ground of rejection. Accordingly, Applicant is advised to review the detailed mapping of claim limitations in the relevant sections of the 35 U.S.C. § 103 claim rejections.

With regard to Applicant’s argument on page 8, which recites: “Singhai discloses in paragraph [0123] that data blocks that are found to be similar to a reference data set stored in the storage - "can be encoded relative to the existing reference data set". Singhai does not mention that the encoding is relative to a data block selected from the reference data set”

Examiner finds that Singhai discloses a step of determining a reference data set for encoding data blocks (i.e., “the encoding is relative to a data block”) based on a similarity between information associated with identifiers of the reference data set (i.e., “selected from the reference data set”) and that of the data blocks.
(Singhai [0103] The encoding engine 310 in cooperation with one or more other components of the computing device 200 can determine a reference data set for encoding data blocks based on a similarity between information associated with identifiers of the reference data set and that of the data blocks.)

Moreover, with respect to Applicant’s argument on page 8, lines 12-13, which recites: “Wallace teaches a group of similar chunks that are all compressed as a group, and does not teach a selected reference block of the group to be used for the compressing.”

Examiner respectfully disagrees with Applicant’s argument based on the following disclosure of Wallace, which states that there are multiple options for selecting which data chunks to place together in order to improve data compression (i.e., “a selected reference block of the group to be used for the compression”).
(Wallace col. 17, lines 31-36: “When selecting which data chunks to place together for the purpose of improving data compression, there are multiple options. One of the primary goals is to move similar chunks from any location within the storage together. The advantage is higher compression, but it may require a large amount of data movement, which consumes I/O resources. An alternative approach is to only reorganize chunks that fall within a specified storage unit such as a container.”) 

Furthermore, Examiner has re-mapped the existing claim elements to the relevant portions of the references in order to respond more fully to each of Applicant’s arguments. Accordingly, Applicant is advised to review the detailed mapping of claim limitations in the relevant sections.


Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 6-8, 11, 12, 14, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Singhai et al., Pub. No.: US 2017/0123677 A1, hereinafter Singhai, in view of Wallace et al., US 9,514,146, hereinafter Wallace.

As per claim 1, (Currently Amended) A method for global data compression, comprising: (Singhai discloses, in paragraph [0103], determining a reference data set (i.e., “clustering”) based on a similarity between information associated with identifiers of the reference data set and that of the data blocks) clustering similar blocks into sets of similar blocks;
(Singhai [0103] The encoding engine 310 in cooperation with one or more other components of the computing device 200 can determine a reference data set for encoding data blocks based on a similarity between information associated with identifiers of the reference data set and that of the data blocks.)

(Singhai discloses identifying similar block information (i.e., “according to a selection criterion”), including content of the data blocks/reference data set, content version, calendar dates associated with modifications to the content, data size, etc.) selecting, according to a selection criterion, for at least one set of the sets of similar blocks, a reference block from among similar blocks of the at least one set;
(Singhai [0103] The encoding engine 310 in cooperation with one or more other components of the computing device 200 can determine a reference data set for encoding data blocks based on a similarity between information associated with identifiers of the reference data set and that of the data blocks. The identifier information may include information such as, content of the data blocks/reference data set, content version (e.g. revisions), calendar dates associated with modifications to the content, data size, etc.)

(Singhai disclosed a method of using a similarity-based algorithm to detect similarity between resemblance hashes of data blocks) splitting a dataset into a plurality of blocks; for each block of the plurality of blocks: computing at least one similarity hash for the block; determining, based on the at least one similarity hash, whether a similar block is found for the block, wherein the similar block has a similarity hash that is similar to one of the computed at least one similarity hash for the block; 
(Singhai [0099] “A similarity-based algorithm can be used to detect similarity between resemblance hashes of data blocks of an incoming data stream and resemblance hashes associated with reference data sets. In further embodiments, the resemblance hash may reflect a sketch of content associated with data block(s) and/or a reference data set.”)
(Singhai [0157] The encoding engine 308 may use the reference data set if the threshold is satisfied to encode the incoming data set (i.e. compress dedupe) such that duplicate copies are not stored but, rather a compressed version is stored). In some embodiments, the set of data blocks includes segments/chunks of data blocks in which the segments/chunks of data blocks may be encoded exclusively with a reference data set.)
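
For illustration only, the similarity determination mapped above can be sketched as follows. This is a minimal sketch assuming a MinHash-style resemblance hash over overlapping byte windows; it is not taken from Singhai or from the claims, and the names `shingles`, `resemblance_hash`, `similarity`, and `is_similar`, as well as the 0.5 threshold, are hypothetical.

```python
import hashlib


def shingles(data, width=4):
    """Return the set of overlapping byte windows of the given width."""
    if len(data) <= width:
        return {data}
    return {data[i:i + width] for i in range(len(data) - width + 1)}


def resemblance_hash(data, num_hashes=32):
    """Build a sketch: for each salt, keep the smallest salted window hash."""
    windows = shingles(data)
    sketch = []
    for salt in range(num_hashes):
        prefix = salt.to_bytes(4, "big")
        sketch.append(min(
            hashlib.blake2b(prefix + w, digest_size=8).digest()
            for w in windows
        ))
    return tuple(sketch)


def similarity(sketch_a, sketch_b):
    """Fraction of sketch positions that agree (estimates set overlap)."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def is_similar(sketch_a, sketch_b, threshold=0.5):
    """Hypothetical criterion: treat blocks as similar above the threshold."""
    return similarity(sketch_a, sketch_b) >= threshold
```

Under these assumptions, two blocks that share most of their content yield sketches that agree in most positions, while unrelated blocks yield sketches that rarely agree.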

(Singhai discloses the use of delta-encoding algorithms to identify only the changed portion of a data set relative to similar data blocks and to encode (e.g., compress) a delta)
replacing data of the block with a reference to the similar block and a delta, wherein the delta is a difference in data between the block and the similar block; and compressing the block independently when a similar block is not found.  
(Singhai [0135], lines 17-21: “In another embodiment, if the new set of data blocks are similar to an existing reference data set, the encoding engine 310 may store a delta showing the difference between the reference data set from which the new set of data blocks are encoded.”  
Singhai [0196], lines 1-4: “The encoding engine 310 can be performed by a delta-encoding algorithm. Delta encoding algorithms identify similar resemblance hashes between data blocks and a reference data set and stores only the changed data.”)
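
For illustration only, replacing a block with a reference plus a delta, and compressing the block independently when no similar block is found, can be sketched as follows. This is a minimal sketch that uses Python's standard `difflib` and `zlib` modules as stand-ins; it is not the delta-encoding algorithm of Singhai, and the record layout and all function names are hypothetical.

```python
import zlib
from difflib import SequenceMatcher


def make_delta(reference, block):
    """Describe block as copies from reference plus literal new data."""
    ops = []
    matcher = SequenceMatcher(None, reference, block, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse reference bytes
        elif tag in ("replace", "insert"):
            ops.append(("data", block[j1:j2]))    # store only changed bytes
        # "delete" needs no entry: the bytes are simply not copied
    return ops


def apply_delta(reference, ops):
    """Rebuild the original block from the reference and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def store_block(block, reference_id=None, reference=None):
    """Store a reference plus delta if a similar block exists, else compress alone."""
    if reference is not None:
        return {"kind": "delta", "ref": reference_id,
                "delta": make_delta(reference, block)}
    return {"kind": "raw", "data": zlib.compress(block)}


def load_block(record, references):
    """Recover the block from either record kind."""
    if record["kind"] == "delta":
        return apply_delta(references[record["ref"]], record["delta"])
    return zlib.decompress(record["data"])
```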

(With respect to claim 1, Singhai does not explicitly disclose a method of appending the data block to the similar block to provide a combined block and compressing the combined block:) wherein the similar block is a certain reference block selected for a certain set of the at least one set; when the similar block is found: appending data of the block to data of the similar block to provide a combined block and compressing the combined block by applying a compression algorithm on the combined block;

However, Wallace discloses a method of merging or adding similar chunks to a previously compressed group and recompressing the newly merged group (e.g., “appending data of the block to data of the similar block to provide a combined block and compressing”):
(Wallace, col.20 line 64 - col.21 line 5: “If similar chunks exist, processing logic reads a compressed group of data chunks into memory via transaction 1722. The new chunk is added or merged with the previously compressed group. The merged group is recompressed and written out to the storage system via transaction 1723. Over time the compressed regions will be more and more packed with a bunch of similar items, i.e. a bunch of items all with sketch "I" in the same compression region.”)
Thus, one having ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teaching of Wallace into Singhai for the advantageous purpose of grouping a new chunk with existing similar data chunks based on their respective sketches, which saves storage space by sharing the similar copies and also makes data chunks easier to search, because the data chunks are stored in groups such that similar chunks are stored together with a single reference.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wallace with the system of Singhai because they are analogous art directed to the same field of endeavor, namely systems and methods for improved data management/compression in a storage system. (See Singhai par. [0005], Wallace col.1, lines 20-25)
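
For illustration only, the limitation of appending data of the block to data of the similar block and compressing the combined block can be sketched as follows. This is a minimal sketch that assumes `zlib` as the compression algorithm; neither reference names a specific algorithm in the passages cited above, and the function names are hypothetical.

```python
import zlib


def compress_combined(similar_block, block):
    """Append block to similar_block and compress the result as one unit."""
    combined = similar_block + block
    return zlib.compress(combined)


def decompress_combined(payload, similar_len):
    """Split the decompressed data back into the two original blocks."""
    combined = zlib.decompress(payload)
    return combined[:similar_len], combined[similar_len:]
```

Because the compressor can encode the appended block largely as back-references into the similar block, the combined output is typically smaller than the two blocks compressed separately.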



Claims 2, 4, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Singhai in view of Wallace and further in view of Raizen et al., Patent No.: US 8156306 B1, hereinafter Raizen.

As per claim 2.  (Original) The method of claim 1, further comprising: storing each compressed block, (Singhai does not explicitly disclose storing metadata with the compressed block information.) wherein each independently compressed block is stored with metadata, wherein the metadata includes a compression algorithm used to compress the data.  
However, Raizen discloses a compressed metadata map, which includes a pointer to the beginning of the compressed chunk, the compression algorithm used, and the size of the compressed data: (Raizen, col. 25, lines 49-53:  “For example, in one embodiment, the compressed metadata map 82B includes a pointer to the beginning of the compressed chunk, the algorithm used to compress it, and the size of the compressed data (so that the compressed chunk can be decompressed when needed).”)
Thus, one having ordinary skill in the art would have been motivated to apply the teachings of Raizen, namely storing property information of a compressed chunk as metadata, in Singhai’s Data Reduction Unit 210 (see FIG. 3B) to improve the management of data blocks by storing information about the properties of each data block in the metadata, for example, <meta compression = ‘tar’>, <meta size = ‘57kb’>, <meta hash = ‘4b57’>, etc.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Raizen with the combined system of Singhai because they are analogous art directed to the same field of endeavor, namely systems and methods that use deduplication, compression, and data reduction techniques.
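
For illustration only, storing each independently compressed block with metadata that identifies the compression algorithm can be sketched as follows. This is a minimal sketch; the `CompressedChunk` record, the `CODECS` table, and the choice of Python's standard `zlib`, `bz2`, and `lzma` modules are assumptions and are not drawn from Raizen's compressed metadata map 82B.

```python
import bz2
import lzma
import zlib
from dataclasses import dataclass

# Hypothetical codec table: algorithm name -> (compress, decompress)
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


@dataclass
class CompressedChunk:
    algorithm: str   # which algorithm produced the payload
    size: int        # size of the compressed payload in bytes
    payload: bytes   # the compressed data itself


def store_chunk(data, algorithm="zlib"):
    """Compress data and record the algorithm and size alongside it."""
    compress, _ = CODECS[algorithm]
    payload = compress(data)
    return CompressedChunk(algorithm=algorithm, size=len(payload), payload=payload)


def load_chunk(chunk):
    """Use the recorded algorithm to pick the matching decompressor."""
    _, decompress = CODECS[chunk.algorithm]
    return decompress(chunk.payload)
```

Recording the algorithm with each chunk is what allows different chunks to be compressed differently and still be decompressed when needed.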

As per claim 3.  (Previously Presented) The method of claim 1, further comprising: (Singhai teaches identifying a set of similar blocks, setting a reference, and using a reference count associated with reference data blocks by tracking the number of times data blocks rely on a reference data block) storing, for a stored block, a reference count that includes a number of similar blocks for the stored block; determining, based on the reference count for the stored block, whether to delete at least one reference to the stored block, wherein it is determined to delete the at least one reference when the stored block is not being used; and deleting the at least one reference to one of the stored blocks when it is determined to delete the at least one reference.  
(Singhai par. [0131] The method 600 may then continue by identifying 606 whether similarity exists between the new set of data blocks and at least one or more reference data sets. In some embodiments, the matching engine 308 in cooperation with the signature fingerprint computation engine 306 may identify whether a similarity exists between the new set of data blocks and one or more reference data sets stored in a non-transitory data store based on the analysis.
Singhai par. [0039], lines 1-8: “A method for retiring old reference data blocks that are no longer useful needs to be applied. The method may include a reference count associated with reference data blocks by tracking the number of times data blocks rely on a reference data block and/or set of reference data blocks such that it can be determined when a reference data block is no longer relied upon by a data block and can therefrom be retired from the set.”)
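
For illustration only, the reference-count tracking described in the passage quoted above can be sketched as follows. This is a minimal sketch; the `ReferenceTable` class and its method names are hypothetical and do not appear in Singhai.

```python
class ReferenceTable:
    """Tracks how many stored blocks rely on each reference block."""

    def __init__(self):
        self.blocks = {}   # block_id -> reference block data
        self.counts = {}   # block_id -> number of dependent blocks

    def store(self, block_id, data):
        """Register a new reference block with no dependents yet."""
        self.blocks[block_id] = data
        self.counts[block_id] = 0

    def add_dependent(self, block_id):
        """Record that one more block is encoded against this reference."""
        self.counts[block_id] += 1

    def remove_dependent(self, block_id):
        """Drop one dependent; retire the reference once nothing relies on it."""
        self.counts[block_id] -= 1
        if self.counts[block_id] == 0:
            del self.blocks[block_id]
            del self.counts[block_id]
            return True    # reference was retired
        return False       # reference is still in use
```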

As per claim 4. (Original) The method of claim 1, wherein the dataset is split using variable-sized chunking.  
Singhai does not explicitly disclose using variable-sized data chunks.  
However, Raizen discloses the implementation of a thin provisioning layer providing variable length chunk sizes: 
(Raizen, col. 15, lines 32-41: “Variable length chunk sizes can, in some instances, result in more complicated mappings and remappings than with fixed chunks; however, variable length chunk sizes are more readily implemented if the space reclamation related layers 28, 31, 32, and/or optionally 35, and the thin provisioning layer 54, are all provided/implemented entirely within a storage appliance 14, as is illustrated in FIG. 1C. In at least some embodiments, furthermore, the thin provisioning layer 54 can even provide data storage extents of a variable size.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Raizen with the combined system of Singhai to improve the utilization of storage space where storage space is a limited resource. For example, because variable-length chunks fit available space more flexibly than fixed-size chunks, variable-sized chunks may have the advantage of minimizing unused storage space in the system.
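
For illustration only, splitting a dataset using variable-sized chunking can be sketched as follows. This is a minimal sketch of one common approach, content-defined chunking with a rolling sum over a sliding window; it is an assumption about how such chunking might be done and is not the mechanism described in Raizen. The function name and the `window`, `mask`, `min_size`, and `max_size` parameters are hypothetical.

```python
def chunk_variable(data, window=16, mask=0x3F, min_size=32, max_size=256):
    """Split data at content-defined boundaries, yielding variable-sized chunks."""
    chunks = []
    start = 0      # index where the current chunk begins
    rolling = 0    # sum of the last `window` bytes of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]   # slide the window forward
        size = i - start + 1
        at_boundary = size >= min_size and (rolling & mask) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks
```

Because boundaries depend on the content rather than on fixed offsets, chunk lengths vary between `min_size` and `max_size`, apart from a possibly shorter final chunk.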
As per claim 5. (Cancelled)  

As per claim 6. (Currently Amended) The method of claim 1, further comprising: storing, in an index, the similarity hash computed for each of the plurality of blocks, (Singhai discloses that resemblance hashes of reference data sets and/or segments of reference data sets may be stored in a data store (e.g., an “index”)) wherein whether a similar block is found is determined based on the indexed similarity hashes.  
 (Singhai par. [0131] lines 8 - 13: “For instance, the matching engine 308 may compare resemblance hashes of one or more reference data sets and/or segments of reference data sets stored in a data store such as, data storage repository 110, to resemblance hashes associated with the new set of data blocks.”)
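
For illustration only, storing similarity hashes in an index and determining from the index whether a similar block is found can be sketched as follows. This is a minimal sketch that assumes a block counts as similar when any one of its similarity hashes matches an indexed hash exactly; the `SimilarityIndex` class and its methods are hypothetical and are not taken from Singhai's data storage repository 110.

```python
class SimilarityIndex:
    """Maps each similarity hash to the first block that produced it."""

    def __init__(self):
        self._by_hash = {}   # similarity hash -> block id

    def add(self, block_id, hashes):
        """Index every similarity hash computed for a block."""
        for h in hashes:
            # Keep the earliest block for a given hash as the candidate
            self._by_hash.setdefault(h, block_id)

    def find_similar(self, hashes):
        """Return the id of a block sharing any hash, or None if none is found."""
        for h in hashes:
            if h in self._by_hash:
                return self._by_hash[h]
        return None
```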
As per claim 7. (Currently Amended) The method of claim 1, wherein (Singhai teaches the steps of (1) receiving data blocks, (2) identifying whether similarity already exists (i.e., “selecting the certain reference block includes selecting a block that was received before”), (3) determining similarity, and finally (4) encoding (compressing) the blocks by associating them with the reference data block) the selection criterion for selecting the certain reference block includes selecting a block that was received before each other block of the certain set.
(See Singhai Fig. 6A, element 602: receive data block; element 606: identify whether similarity exists; element 608: determine similarity; element 610: encode data block (e.g., compress); element 612: update the records table.)

As per claim 8. (Currently Amended) The method of claim [[5]] 1, wherein the selection criterion for selecting the certain reference block includes (Singhai discloses a method of generating a new reference data set based on satisfying a data size being within an assigned predefined range)
selecting a block that has a largest size among blocks of the certain set.  
 (Singhai [0142] “In one embodiment, the matching engine 308 transmits the set to the encoding engine 310, and the encoding engine 310 then generates a new reference data set that may include one or more data blocks that satisfy a criterion. For instance, the new reference data set can be generated based on one or more data blocks satisfying a data size being within an assigned predefined range. In one embodiment, the encoding engine 310 generates the new reference data set based on the one or more data blocks sharing content that is within a degree of similarity between each of the one or more data blocks.”)
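
For illustration only, the two selection criteria recited in claims 7 and 8 (earliest received block and largest block of a set) can be sketched as follows. This is a minimal sketch; the `Block` record, its `received_at` field, and the function names are hypothetical and are not drawn from Singhai.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    received_at: int   # hypothetical arrival sequence number
    data: bytes


def select_earliest(similar_blocks):
    """Criterion of claim 7: the block received before every other block."""
    return min(similar_blocks, key=lambda b: b.received_at)


def select_largest(similar_blocks):
    """Criterion of claim 8: the block with the largest size in the set."""
    return max(similar_blocks, key=lambda b: len(b.data))
```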
As per claim 9. (Cancelled)  

As per claim 10. (Cancelled)  

As per claim 11 (Currently Amended) A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: clustering similar blocks into sets of similar blocks; selecting according to a selection criterion, for at least one set of the sets of similar blocks, a reference block from among similar blocks of the at least one set; splitting a dataset into a plurality of blocks; for each block of the plurality of blocks: computing at least one similarity hash for the block; determining, based on the at least one similarity hash, whether a similar block is found for the block, wherein [[a]] the similar block 

Claim 11 is analogous to claim 1 and is rejected under the same rationale as indicated above.

As per claim 12, (Currently Amended) A system for global data compression, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: cluster similar blocks into sets of similar blocks;  select according to a selection criterion, for at least one set of the sets of similar blocks, a reference block from among similar blocks of the at least one set; split a 

Claim 12 is analogous to claim 1 and is rejected under the same rationale as indicated above.

As per claim 13. (Previously Presented) The system of claim 12, wherein the memory further contains instructions that, when executed by the processing circuitry, configure the system to: store each compressed block, wherein each independently compressed block is stored with metadata, wherein the metadata includes a compression algorithm used to compress the data.  

Claim 13 is analogous to claim 2 and is rejected under the same rationale as indicated above.
 
As per claim 14. (Currently Amended) The system of claim 12, wherein the memory further contains instructions that, when executed by the processing circuitry, configure the system to:   

Claim 14 is analogous to claim 3 and is rejected under the same rationale as indicated above.
As per claim 15, (Previously Presented) The system of claim 12, wherein the dataset is split using variable-sized chunking.  

Claim 15 is analogous to claim 4 and is rejected under the same rationale as indicated above.
 
As per claim 16. (Cancelled)  

As per claim 17.  (Currently Amended) The system of claim [[16]] 12, wherein the memory further contains instructions that, when executed by the processing circuitry, configure the system to: store, in an index, the similarity hash computed for each of the plurality of blocks, wherein whether a similar block is found is determined based on the indexed similarity hashes.

Claim 17 is analogous to claim 6 and is rejected under the same rationale as indicated above.

As per claim 18.  (Currently Amended) The system of claim [[16]] 12, wherein



Claim 18 is analogous to claim 7 and is rejected under the same rationale as indicated above.
 
As per claim 19.  (Currently Amended) The system of claim [[16]] 12, wherein the selection criterion for selecting the certain 

Claim 19 is analogous to claim 8 and is rejected under the same rationale as indicated above.

As per claim 20.   (Cancelled)  
As per claim 21.   (Cancelled)  


Pertinent Prior Art

The following are prior art references made of record but not currently relied upon:

DEDUPLICATION OF FILE (Zhu, US 2015/0347445) – A system and method of deduplication of a file, a computer program product, and an apparatus thereof.


Conclusion 

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHONGSUH PARK whose telephone number is (408)918-7574.  The examiner can normally be reached on Monday - Friday 8:00-5:30 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at (571)272-3978 EST.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHONGSUH PARK/Examiner, Art Unit 2154   
                                                                                                                                                                                                     

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154